[SLP] Cluster ordering for loads
author David Green <david.green@arm.com>
Sat, 7 May 2022 13:38:11 +0000 (14:38 +0100)
committer David Green <david.green@arm.com>
Sat, 7 May 2022 13:38:11 +0000 (14:38 +0100)
commit 802e15c576997f76bffb4c08b6f81d6c79c320e0
tree ce2c0697c41ddfeb4b251fadb02c1d85dac1f580
parent 2cd080c884a3dd1fc673f02afd48bfe9ba01ce89

Given a group of loads without a better order, this patch partially sorts
the elements to form clusters of elements that are adjacent in memory.
These clusters can potentially be loaded with fewer loads, meaning less
overall shuffling (for example, loading each v4i8 cluster of a v16i8 as a
single f32 load, as opposed to multiple independent byte loads and inserts).
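
The partial sort described above can be illustrated with a small standalone
sketch. This is not the code in SLPVectorizer.cpp: `LoadRef` and
`clusterOrder` are hypothetical names, and each load is assumed to be already
reduced to an opaque base id plus a byte offset. Loads are grouped by base in
first-seen order and sorted by offset within each group, so elements that are
adjacent in memory end up next to each other in the resulting order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a scalar load: an opaque base id and a byte
// offset from that base. The real pass works on LLVM pointer values.
struct LoadRef {
  int Base;
  std::int64_t Offset;
};

// Returns a permutation of indices into Loads. Loads that share a base are
// placed together (bases keep their first-seen order) and are sorted by
// offset within their group.
std::vector<std::size_t> clusterOrder(const std::vector<LoadRef> &Loads) {
  std::vector<int> BaseOrder;                   // bases in first-seen order
  std::vector<std::vector<std::size_t>> Groups; // load indices per base

  for (std::size_t I = 0; I < Loads.size(); ++I) {
    auto It = std::find(BaseOrder.begin(), BaseOrder.end(), Loads[I].Base);
    std::size_t G = static_cast<std::size_t>(It - BaseOrder.begin());
    if (It == BaseOrder.end()) {
      // First time this base is seen: open a new cluster for it.
      BaseOrder.push_back(Loads[I].Base);
      Groups.emplace_back();
    }
    Groups[G].push_back(I);
  }

  std::vector<std::size_t> Order;
  for (auto &Group : Groups) {
    // Stable sort keeps the original relative order of equal offsets.
    std::stable_sort(Group.begin(), Group.end(),
                     [&](std::size_t A, std::size_t B) {
                       return Loads[A].Offset < Loads[B].Offset;
                     });
    Order.insert(Order.end(), Group.begin(), Group.end());
  }

  assert(Order.size() == Loads.size());
  return Order;
}
```

For the input `{{0, 1}, {1, 0}, {0, 0}, {1, 1}}` this yields the order
`2, 0, 1, 3`: the two loads from base 0 come first in ascending offset,
followed by the two loads from base 1. Whether such an order is profitable is
a separate cost question that this sketch does not model.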

Differential Revision: https://reviews.llvm.org/D122145
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll
llvm/test/Transforms/SLPVectorizer/X86/split-load8_2-unord.ll