[llvm-dev] MatchLoadCombine(): handling for vectorized loop.
Jonas Paulsson via llvm-dev
llvm-dev at lists.llvm.org
Mon Dec 3 08:20:44 PST 2018
I have noticed some loops that build a wider element by loading small
elements, zero-extending them, shifting them (by different amounts), and
then 'or'-ing them all together. The result is equivalent either to a
single wider load or to a byte-swapped wider load.
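
To make the pattern above concrete, here is a minimal C++ sketch of the
two scalar forms. The function names load_le32 and load_be32 are made up
for illustration and do not come from any of the loops I looked at. On a
little-endian target the first is equivalent to a plain 32-bit load and
the second to a byte-swapped load; on a big-endian target the roles are
reversed.

```cpp
#include <cstdint>

// Hypothetical helper: assemble a 32-bit value from four bytes, with p[0]
// as the least significant byte (little-endian byte order).
inline std::uint32_t load_le32(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: same bytes, but p[0] is the most significant byte
// (big-endian byte order).
inline std::uint32_t load_be32(const std::uint8_t *p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}
```

Each byte is cast to the wide type before shifting, which is the
zero-extend step; the differing shift amounts are what fix the byte
order.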
DAGCombiner::MatchLoadCombine() will combine this into a single wide
load, but only in the scalar cases of i16, i32 and i64. As a result,
these loops (I have seen a dozen or so in SPEC) get vectorized with a
lot of ugly code.
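
As a rough illustration of the loop shape in question (a sketch, not
taken from SPEC; the name assemble_le32 and its signature are made up),
something like the following is the kind of source I mean. Whether a
given compiler vectorizes it, and what it emits, depends on the target
and options.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: build n 32-bit elements from 4*n source bytes,
// treating each group of four bytes as a little-endian value.
void assemble_le32(const std::uint8_t *src, std::uint32_t *dst,
                   std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const std::uint8_t *p = src + 4 * i;
    // Zero-extend each byte, shift it into position, and 'or' the parts.
    dst[i] = static_cast<std::uint32_t>(p[0]) |
             (static_cast<std::uint32_t>(p[1]) << 8) |
             (static_cast<std::uint32_t>(p[2]) << 16) |
             (static_cast<std::uint32_t>(p[3]) << 24);
  }
}
```

In the scalar case each iteration is a candidate for a single 32-bit
load (or a load plus a byte swap, depending on endianness). Once the
loop has been vectorized, the same pattern shows up on vector types.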
I have begun to experiment with handling the vectorized loop as well,
and would like to know whether people think this is a good idea. Also,
am I right to assume that it should probably be run before type
legalization?