[llvm-dev] MatchLoadCombine(): handling for vectorized loop.

Mon Dec 3 08:20:44 PST 2018

Hi,

I have noticed some loops that build a wider element by loading small 
elements, zero-extending them, shifting them (by different amounts) and 
then 'or'-ing them all together. The result is equivalent either to a 
single wider load or to a byte-swapped wider load.
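
For illustration, here is a minimal C sketch of the kind of pattern I mean. 
It is not taken from SPEC; the function names (load_le32, load_be32, 
decode_le32) and the choice of four bytes building a 32-bit element are 
assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Four byte loads, each zero-extended, shifted by a different amount and
 * or'ed together. On a little-endian target this computes the same value
 * as a single 32-bit load from p. (Name is hypothetical.) */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Same bytes with the shift amounts reversed. On a little-endian target
 * this computes the same value as a byte-swapped 32-bit load from p.
 * (Name is hypothetical.) */
uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* Loop form: every iteration builds one wide element from small ones.
 * This is the shape that may get vectorized. (Name is hypothetical.) */
void decode_le32(uint32_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = load_le32(src + 4 * i);
}
```

On a big-endian target the two roles are swapped: load_be32 corresponds to 
the plain wide load and load_le32 to the byte-swapped one.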

DAGCombiner::MatchLoadCombine() will combine this into a single wide load, 
but only in the scalar cases of i16, i32 and i64. The vectorized form is 
not matched, so these loops (I have seen a dozen or so in SPEC) get 
vectorized and end up as a lot of ugly code.

I have begun experimenting with handling the vectorized loop as well, and 
would like to know whether people think this is a good idea. Also, am I 
right to assume that this should probably run before type legalization?

/Jonas