[llvm-dev] MatchLoadCombine(): handling for vectorized loop.

Mon Dec 3 15:37:29 PST 2018

On 12/3/2018 8:20 AM, Jonas Paulsson wrote:
> Hi,
>
> I have noticed some loops that build a wider element by loading small
> elements, zero-extending them, shifting them (by different amounts),
> and then 'or'-ing them all together. The result is equivalent either
> to a wider load or to a byte-swapped one.
>
> DAGCombiner::MatchLoadCombine() will combine this to a single wide 
> load, but only in the scalar cases of i16, i32 and i64. The result is 
> that these loops (I have seen a dozen or so on SPEC) get vectorized 
> with a lot of ugly code.
>
> I have begun to experiment with handling the vectorized loop as well,
> and would like to know whether people think this would be a good idea.
> Also, am I right to assume that it should probably be run before type
> legalization?
>
You mean, trying to merge some combination of vector loads and shuffles 
into a single vector load in DAGCombine?  That seems sort of late, given 
the cost modeling involved in vectorization.

See also 
http://lists.llvm.org/pipermail/llvm-dev/2018-February/121000.html ?
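
For concreteness, here is a minimal sketch of the scalar pattern under
discussion, written as C++ source rather than as DAG nodes. The function
names (load_le32, load_be32, combine_loop) are made up for illustration and
are not taken from LLVM. On a little-endian target the first form computes
the same value as a plain 32-bit load and the second the same value as a
byte-swapped load; the loop at the end is the kind of code the vectorizer
would see. The sketch only shows the source-level shape and says nothing
about what any particular compiler version does with it.

```cpp
#include <cstddef>
#include <cstdint>

// Little-endian assembly: zero-extend each byte, shift it into place,
// and 'or' the pieces together. Byte 0 ends up in the low bits.
constexpr std::uint32_t load_le32(const std::uint8_t *p) {
  return  static_cast<std::uint32_t>(p[0])
       | (static_cast<std::uint32_t>(p[1]) << 8)
       | (static_cast<std::uint32_t>(p[2]) << 16)
       | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Big-endian assembly: same operations, shift amounts reversed.
// Byte 0 ends up in the high bits.
constexpr std::uint32_t load_be32(const std::uint8_t *p) {
  return (static_cast<std::uint32_t>(p[0]) << 24)
       | (static_cast<std::uint32_t>(p[1]) << 16)
       | (static_cast<std::uint32_t>(p[2]) << 8)
       |  static_cast<std::uint32_t>(p[3]);
}

// Loop form (hypothetical example): each iteration builds one 32-bit
// element from four consecutive bytes of the input.
void combine_loop(const std::uint8_t *src, std::uint32_t *dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = load_le32(src + 4 * i);
}
```

The two scalar helpers differ only in their shift amounts, which is why one
corresponds to a wide load and the other to a wide load plus a byte swap.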

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project