[llvm-dev] Question about generated code for x86 vpgather* intrinsics

Mon Dec 18 16:52:39 PST 2017

Hi,

I've been looking into some basic vectorized code that uses
_mm256_i32gather_epi32 to generate vpgatherdd. In a simple test function,
I'm a little confused that the generated code zeroes the destination
register before the gather (example: https://godbolt.org/g/zQzn56).
Shouldn't this be unnecessary when the intrinsic takes no mask (so the mask
is implicitly set for every element), in which case every element of the
resulting ymm register will be loaded? The IR seems to specify a pre-zeroed
result, if I'm understanding things correctly:

%8 = tail call <8 x i32> @llvm.x86.avx2.gather.d.d.256(
         <8 x i32> zeroinitializer, i8* %1, <8 x i32> %7,
         <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1,
                    i32 -1, i32 -1, i32 -1, i32 -1>,
         i8 1)

Am I correct in thinking this initialization is unnecessary here, given that
the mask is all -1? I'm not really concerned with the cost of the
initialization so much as I am concerned that I don't fully understand the
semantics of the instruction/intrinsic :)
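
To make the question concrete, here is a scalar sketch in C of the
per-element behaviour as I understand it from the instruction's
documentation. The function name gather_d_d_256_model is made up for this
post; only the operand order (src, base, index, mask, scale) is taken from
the IR call above. It models the architectural result only and says
nothing about why the compiler emits the zeroing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of vpgatherdd on a ymm register (8 x i32).
 * Operand order mirrors the IR call: src, base, index, mask, scale.
 * dst and src may point to the same array, as they share one register
 * in the real instruction. */
static void gather_d_d_256_model(int32_t dst[8], const int32_t src[8],
                                 const void *base, const int32_t index[8],
                                 int32_t mask[8], int scale)
{
    /* The encoding only allows these scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < 8; ++i) {
        if (mask[i] < 0) {
            /* Sign bit of the mask element is set: load this lane from
             * base + index[i] * scale (byte address). */
            const char *p = (const char *)base + (int64_t)index[i] * scale;
            memcpy(&dst[i], p, sizeof dst[i]);
        } else {
            /* Sign bit clear: the lane keeps the pass-through value. */
            dst[i] = src[i];
        }
        /* The instruction clears each mask element as the lane completes. */
        mask[i] = 0;
    }
}
```

If this model is right, then with an all -1 mask the else branch is never
taken, so the contents of src (zero or otherwise) cannot affect the
result.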

Thanks,
-Jackson