[PATCH] optimize merging of scalar loads for 32-byte vectors [X86, AVX] (PR21710)
Sanjay Patel
spatel at rotateright.com
Fri Dec 5 13:33:30 PST 2014
>>! In D6536#7, @RKSimon wrote:
> Tested with some basic internal tests - no problems encountered (and folding seems to work well too).
>
> I also tested integer (i64, i32, i16, i8) sequential loads and that optimized as expected too - not sure if it's worth adding tests for these or not?
Thanks - committed at r223518.
I think we're OK without testing every type, but this does raise a potential corner case for an AVX-only machine: is it worse for performance to use a 32-byte FP store when dealing with ints? I.e., is there a domain-crossing penalty for a store of the 'wrong' type? Would we ever have a 32-byte vector of ints incoming to this code on an AVX-only machine?
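To make the integer case concrete, here is a minimal sketch of the kind of source pattern in question: eight sequential i32 scalar loads used to build one 32-byte vector. The function name sum8_i32 and the typedef v8i32 are made up for illustration and are not taken from the patch or its tests. It assumes the GCC/Clang vector_size extension, and whether a given compiler actually merges the loads into a single 32-byte load (and which instruction it picks) depends on the compiler version and flags.

```c
#include <stdint.h>

/* 32-byte vector of eight 32-bit ints (GCC/Clang vector extension).
   Hypothetical type name, chosen for this sketch only. */
typedef int32_t v8i32 __attribute__((vector_size(32)));

/* Hypothetical helper: builds a 32-byte integer vector from eight
   sequential scalar loads, then sums the lanes so the result can be
   checked with ordinary scalar code. */
int32_t sum8_i32(const int32_t *p) {
    /* Eight consecutive scalar loads feeding one vector build. */
    v8i32 v = { p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7] };

    int32_t s = 0;
    for (int i = 0; i < 8; ++i)
        s += v[i];   /* lane access via vector subscripting */
    return s;
}
```

Compiling something like this with -O2 -mavx (and without -mavx2) and reading the generated assembly would be one way to see what an AVX-only target does with a 32-byte vector of ints. It does not answer the domain-crossing question by itself; that would need measurement on real hardware.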
REPOSITORY
rL LLVM
http://reviews.llvm.org/D6536