[PATCH] optimize merging of scalar loads for 32-byte vectors [X86, AVX] (PR21710)

Fri Dec 5 13:33:30 PST 2014

>>! In D6536#7, @RKSimon wrote:
> Tested with some basic internal tests - no problems encountered (and folding seems to work well too).
> 
> I also tested integer (i64, i32, i16, i8) sequential loads and those optimized as expected too - not sure if it's worth adding tests for these or not?

Thanks - committed at r223518.

I think we're OK without testing every type, but this does raise a potential corner case for an AVX-only machine: is it worse for performance to use a 32-byte FP store when dealing with ints? I.e., is there a domain-crossing penalty for a store of the 'wrong' type? Would we ever have a 32-byte vector of ints coming into this code on an AVX-only machine?
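
To make the corner case concrete, here is a minimal sketch in C (using the GCC/Clang `vector_size` extension) of source that builds a 32-byte integer vector from four sequential i64 loads. The names `v4i64`, `load_4xi64`, and `sum_4xi64` are made up for illustration and are not part of the patch. Whether a given compiler merges these loads into one 32-byte load, and which instruction it picks when AVX is enabled but AVX2 is not, is exactly the open question; the snippet only shows the input pattern and asserts nothing about codegen.

```c
#include <stdint.h>

/* 32-byte vector of four 64-bit integers (GCC/Clang vector extension). */
typedef int64_t v4i64 __attribute__((vector_size(32)));

/* Hypothetical helper: fill a 32-byte integer vector from four
 * consecutive scalar loads. This is the "sequential loads" pattern;
 * the result goes through a pointer to keep the example ABI-neutral. */
void load_4xi64(const int64_t *p, v4i64 *out) {
    v4i64 v = { p[0], p[1], p[2], p[3] };
    *out = v;
}

/* Hypothetical helper: horizontal sum, so the loaded lanes can be
 * checked from plain scalar code. */
int64_t sum_4xi64(const v4i64 *v) {
    return (*v)[0] + (*v)[1] + (*v)[2] + (*v)[3];
}
```

Building something like this with AVX enabled and AVX2 disabled, then inspecting the generated assembly, would be one way to see whether ints ever reach the 32-byte path and which domain the resulting memory op lands in.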

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D6536