[PATCH] D38316: [InstCombine] replace bitcast to scalar + insertelement with widening shuffle + vector bitcast

Wed Sep 27 11:12:20 PDT 2017

spatel added a comment.

In https://reviews.llvm.org/D38316#882376, @efriedma wrote:

> ARM/AArch64 are very similar in this respect, since there are multiple vector register sizes.  You'll see a similar result for your examples on aarch64.  (On 32-bit ARM, we manage to optimize away the extra copy after isel.)  I'm not quite sure how much of this logic it makes sense to put into instcombine, given most of the benefit here has to do with the way CPUs split integer and vector registers, but this is probably okay for other targets.

Yes, I'm biased in my split register files view. If we disregard that, then we could just look at this as a canonicalization rule for equivalent IR constructs. But we need to be sure that targets can accept the relatively simple shuffle mask either way.

> Does it make sense to do this transform even if the operand of the insertelement isn't undef?  I guess you'd need a second shuffle in that case?

Right - I thought about that case, but I couldn't justify it if we need another IR instruction.

> Took me a minute to reason it out, but I think this is right on big-endian targets: semantically, the bitcast in both cases is essentially equivalent.

Took me more than a minute, and I still wasn't entirely sure. :) 
Thanks for thinking about that - I should've mentioned it in the description.

https://reviews.llvm.org/D38316