[PATCH] merge consecutive 16-byte loads into one 32-byte load (PR22329)
spatel at rotateright.com
Sun Feb 1 09:49:06 PST 2015
Comment at: lib/Target/X86/X86ISelLowering.cpp:6099
@@ -6096,3 +6098,3 @@
if (isAfterLegalize &&
> If I'm reading this correctly, before this change, if we got here, then the size of VT always matched the size of the found consecutive load (VT.getSizeInBits() == EltVt.getSizeInBits() * NumElems).
> With this change, I think that no longer holds. The size of the consecutive load we find is LdVT.getSizeInBits() * Elts.size(), but there's no guarantee that this is actually the size of VT. The responsibility for ensuring this condition holds has moved to the caller.
> I think we now need an additional check that the sizes indeed match.
Previously, I think there was an implicit assumption that each of the Elts was a matching scalar load of VT.getVectorElementType(); this was safe assuming the Elts were extracted from a build_vector.
But yes, I agree. I will add a check inside the loop to confirm that each element matches the fractional size that we're expecting, and I will add a check after the loop to confirm that the cumulative size of the element loads matches the total size of the vector type (VT).
Comment at: lib/Target/X86/X86ISelLowering.cpp:13216
@@ +13215,3 @@
+ // --> load32 addr
+ if (Vec.getOpcode() == ISD::INSERT_SUBVECTOR &&
+ OpVT.is256BitVector() &&
> You probably also want to check that Idx is what you expect it to be.
Agree again - this is too lax. I will add checks to make sure that both insert_subvector indices match what we need.
More information about the llvm-commits