[PATCH] merge consecutive 16-byte loads into one 32-byte load (PR22329)

Sanjay Patel spatel at rotateright.com
Sun Feb 1 09:49:06 PST 2015

Comment at: lib/Target/X86/X86ISelLowering.cpp:6099
@@ -6096,3 +6098,3 @@
     if (isAfterLegalize &&
         !DAG.getTargetLoweringInfo().isOperationLegal(ISD::LOAD, VT))
mkuper wrote:
> If I'm reading this correctly, before this change, if we got here, then the size of VT always matched the size of the found consecutive load (VT.getSizeInBits() == EltVt.getSizeInBits() * NumElems). 
> With this change, I think that no longer holds. The size of the consecutive load we find is LdVT.getSizeInBits() * Elts.size(), but there's no guarantee that this is actually the size of VT. The responsibility for ensuring this condition holds has moved to the caller.
> I think we now need an additional check that the sizes indeed match.
Previously, I think there was an implicit assumption that each of the Elts was a matching scalar load of VT.getVectorElementType(); this was safe assuming the Elts were extracted from a build_vector.

But yes, I agree. I will add a check inside the loop to confirm that each element matches the fractional size that we're expecting, and I will add a check after the loop to confirm that the cumulative size of the element loads matches the total size of the vector type (VT).

Comment at: lib/Target/X86/X86ISelLowering.cpp:13216
@@ +13215,3 @@
+  // --> load32 addr
+  if (Vec.getOpcode() == ISD::INSERT_SUBVECTOR &&
+      OpVT.is256BitVector() &&
mkuper wrote:
> You probably also want to check that Idx is what you expect it to be.
Agree again - this is too lax. I will add checks to make sure that both insert_subvector indices match what we need.



More information about the llvm-commits mailing list