[PATCH] merge consecutive 16-byte loads into one 32-byte load (PR22329)
michael.m.kuperstein at intel.com
Sun Feb 1 06:37:53 PST 2015
Comment at: lib/Target/X86/X86ISelLowering.cpp:6099
@@ -6096,3 +6098,3 @@
if (isAfterLegalize &&
If I'm reading this correctly, before this change, if we got here, then the size of VT always matched the size of the found consecutive load (VT.getSizeInBits() == EltVt.getSizeInBits() * NumElems).
With this change, I think that no longer holds. The size of the consecutive load we find is LdVT.getSizeInBits() * Elts.size(), but there's no guarantee that this is actually the size of VT. The responsibility for ensuring this condition holds has moved to the caller.
I think we now need an additional check here that the sizes indeed match, i.e. that LdVT.getSizeInBits() * Elts.size() == VT.getSizeInBits().
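
As a rough standalone sketch of the check being asked for (not the actual patch): the struct and function names below are hypothetical, and plain integers stand in for the values that VT.getSizeInBits(), LdVT.getSizeInBits() and Elts.size() would return in the real lowering code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the quantities named in the review. These are
// plain integers, not LLVM's EVT/MVT types.
struct ConsecutiveLoadInfo {
  unsigned VTSizeInBits;   // models VT.getSizeInBits()
  unsigned LdVTSizeInBits; // models LdVT.getSizeInBits()
  std::size_t NumElts;     // models Elts.size()
};

// Returns true only when the run of consecutive loads covers exactly VT.
// The multiplication is done in a wider type so it cannot wrap around.
inline bool loadSizeMatchesVT(const ConsecutiveLoadInfo &Info) {
  const unsigned long long LoadBits =
      static_cast<unsigned long long>(Info.LdVTSizeInBits) * Info.NumElts;
  return LoadBits == Info.VTSizeInBits;
}
```

With two 128-bit loads and a 256-bit VT the check passes; with a single 128-bit load and a 256-bit VT it fails, which is the case the merge should bail out on.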
Comment at: lib/Target/X86/X86ISelLowering.cpp:13216
@@ +13215,3 @@
+ // --> load32 addr
+ if (Vec.getOpcode() == ISD::INSERT_SUBVECTOR &&
+ OpVT.is256BitVector() &&
You probably also want to check that Idx is what you expect it to be.
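
A minimal standalone sketch of what such an index check could look like. It assumes the intended pattern is a low half inserted at element 0 and a high half inserted at NumElems / 2; the review does not spell out the expected values, so that is an assumption. The function name and parameters are hypothetical, and plain integers stand in for the constant index operands of the two INSERT_SUBVECTOR nodes.

```cpp
#include <cassert>

// Hypothetical helper: NumElems models the element count of the 256-bit
// result type, InnerIdx the index of the inner INSERT_SUBVECTOR, and
// OuterIdx the index of the outer one.
inline bool isLowHighHalfInsert(unsigned NumElems, unsigned InnerIdx,
                                unsigned OuterIdx) {
  assert(NumElems % 2 == 0 && "expected an even element count");
  // Assumed layout: the inner insert fills the low half (index 0) and the
  // outer insert fills the high half (index NumElems / 2).
  return InnerIdx == 0 && OuterIdx == NumElems / 2;
}
```

For an 8-element vector this accepts the index pair (0, 4) and rejects pairs such as (4, 4) or (0, 0), where the two 16-byte loads would not form one contiguous 32-byte value.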