[PATCH] D35638: A fix for bug33826

Wed Jul 19 14:07:51 PDT 2017

Farhana added inline comments.

================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:108
+    // If load size is less than Factor * ShuffleVecSize, transpose will not be
+    // profitable.
+    if (DL.getTypeSizeInBits(Inst->getType()) < Factor * ShuffleVecSize)
----------------
DavidKreitzer wrote:
> It is not just a question of profitability. If lowerIntoOptimizedSequence were called for a load instruction that is too small, it would make an incorrect transformation, because the decompose function would generate Factor loads, each of size ShuffleVecSize.
> 
> I would also recommend a slightly different fix. Rather than checking the expected shuffle size for both the load & the store, I would check the "expected wide vector size". For loads, that means checking the type of the load. For stores, it means checking the type of the shuffle.
> 

>>But we could handle that case easily by generating dummy vectors. The reason we are not doing that is that it would generate an inefficient sequence, so it boils down to profitability.
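
To make the size condition under discussion concrete, here is a minimal standalone sketch of the guard. It is not the LLVM code: `coversDecomposedAccess` and its parameter names are hypothetical, and it only models the arithmetic (the decompose step emitting Factor accesses of ShuffleVecSize bits each), assuming sizes are given in bits.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not part of X86InterleavedAccess.cpp).
// wideVecBits:    size in bits of the wide vector; per the review comment this
//                 would be the load's type for loads and the shuffle's type
//                 for stores.
// factor:         the interleave factor.
// shuffleVecBits: size in bits of each de-interleaved shuffle vector.
bool coversDecomposedAccess(std::uint64_t wideVecBits, unsigned factor,
                            std::uint64_t shuffleVecBits) {
  // The decompose step would emit `factor` accesses of `shuffleVecBits` each,
  // so the wide vector has to provide at least that many bits. A smaller wide
  // vector means the lowering must be rejected (or padded with dummy vectors).
  return wideVecBits >= static_cast<std::uint64_t>(factor) * shuffleVecBits;
}
```

With a factor of 4 and 256-bit shuffle vectors, a 1024-bit load passes this check and a 512-bit load does not, which corresponds to the case the patch bails out on.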

https://reviews.llvm.org/D35638