[PATCH] D112464: [x86] limit vector increment fold to allow load folding

Mon Oct 25 09:36:23 PDT 2021

spatel created this revision.
spatel added reviewers: pengfei, RKSimon, craig.topper, xbolva00.
Herald added subscribers: hiraditya, mcrosier.
spatel requested review of this revision.
Herald added a project: LLVM.

The tests are based on the example from:
https://llvm.org/PR52032

I suspect that the codegen in that report looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets).
Load folding definitely makes the code smaller, so it's good for that at least. Getting it requires carving a narrow exception out of the transform that catches just this case and leaves alone the others that look good as-is (in other words, the transform still seems good for most examples).
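
For readers without the background: as I understand it, the fold in question rewrites a vector increment (add of a splat-1 constant) as a subtract of an all-ones constant, because all-ones can be materialized with pcmpeq instead of a constant load or broadcast. Below is a minimal sketch of why the two forms are lane-wise equivalent. It models only the arithmetic on plain arrays, not the SelectionDAG code in X86ISelDAGToDAG.cpp, and the type and function names (Vec4, addSplatOne, subAllOnes) are hypothetical, made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4 x i32 vector model; not an LLVM type.
using Vec4 = std::array<std::uint32_t, 4>;

// Original form: X + <1, 1, 1, 1>.
// On x86 the splat-1 constant has to come from memory (load or broadcast).
Vec4 addSplatOne(const Vec4 &X) {
  Vec4 R{};
  for (std::size_t I = 0; I < X.size(); ++I)
    R[I] = X[I] + 1u;
  return R;
}

// Folded form: X - <-1, -1, -1, -1>.
// The all-ones constant is what pcmpeq of a register with itself produces.
Vec4 subAllOnes(const Vec4 &X) {
  Vec4 R{};
  for (std::size_t I = 0; I < X.size(); ++I)
    R[I] = X[I] - 0xFFFFFFFFu; // unsigned wraparound: same as X[I] + 1
  return R;
}

// True when both forms agree on every lane of X.
bool formsAgree(const Vec4 &X) { return addSplatOne(X) == subAllOnes(X); }
```

Since the results are identical, the choice between the two forms is purely about instruction selection (pcmpeq vs. a constant load, and whether another operand's load can still be folded), which is the trade-off this patch adjusts.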

https://reviews.llvm.org/D112464

Files:
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/test/CodeGen/X86/combine-sub.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D112464.382022.patch
Type: text/x-patch
Size: 4429 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211025/9243bbba/attachment.bin>