[PATCH] D16836: [CodeGenPrepare] Remove load-based heuristic

Fri Feb 19 09:36:25 PST 2016

spatel added a comment.

Junmo - your patch will cause these existing regression tests to fail:

  LLVM :: CodeGen/AArch64/a57-csel.ll
  LLVM :: CodeGen/X86/cmov-into-branch.ll
  LLVM :: Transforms/CodeGenPrepare/X86/select.ll

Please update the check-lines in those tests and add those diffs to this patch. The existing tests cover the functionality that you are proposing to change, so I don't think there is a need to add a new test file.

We've covered a lot of ground in this review, so let me try to summarize:

1. http://reviews.llvm.org/rL156234 - this added a TLI (isFormingBranchFromSelectProfitable) and heuristic (single-use load) based transform that would convert a select into a branch. It improved test-suite and possibly other benchmarks running on x86 OoO (likely SandyBridge at the time).
2. Benjamin noted some of the potential improvements in the commit message and again earlier in this review. I filed https://llvm.org/bugs/show_bug.cgi?id=26636 to track one of those suggestions.
3. Junmo proposed limiting the transform based on "free" operands because this improved 2 tests in test-suite for ARM.
4. I was concerned that we didn't have the right target information at this level, so we might harm in-order cores or GPUs, but that was wrong. I fixed the x86 predicate in http://reviews.llvm.org/rL261275 . The ARM version is a bigger mess (it won't be NFC), so I'll let someone who knows that backend better make the identical change for that target.
5. Current test-suite data from SandyBridge, Jaguar, and Cortex-A57 show an improvement from *not* using the select-to-branch transform with loads.
6. Therefore, this patch has been updated to effectively revert r156234.
7. There is concern that this will hurt some cases in EEMBC or other benchmarks running on x86.
8. There is general agreement that if this patch is committed, but we want to resurrect/improve the select-to-branch transform, it should be done after isel. This is because adding branches to IR limits isel/scheduling effectiveness because those are currently limited to basic block scope.
9. If we had global isel, that limitation would be removed, so we would probably recreate the select-to-branch transform here in CGP again because it's easier to work on IR than MI.

Did I miss anything? :)
I would like to hear from Zia and Mitch before proceeding with this patch because of #7. It would be great to have more perf data. But like I said earlier in the thread, I don't know what our policy is in a situation where a heuristic has lost value for test-suite but still has benefits for other cases.

http://reviews.llvm.org/D16836