[PATCH] D136396: [X86] Enable reassociation for ADD instructions

Sat Oct 22 01:06:31 PDT 2022

RKSimon added inline comments.

================
Comment at: llvm/test/CodeGen/X86/reassociate-add.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 < %s | FileCheck %s
+
----------------
Carrot wrote:
> RKSimon wrote:
> > Drop -mcpu=x86-64
> > 
> > Pre-commit these tests with current codegen to trunk and rebase to show the patch diffs.
> Test case has been sent out as https://reviews.llvm.org/D136501.
> 
> -mcpu=x86-64 is required because MachineCombiner performs the reassociation only if the new sequence reduces latency or doSubstitute returns true. If -mcpu is not specified, the target does not have a valid schedule model, so doSubstitute returns true and the reassociation is performed regardless of latency.
> 
> ```
> bool MachineCombiner::doSubstitute(unsigned NewSize, unsigned OldSize,
>                                    bool OptForSize) {
>   if (OptForSize && (NewSize < OldSize))
>     return true;
>   if (!TSchedModel.hasInstrSchedModelOrItineraries())
>     return true;
>   return false;
> }
> ```
OK - that makes sense!
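
To spell out the behaviour described above, here is a minimal standalone sketch (not the LLVM code itself). `SchedModelStub` and `shouldReassociate` are hypothetical names introduced only for illustration; the real decision in MachineCombiner involves more conditions than the two modelled here.

```cpp
// Hypothetical stand-in for the target schedule model; HasModel is false
// when no -mcpu is given and the target has no valid schedule model.
struct SchedModelStub {
  bool HasModel;
  constexpr bool hasInstrSchedModelOrItineraries() const { return HasModel; }
};

// Same logic as the quoted MachineCombiner::doSubstitute, as one expression.
constexpr bool doSubstitute(unsigned NewSize, unsigned OldSize,
                            bool OptForSize, SchedModelStub Model) {
  return (OptForSize && NewSize < OldSize) ||
         !Model.hasInstrSchedModelOrItineraries();
}

// Simplified assumption: reassociate if latency improves or doSubstitute
// says so.
constexpr bool shouldReassociate(bool ImprovesLatency, unsigned NewSize,
                                 unsigned OldSize, bool OptForSize,
                                 SchedModelStub Model) {
  return ImprovesLatency || doSubstitute(NewSize, OldSize, OptForSize, Model);
}

// Without a schedule model the pattern is always substituted, so the test
// would not exercise the latency check at all.
static_assert(shouldReassociate(false, 2, 2, false, SchedModelStub{false}),
              "no sched model: always reassociate");
// With a schedule model, an unprofitable pattern is rejected.
static_assert(!shouldReassociate(false, 2, 2, false, SchedModelStub{true}),
              "sched model present: latency decides");
```

Under these assumptions, dropping -mcpu would make the test pass even if the latency heuristic were wrong.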

================
Comment at: llvm/test/CodeGen/X86/reassociate-add.ll:4
+
+; This file checks the reassociation of ADD instruction.
+; The two ADD instructions add v0,v1,t2 together. t2 has a long dependence
----------------
Carrot wrote:
> craig.topper wrote:
> > Carrot wrote:
> > > spatel wrote:
> > > > RKSimon wrote:
> > > > > pengfei wrote:
> > > > > > No idea if we intentionally excluded ADD instructions from reassociation.
> > > > > > Is there a problem if reassociation generates unexpected overflow/underflow?
> > > > > Not that I can think of - EFLAGS will be the same and we already handle the $dst=$src0 constraint
> > > > I was working on this a long time ago (~2015), so it's hard to remember exactly, but I don't think there was a fundamental reason to exclude integer ADD. 
> > > > 
> > > > It just seemed like it did not have much potential gain with the limited register set and could interfere with other transforms like LEA formation. 
> > > > 
> > > > If there's evidence that this improves something (and doesn't cause regressions), then it should be ok.
> > > EFLAGS may be different. Suppose we are adding three bytes, 250 + 10 + 10. 
> > > 
> > >   - If we add them in the order (250 + 10) + 10, the first ADD generates carry/overflow flags, the second ADD doesn't generate carry/overflow.
> > > 
> > >   - If we add them in the order (10 + 10) + 250, the first ADD doesn't generate carry/overflow flags, the second ADD generates these flags.
> > > 
> > > So we need to check if the definition of EFLAGS is dead.
> > > 
> > > Maybe this is the reason @spatel didn't add ADD instructions in 2015.
> > > 
> > > Thanks @pengfei for the reminder.
> > > 
> > Scalar MUL would have the same EFLAGS issue. So hopefully we already solved it?
> Indeed. So IMULrr should be handled similarly to ADD. But I can't find any code for it :(
Sorry, bad description - EFLAGS will be handled (we check they are ignored...) the same as the existing ops
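
For reference, the 250 + 10 + 10 case quoted above can be checked with a small sketch. It models only the carry flag of an 8-bit unsigned add (not the full EFLAGS register, and not the signed overflow flag); `AddResult` and `add8` are hypothetical helpers, not anything from the patch.

```cpp
#include <cstdint>

// Result of an 8-bit add: the wrapped value plus a model of the carry flag.
struct AddResult {
  std::uint8_t value;
  bool carry;
};

// 8-bit unsigned add; carry is set when the true sum does not fit in 8 bits.
constexpr AddResult add8(std::uint8_t a, std::uint8_t b) {
  return AddResult{static_cast<std::uint8_t>(a + b),
                   (unsigned(a) + unsigned(b)) > 0xFFu};
}

// Order 1: (250 + 10) + 10. The first add carries, the second does not.
constexpr AddResult first1 = add8(250, 10);
constexpr AddResult last1 = add8(first1.value, 10);

// Order 2: (10 + 10) + 250. The first add does not carry, the second does.
constexpr AddResult first2 = add8(10, 10);
constexpr AddResult last2 = add8(first2.value, 250);

// The final sums agree, but the flag left by the last ADD differs.
static_assert(last1.value == last2.value, "same 8-bit result");
static_assert(last1.carry != last2.carry, "different final carry");
```

The final value is identical in both orders, so the reassociation is only safe when nothing reads the flags produced by the ADDs, which is why the EFLAGS def has to be dead.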

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136396/new/

https://reviews.llvm.org/D136396