[PATCH] D120428: [AArch64] Optimize safe integer division

Thu Mar 17 06:22:05 PDT 2022

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp:325
+  auto *RhsDef = MRI->getUniqueVRegDef(RhsReg);
+  auto *FlagsDef = MRI->getUniqueVRegDef(FlagsReg);
+
----------------
Kmeakin wrote:
> dmgreen wrote:
> > Kmeakin wrote:
> > > dmgreen wrote:
> > > > Isn't FlagsReg AArch64::NZCV? How does that work with getUniqueVRegDef?
> > > `FlagsReg` is indeed `NZCV`, but it is updated by flag-setting instructions (eg `SUBS`), so `getUniqueVRegDef` will return the instruction that sets the flags. 
> > Sure, but it's a physical register, not a vreg. Which was why I was surprised it worked. I was surprised it didn't assert that the register was virtual.
> > 
> > Does it work if there are multiple instructions defining nzcv in the function?
> No, it does not work if there are multiple definitions of NZCV. As far as I can tell, it is not possible to get the correct defining instruction for NZCV if it is defined by multiple instructions. 
I don't think any other parts of this pass look for the incoming NZCV at the moment (as opposed to the output uses). Sometimes these passes just needs to search backwards for the first def of NZCV it finds.

================
Comment at: llvm/test/CodeGen/AArch64/checked-int-div.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64-gnu-linux -mcpu=neoversen1 -o - | FileCheck %s
+
----------------
Kmeakin wrote:
> dmgreen wrote:
> > Kmeakin wrote:
> > > dmgreen wrote:
> > > > neoversen1 isn't a valid cpu. Does this not work in other cases due to the costs of div being too high to ifcvt?
> > > `llc -mcpu=help --mtriple=aarch64--` lists `neoversen1` as an option. Any out of order CPU should do (the cost model defaults to in order, which results in div not being ifcvted)
> > I think it would be "neoverse-n1", https://godbolt.org/z/5KEGbcfdo.
> Oh, I see now that `neoversen1` is a "cpu feature" and `neoverse-n1` is the CPU. Interestingly, specifying `-mcpu=neoverse-n1` results in the if conversion not firing, but `-mcpu=neoversen1` (or any other unrecognised CPU) does result in if conversion. 
A div usually takes quite a long time on most cpus, compared to other instructions. So unless you know that the condition is almost always true (because you almost never divide by 0 for example), the branching version might be considered better in general.

I'm not sure if there's some way to bias ifcvt to do the transform for these cases anyway? Maybe something with block probabilities or possibly special casing it in isProfitableToIfCvt? It looks like it might just hit other heuristics at the moment though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120428/new/

https://reviews.llvm.org/D120428