[PATCH] D128572: [LoongArch] Add codegen support for division operations

Sat Jun 25 01:47:41 PDT 2022

xry111 added a comment.

In D128572#3609944 <https://reviews.llvm.org/D128572#3609944>, @xen0n wrote:

> I just experimented on 3A5000 and it seems the "undefined result" is all-zeros. Unfortunately, the `div.w`/`mod.w` UB when an input operand is a non-canonical (i.e. non-sign-extended) 32-bit value is indeed present; in this case the output is all-zeros too.
>
> So the sign-extension is indeed necessary for inputs not statically known to be `signext`. The all-zeros in case of UB is less useful than RISC-V's all-ones, in terms of expected semantics (ideally we want something near "infinity"), but it's UB after all, and 0 is equally okay here.

My mistake: I saw GCC generate "div.w $a0, $a0, $a1", but that only works because the ABI guarantees that the parameters are already sign-extended.
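
To make the "canonical" requirement concrete, here is a minimal C sketch of what it means for a 64-bit register to hold a sign-extended 32-bit value. It only models the sign-extension that `addi.w rd, rj, 0` performs in the test output; it does not model the hardware's behavior on non-canonical inputs. The helper names `sext_w` and `is_canonical_w` are hypothetical and do not come from LLVM or GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: model `addi.w rd, rj, 0` on LA64 by keeping the
 * low 32 bits of the register and sign-extending them to 64 bits. */
static int64_t sext_w(uint64_t reg) {
    uint64_t low = reg & 0xFFFFFFFFu;
    /* Flip the sign bit, then subtract 2^31: a portable sign-extension. */
    return (int64_t)(low ^ 0x80000000u) - (int64_t)0x80000000u;
}

/* Hypothetical helper: a register is "canonical" for div.w/mod.w when it
 * already equals the sign extension of its own low 32 bits. */
static bool is_canonical_w(uint64_t reg) {
    return (uint64_t)sext_w(reg) == reg;
}
```

Under this model, a `signext` argument satisfies `is_canonical_w` on entry, so no extra `addi.w` is needed before `div.w`; an argument without `signext` may not, which is why the sign-extension has to be emitted.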

================
Comment at: llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll:132
+; LA64-NOTRAP-NEXT:    addi.w $a0, $a0, 0
+; LA64-NOTRAP-NEXT:    div.d $a0, $a0, $a1
+; LA64-NOTRAP-NEXT:    jirl $zero, $ra, 0
----------------
xen0n wrote:
> SixWeining wrote:
> > xry111 wrote:
> > > It looks suboptimal: "div.w $a0, $a0, $a1" should work so these two sign-extensions are not needed.
> > > 
> > > I'm not sure if it's easy to optimize this.  If an optimization is not suitable for this revision, we can do it later.
> > Yes. 32-bit division can be optimized to div.w, but we must make sure the inputs are sign-extended values. This limitation is noted in the Chinese ISA document but not in the English one. Maybe the English version is outdated.
> > {F23575515}
> > 
> > ```
> > define i32 @sdiv_i32(i32 %a, i32 %b) {
> > entry:
> >   %r = sdiv i32 %a, %b
> >   ret i32 %r
> > }
> > =>
> > ; LA64-NOTRAP-NEXT:    addi.w $a1, $a1, 0
> > ; LA64-NOTRAP-NEXT:    addi.w $a0, $a0, 0
> > ; LA64-NOTRAP-NEXT:    div.w $a0, $a0, $a1
> > ```
> > 
> > ```
> > define i32 @sdiv_i32(i32 signext %a, i32 signext %b) {
> > entry:
> >   %r = sdiv i32 %a, %b
> >   ret i32 %r
> > }
> > =>
> > ; LA64-NOTRAP-NEXT:    div.w $a0, $a0, $a1
> > ```
> > 
> > Since this is an improvement to the codegen, let me implement it in a separate patch in the future. Thanks.
> Indeed; however, the limitation seems very likely to be a hardware erratum, because almost any other 32-bit operation on LA64 silently ignores the upper bits. I think it's better to confirm with the hardware team, in case this is another error fixed in the translation.
GCC simply generates a div.w instruction.
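
For reference, the C-level counterpart of the `sdiv_i32` IR above would look roughly like the sketch below. This is an assumed reproducer for illustration, not necessarily the exact source that was compiled; the function name simply mirrors the IR.

```c
/* Assumed reproducer: 32-bit signed division of two int parameters.
 * On LA64 both parameters arrive in 64-bit registers, so whether a bare
 * div.w is safe depends on them already being sign-extended. */
int sdiv_i32(int a, int b) {
    return a / b;
}
```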

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D128572