[PATCH] D120462: [AArch64InstrInfo.td] - Lowering fix for cttz intrinsic

Mon Feb 28 00:13:44 PST 2022

gsocshubham added a comment.

In D120462#3347925 <https://reviews.llvm.org/D120462#3347925>, @craig.topper wrote:

> In D120462#3347920 <https://reviews.llvm.org/D120462#3347920>, @gsocshubham wrote:
>
>> In D120462#3343650 <https://reviews.llvm.org/D120462#3343650>, @dmgreen wrote:
>>
>>>> TODO - Add unit test
>>>
>>> Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instructions where it isn't needed? Was it intended to improve some larger code pattern?
>>
>> It was intended to generate assembly as of GCC -
>>
>>   f(unsigned int):
>>           rbit    w0, w0
>>           clz     w0, w0
>>           and     w0, w0, 31
>>           ret
>>
>> but patch present here - https://reviews.llvm.org/D113291 is generating below assembly -
>>
>>   rbit w8, w0
>>   cmp w0, #0
>>   clz w8, w8
>>   csel w0, wzr, w8, eq
>>   ret
>>
>> Hence, to map with GCC assembly, I had modified patch to remove clz and csel instruction and generate below assembly -
>>
>>   rbit    w8, w0
>>   clz     w8, w8
>>   and     w0, w8, #0x1f
>>   ret
>
> How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit
>
>   rbit w8, w0
>   cmp w0, #0
>   clz w8, w8
>   and     w0, w8, #0x1f
>   csel w0, wzr, w8, eq
>   ret

Yes. It emits above assembly for current patch.

1. Earlier, I had submitted patch - https://reviews.llvm.org/D119010

which emits -

  rbit    w8, w0
  clz     w8, w8
  and     w0, w8, #0x1f
  ret

which is modification of the patch submitted by original author which was generating assembly -

  rbit w8, w0
  cmp w0, #0
  clz w8, w8
  csel w0, wzr, w8, eq
  ret

2. Hence, to solve the issue, what I have proposed is generation of assembly similar to that of GCC upstream.

3. Current patch which is https://reviews.llvm.org/D120462 - are just target dependant changes from https://reviews.llvm.org/D119010 since it was difficult there to review both intrinsic identification and AArch64InstrInfo.td changes.

> It also looks like this patch would break the codegen for
>
>   define i32 @foo(i32 %x) {
>     %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
>     ret i32 %a
>   }
>
> As you'll now add an AND that shouldn't be there.
>
> There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch that should demonstrate that issue.

Understood. Yes, the changes causes the regression. I will work on it and also changes requested by you on other patch also.

a. What I would like to know is whether the approach I have followed i.e. to remove csel and clz changes from https://reviews.llvm.org/D119010 and generate below assembly which I am able to -

  rbit    w8, w0
  clz     w8, w8
  and     w0, w8, #0x1f
  ret

which is quite similar to what GCC is emitting currently -

  f(unsigned int):
          rbit    w0, w0
          clz     w0, w0
          and     w0, w0, 31
          ret

b. My aim to provide fix for the issue is to generate similar assembly that is being generated by GCC AArch64 backend currently. Please suggest me if I am wrong here. Once, it is clear i.e. if aim is to generate assembly as of GCC, I will work on regression and other comments. But first, I want to validate my approach.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:2053
+          (ANDWri (CLZWr (RBITWr GPR32:$Rn)), (i32 4))>;
 def : Pat<(cttz GPR64:$Rn),
+          (ANDXri (CLZXr (RBITXr GPR64:$Rn)), (i64 5))>;
----------------
craig.topper wrote:
> This doesn't make sense to me. If you have a plain cttz operation you don't need an AND.
> 
> I thought the problem was that you have (select (seteq X, 0), 0, (cttz X)) and you need to turn the select and setcc in to an AND. That needs to be done as a DAGCombine.
Thanks for the comment. I will take a look at it.

Does generation of below assembly not acceptable?

```
        rbit    w8, w0
        clz     w8, w8
        and     w0, w8, #0x1f
        ret

```

If not, please suggest what is the fix needed apart from identifying cttz intrinsic which this patch https://reviews.llvm.org/D113291 is already doing?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120462/new/

https://reviews.llvm.org/D120462