[PATCH] D120462: [AArch64InstrInfo.td] - Lowering fix for cttz intrinsic
Shubham Narlawar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 28 00:13:44 PST 2022
gsocshubham added a comment.
In D120462#3347925 <https://reviews.llvm.org/D120462#3347925>, @craig.topper wrote:
> In D120462#3347920 <https://reviews.llvm.org/D120462#3347920>, @gsocshubham wrote:
>
>> In D120462#3343650 <https://reviews.llvm.org/D120462#3343650>, @dmgreen wrote:
>>
>>>> TODO - Add unit test
>>>
>>> Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instructions where it isn't needed? Was it intended to improve some larger code pattern?
>>
>> It was intended to generate assembly as of GCC -
>>
>> f(unsigned int):
>> rbit w0, w0
>> clz w0, w0
>> and w0, w0, 31
>> ret
>>
>> but patch present here - https://reviews.llvm.org/D113291 is generating below assembly -
>>
>> rbit w8, w0
>> cmp w0, #0
>> clz w8, w8
>> csel w0, wzr, w8, eq
>> ret
>>
>> Hence, to map with GCC assembly, I had modified patch to remove clz and csel instruction and generate below assembly -
>>
>> rbit w8, w0
>> clz w8, w8
>> and w0, w8, #0x1f
>> ret
>
> How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit
>
> rbit w8, w0
> cmp w0, #0
> clz w8, w8
> and w0, w8, #0x1f
> csel w0, wzr, w8, eq
> ret
Yes. It emits above assembly for current patch.
1. Earlier, I had submitted patch - https://reviews.llvm.org/D119010
which emits -
rbit w8, w0
clz w8, w8
and w0, w8, #0x1f
ret
which is modification of the patch submitted by original author which was generating assembly -
rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret
2. Hence, to solve the issue, what I have proposed is generation of assembly similar to that of GCC upstream.
3. Current patch which is https://reviews.llvm.org/D120462 - are just target dependant changes from https://reviews.llvm.org/D119010 since it was difficult there to review both intrinsic identification and AArch64InstrInfo.td changes.
> It also looks like this patch would break the codegen for
>
> define i32 @foo(i32 %x) {
> %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
> ret i32 %a
> }
>
> As you'll now add an AND that shouldn't be there.
>
> There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch that should demonstrate that issue.
Understood. Yes, the changes causes the regression. I will work on it and also changes requested by you on other patch also.
a. What I would like to know is whether the approach I have followed i.e. to remove csel and clz changes from https://reviews.llvm.org/D119010 and generate below assembly which I am able to -
rbit w8, w0
clz w8, w8
and w0, w8, #0x1f
ret
which is quite similar to what GCC is emitting currently -
f(unsigned int):
rbit w0, w0
clz w0, w0
and w0, w0, 31
ret
b. My aim to provide fix for the issue is to generate similar assembly that is being generated by GCC AArch64 backend currently. Please suggest me if I am wrong here. Once, it is clear i.e. if aim is to generate assembly as of GCC, I will work on regression and other comments. But first, I want to validate my approach.
================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:2053
+ (ANDWri (CLZWr (RBITWr GPR32:$Rn)), (i32 4))>;
def : Pat<(cttz GPR64:$Rn),
+ (ANDXri (CLZXr (RBITXr GPR64:$Rn)), (i64 5))>;
----------------
craig.topper wrote:
> This doesn't make sense to me. If you have a plain cttz operation you don't need an AND.
>
> I thought the problem was that you have (select (seteq X, 0), 0, (cttz X)) and you need to turn the select and setcc in to an AND. That needs to be done as a DAGCombine.
Thanks for the comment. I will take a look at it.
Does generation of below assembly not acceptable?
```
rbit w8, w0
clz w8, w8
and w0, w8, #0x1f
ret
```
If not, please suggest what is the fix needed apart from identifying cttz intrinsic which this patch https://reviews.llvm.org/D113291 is already doing?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120462/new/
https://reviews.llvm.org/D120462
More information about the llvm-commits
mailing list