[PATCH] D120648: [DAGCombine] fold (bswap(srl (bswap c), 8x)) -> (shl c, 8x)

Mon Mar 28 01:25:20 PDT 2022

Chenbing.Zheng abandoned this revision.
Chenbing.Zheng added a comment.
Herald added a subscriber: StephenFan.

In D120648#3395069 <https://reviews.llvm.org/D120648#3395069>, @spatel wrote:

> In D120648#3393774 <https://reviews.llvm.org/D120648#3393774>, @Chenbing.Zheng wrote:
>
>> I thank this is a correct transform, it change bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x)  and it may provide more opportunity for optimization;
>> but looking at this transformation alone, it doesn't do any optimization. Optimization needs to depend on other instructions before and after. Why don't we do accurate optimization?
>
> We want to have the more flexible/robust pattern-matching because it will handle cases that may not be obvious/visible yet. By making the minimal transform, we try to ensure that the code will be in a "canonical" form that other transforms can then act on. (That is also why I propose adding the transform in IR in D122010 <https://reviews.llvm.org/D122010>.)
>
> In this case, the key optimization is that a bswap-of-bswap (or bitreverse-of-bitreverse) cancels itself out. That optimization already exists, so we don't need to reimplement it here and other places. Just try to get the code into a form that will allow the existing code to apply. This is a fundamental feature of iterative combining. It may seem less efficient, but it's actually more efficient because we don't need to duplicate as much combiner code to get the same level of consistent optimization.
>
> For example, we do not have this test in D121504 <https://reviews.llvm.org/D121504> (but I think it should be included):
>
>   define i32 @test_bswap_shli_8_bswap_i32(i32 %a) nounwind {
>       %1 = call i32 @llvm.bswap.i32(i32 %a)
>       %2 = shl i32 %1, 8
>       %3 = call i32 @llvm.bswap.i32(i32 %2)
>       ret i32 %3
>   }
>
> We could adjust this patch to also deal with the opposite direction shift, but then it will need more redundant pattern-matching code (especially if we need to duplicate it again for bitreverse).
>
> I didn't step through it, but that might be why this patch fails to make some "rev16" optimizations for ARM code that we will get with the transform that I suggested.

Sorry for my delayed reply，I think what you said makes sense. I will abundant this patch and I have update D121504 <https://reviews.llvm.org/D121504> according to your suggestion. Would you mind review it and creat your patch to solve it.
There is a similar problem with bitreverse-shift, and I add tests in D121507 <https://reviews.llvm.org/D121507>.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120648/new/

https://reviews.llvm.org/D120648

[PATCH] D120648: [DAGCombine] fold (bswap(srl (bswap c), 8*x)) -> (shl c, 8*x)

[PATCH] D120648: [DAGCombine] fold (bswap(srl (bswap c), 8x)) -> (shl c, 8x)