[PATCH] D150670: [WebAssembly] Disable generation of fshl/fshr for rotates

Thu May 18 01:15:34 PDT 2023

nikic added a comment.

In D150670#4352055 <https://reviews.llvm.org/D150670#4352055>, @pmatos wrote:

> In D150670#4351147 <https://reviews.llvm.org/D150670#4351147>, @nikic wrote:
>
>> This doesn't looks like a wasm specific problem. You get essentially the same issue on any target that has a rotate instruction but no funnel shift instruction. Here are just a couple examples: https://godbolt.org/z/8v6nfaax9
>
> Yes, I am indeed aware this is not specific to wasm. What's specific to wasm afaiu is that the code generated is much worse when expanding fshl. That's what I mentioned in the bug discussion here: https://github.com/llvm/llvm-project/issues/62703#issuecomment-1548474310
>
>> I believe this needs to be either solved by preventing demanded bits simplifications that break a rotate pattern (though I'm not sure if that would break any other optimizations we care about) or by adding a special case for this in the backend when lowering FSH to ROT.
>
> Preventing the simplification means adding target specific code in instcombine which seems even worse than adding it here given as @dschuff 
> pointed out, there's precedent with x86.

I'm not suggesting to add any target specific code to instcombine. I think there are actually quite a few different ways this could be solved. See https://llvm.godbolt.org/z/f55K7K17W for three possible representations of the same rotate pattern.

1. Say that we prefer preserving rotates over "simplifying" funnel shifts (ending up with the rot2 pattern). Basically by skipping the optimization at https://github.com/llvm/llvm-project/blob/7f54b38e28b3b66195de672848f2b5366d0d51e3/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L927-L931 if both fsh operands are the same. Assuming this doesn't cause test regressions, I think this would be acceptable to do. From a backend perspective, even for targets that have a native funnel shift (aarch64, x86), the difference between the rot1/rot2 patterns looks pretty neutral.

2. Undo the transform in the backend. It is bidrectional (https://alive2.llvm.org/ce/z/Chb85F), so this is possible.  This would need an extra legalization/combiner pattern (depending on where we form ROTs). Advantage of undoing the pattern is the usual one: Works if it was in the undesirable form in the first place. (E.g. this could happen if the rotate did not use a builtin but was implemented as `x << 8 | x >> 24`, which is probably much more widespread than the builtin. Though I checked, and we don't form a funnel shift for it in this case right now.)

3. Move the and from the fsh arguments to the result. This is the rot3 pattern. This seems to produce the best codegen on average, because it can use uxtb16 on ARM. Moving the and from args to return is a bit unusual for unary ops, but if we see this as moving two ands on both fsh arguments (which happen to be the same) to one on the result, that would be a pretty standard transform.

I think all of those options are viable, and couldn't say for certain which one is best. I think any of them would be better than making clang emit a special intrinsic just for wasm though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150670/new/

https://reviews.llvm.org/D150670