[PATCH] D153197: [AVR] Expand shifts during AVRISelLowering

Sat Jun 24 05:41:16 PDT 2023

Patryk27 added a comment.

Now that I think about it, there //might// be a simpler way!

So, the underlying issue is that sometimes LLVM spawns new shifts during the instruction selection pass - for instance given this IR¹:

  define i64 @test(i64 %x, i32 %y) {
  start:
    %0 = or i32 %y, 38
    %1 = zext i32 %0 to i64
    %2 = lshr i64 %x, %1
    ret i64 %2
  }

... this 64-bit shift will not get considered by our shift-expansion pass due to a condition here:

https://github.com/llvm/llvm-project/blob/eaaacc3c651e5b2c23bfa9648b6b0d69aab64d00/llvm/lib/Target/AVR/AVRShiftExpand.cpp#L54

... but later, during isel, LLVM **will** reduce this 64-bit shift into a 32-bit shift² and then ask us to expand this brand new 32-bit shift - that causes a panic since we expect those to have been already expanded.

In order to solve this issue, I think we could simply expand //all// variable-shifts greater than i8 (instead of expanding only 32-bit shifts):

  diff --git a/llvm/lib/Target/AVR/AVRShiftExpand.cpp b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  index b7dcd860467d..4ea6d9fdb57c 100644
  --- a/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  +++ b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  @@ -51,8 +51,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
       if (!I.isShift())
         // Only expand shift instructions (shl, lshr, ashr).
         continue;
  -    if (I.getType() != Type::getInt32Ty(Ctx))
  -      // Only expand plain i32 types.
  +    if (I.getType() == Type::getInt8Ty(Ctx))
         continue;
       if (isa<ConstantInt>(I.getOperand(1)))
         // Only expand when the shift amount is not known.
  @@ -75,7 +74,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
   void AVRShiftExpand::expand(BinaryOperator *BI) {
     auto &Ctx = BI->getContext();
     IRBuilder<> Builder(BI);
  -  Type *Int32Ty = Type::getInt32Ty(Ctx);
  +  Type *InputTy = cast<Instruction>(BI)->getType();
     Type *Int8Ty = Type::getInt8Ty(Ctx);
     Value *Int8Zero = ConstantInt::get(Int8Ty, 0);

  @@ -101,7 +100,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     Builder.SetInsertPoint(LoopBB);
     PHINode *ShiftAmountPHI = Builder.CreatePHI(Int8Ty, 2);
     ShiftAmountPHI->addIncoming(ShiftAmount, BB);
  -  PHINode *ValuePHI = Builder.CreatePHI(Int32Ty, 2);
  +  PHINode *ValuePHI = Builder.CreatePHI(InputTy, 2);
     ValuePHI->addIncoming(BI->getOperand(0), BB);

     // Subtract the shift amount by one, as we're shifting one this loop
  @@ -116,13 +115,13 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     Value *ValueShifted;
     switch (BI->getOpcode()) {
     case Instruction::Shl:
  -    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(Int32Ty, 1));
  +    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(InputTy, 1));
       break;
     case Instruction::LShr:
  -    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
  +    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(InputTy, 1));
       break;
     case Instruction::AShr:
  -    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
  +    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(InputTy, 1));
       break;
     default:
       llvm_unreachable("asked to expand an instruction that is not a shift");
  @@ -137,7 +136,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     // Collect the resulting value. This is necessary in the IR but won't produce
     // any actual instructions.
     Builder.SetInsertPoint(BI);
  -  PHINode *Result = Builder.CreatePHI(Int32Ty, 2);
  +  PHINode *Result = Builder.CreatePHI(InputTy, 2);
     Result->addIncoming(BI->getOperand(0), BB);
     Result->addIncoming(ValueShifted, LoopBB);

Overall, this solution seems to work - I've checked it on a couple of Rust applications and the binaries behave correctly, both some simple and more complex ones.

I think the only disadvantage of this approach, as compared to expanding shifts during isel, are optimizations: expanding shifts eagerly means that we'll lose some of the optimizations we could have applied otherwise.

For instance, following that first example from Rust's standard library, we will expand that instruction into a 64-bit shift even though in principle a 32-bit shift would suffice (but we don't know that yet during shift-expansion pass).

For safety, we could implement this simpler approach first, as presented in the diff here, and maybe come back to merging shift-expansion with isel in the future - what do you think about it?

¹ minimized case from Rust's standard library - originally: `_ZN4core3num7dec2flt6lemire13compute_float17hc1d4de6247502c96E`
² I'm not 100% sure why, though - seems to be somehow related to this `or + zext` combination

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153197/new/

https://reviews.llvm.org/D153197