[PATCH] D45748: [RISCV] Separate base from offset in lowerGlobalAddress

Sameer AbuAsal via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 16 10:26:57 PDT 2018


sabuasal added a comment.

In https://reviews.llvm.org/D45748#1099689, @asb wrote:

> Thanks for the update Sameer. I've been evaluating this patch today. I was curious about the decision to perform a peephole rather than a DAG combine. You mention that the DAG combiner doesn't work for uses spanning multiple basic blocks, but doesn't the SelectionDAG (and hence RISCVISelDAGToDAG) also work on a per-basic-block granularity?


Yes. Here is the problem with DAG combine:

- In our optimization we lower the Global address in a target global address and an ISD::ADD (a generic node, not target specific node)
- We create a DAG combine that does a call back for ISD::ADD.
- Right after the lowerGlobalAddress is done you hit the DAG combiner function. Now at this point other blocks are yet to be processed and the global address there is still not lowered into a TargetGlobalAddress, in other blocks (as listed in the the IR test I added called "control_flow") so the Target node we created only has one use only. Our combiner check will go back and put the offset back in the target lowering because it simply doesn't know.

  In a peephole, I am only checking the target specific RISCV::ADD and RISCV::ADDI,   so my check only kicks in and merges the offset back in to the global address after ALL the blocks have been visited.



  This would have been doable by DAG combiner, but the problem is you cant request a call back for a TargetSpecific node maybe but the framework doesn't let you do that.

> The following IR demonstrates a missed opportunity for rewriting a global to be global+offset and deleting an addi. Specifically you end up with lui+addi+lh, which can be reduced to lui+lh by merging the offset back into the GlobalAddress and using that in the lh, thus allowing the addi to be deleted.
> 
>   @foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0], align 2                                     
>                                                                                                                   
>   define i32 @main() nounwind {                                                                                   
>   entry:                                                                                                          
>     %0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]* @foo, i32 0, i32 4), align 2                
>     %cmp = icmp eq i16 %0, 140                                                                                    
>     br i1 %cmp, label %if.end, label %if.then                                                                     
>                                                                                                                   
>   if.then:                                                                                                        
>     tail call void @abort()                                                                                       
>     unreachable                                                                                                   
>                                                                                                                   
>   if.end:                                                                                                         
>     ret i32 0                                                                                                     
>   }                                                                                                               
>                                                                                                                   
>   declare void @abort() 
> 
> 
> A quick and dirty addition to doPeepholeLoadStoreADDI can resolve this (intended to be demonstrative rather than commitable code):
> 
>    // Merge an ADDI into the offset of a load/store instruction where possible.
>    // (load (add base, off), 0) -> (load base, off)
>    // (store val, (add base, off)) -> (store val, base, off)
>   @@ -171,19 +375,51 @@ void RISCVDAGToDAGISel::doPeepholeLoadStoreADDI() {
>          break;
>        }
>    
>   -    // Currently, the load/store offset must be 0 to be considered for this
>   -    // peephole optimisation.
>   -    if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)) ||
>   -        N->getConstantOperandVal(OffsetOpIdx) != 0)
>   -      continue;
>   -
>        SDValue Base = N->getOperand(BaseOpIdx);
>    
>        // If the base is an ADDI, we can merge it in to the load/store.
>        if (!Base.isMachineOpcode() || Base.getMachineOpcode() != RISCV::ADDI)
>          continue;
>    
>   +    if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)))
>   +     continue;
>   +   
>   +    uint64_t Offset = N->getConstantOperandVal(OffsetOpIdx);
>        SDValue ImmOperand = Base.getOperand(1);
>   +    SDValue HBase = Base.getOperand(0);
>   +
>   +    // Base is addi, offset is non-zero, addi has a global parameter, addi 
>   +    // parent is lui, addi+lui have only one use then do the global match.
>   +    if (Offset != 0 && isa<GlobalAddressSDNode>(ImmOperand) && 
>   +        HBase.isMachineOpcode() && HBase.getMachineOpcode() == RISCV::LUI && 
>   +        isa<GlobalAddressSDNode>(HBase.getOperand(0)) && Base.hasOneUse() && 
>   +          HBase.hasOneUse()) {
>   +      auto GALo = cast<GlobalAddressSDNode>(ImmOperand);
>   +      // Change the ImmOperand to be a global address with new offset
>   +      SDValue GALoNew = CurDAG->getTargetGlobalAddress(
>   +          GALo->getGlobal(), SDLoc(ImmOperand), ImmOperand.getValueType(),
>   +          GALo->getOffset() + Offset, GALo->getTargetFlags());
>   +      // Change the HBase immoperand to be a global address with new offset
>   +      auto GAHi = cast<GlobalAddressSDNode>(HBase.getOperand(0));
>   +      SDValue GAHiNew = CurDAG->getTargetGlobalAddress(
>   +          GAHi->getGlobal(), SDLoc(ImmOperand), ImmOperand.getValueType(),
>   +          GAHi->getOffset() + Offset, GAHi->getTargetFlags());
>   +      if (BaseOpIdx == 0) // Load
>   +        CurDAG->UpdateNodeOperands(N, Base.getOperand(0), GALoNew,
>   +                                 N->getOperand(2));
>   +      else // Store
>   +        CurDAG->UpdateNodeOperands(N, N->getOperand(0), Base.getOperand(0),
>   +            GALoNew, N->getOperand(3));
>   +      CurDAG->UpdateNodeOperands(HBase.getNode(), GAHiNew);
>   +      // Remove dead nodes
>   +      CurDAG->RemoveDeadNodes();
>   +      continue;
>   +    }
>   +
>   +    // Currently, the load/store offset must be 0 to be considered for this
>   +    // peephole optimisation.
>   +    if (Offset != 0)
>   +      continue;
>    
>        if (auto Const = dyn_cast<ConstantSDNode>(ImmOperand)) {
>          ImmOperand = CurDAG->getTargetConstant(
> 
> 
> But I do see some cases where this introduces an extra instruction due to the basic-block granularity, as the global base is no longer reused across basic blocks. Hopefully this is all much easier to handle with GlobalISel.




https://reviews.llvm.org/D45748





More information about the llvm-commits mailing list