[PATCH] D42051: [ARM] Avoid having to schedule a copy between CPSR and GPR in Thumb1 mode

Mon Jan 15 15:48:53 PST 2018

efriedma added a comment.

If ADDC/ADDE produced glue, the check would be mostly straightforward: the transform is safe iff no value produced by the ADDE or a node glued to the ADDE is a predecessor of the ADDC or a node glued to the ADDC.

It's possible we can clone our way out of this problem: ScheduleDAGRRList::CopyAndMoveSuccessors has code to clone nodes.  This actually already works in some cases... but not in all cases: it will clone an tADDSrr, but not a tADCS, because tADCS has a glue operand.  Maybe we can modify this code to clone tADCS nodes.  Or maybe we can add a hook to ScheduleDAGRRList::PickNodeToScheduleBottomUp to let the ARM target "copy" CPSR the way it wants to (using a tADCS/tADDSrr pair, instead of a COPY machineinstr we can't lower).  Related to this, we should probably change the Thumb1 implementation of getCrossCopyRegClass; not strictly necessary, I think, but it would be easier to debug in the future if we crash earlier in the pipeline.

I don't think the approach in this patch works.  Fundamentally, for nodes without glue, the DAG's CSE will merge them in ways which require either cross-class CPSR copies or cloning. Even patterns which look okay locally can collapse into patterns which generate an illegal COPY given DAGCombine optimizations and CSE.

Testcase follows which still fails with your patch:

  target datalayout = "e-m:e-p:64:64-i128:64-v128:64:128-a:0:64-n64-S64"
  target triple = "thumbv6---gnueabi"

  ; Function Attrs: norecurse nounwind readonly
  define i128 @a(i64* nocapture readonly %z) local_unnamed_addr #0 {
  entry:
    %0 = load i64, i64* %z, align 4
    %conv.i = zext i64 %0 to i128
    %arrayidx1 = getelementptr inbounds i64, i64* %z, i64 2
    %1 = load i64, i64* %arrayidx1, align 4
    %conv.i38 = zext i64 %1 to i128
    %shl.i39 = shl nuw i128 %conv.i38, 64
    %or = or i128 %shl.i39, %conv.i
    %arrayidx3 = getelementptr inbounds i64, i64* %z, i64 1
    %2 = load i64, i64* %arrayidx3, align 4
    %conv.i37 = zext i64 %2 to i128
    %arrayidx5 = getelementptr inbounds i64, i64* %z, i64 3
    %3 = load i64, i64* %arrayidx5, align 4
    %conv.i35 = zext i64 %3 to i128
    %shl.i36 = shl nuw i128 %conv.i35, 64
    %or7 = or i128 %shl.i36, %conv.i37
    %arrayidx10 = getelementptr inbounds i64, i64* %z, i64 4
    %4 = load i64, i64* %arrayidx10, align 4
    %conv.i64 = zext i64 %4 to i128
    %shl.i33 = shl nuw i128 %conv.i64, 64
    %or12 = or i128 %shl.i33, %conv.i
    %arrayidx15 = getelementptr inbounds i64, i64* %z, i64 5
    %5 = load i64, i64* %arrayidx15, align 4
    %conv.i30 = zext i64 %5 to i128
    %shl.i = shl nuw i128 %conv.i30, 64
    %or17 = or i128 %shl.i, %conv.i37
    %add = add i128 %or7, %or
    %add18 = add i128 %or17, %or12
    %mul = mul i128 %add18, %add
    ret i128 %mul
  }

Repository:
  rL LLVM

https://reviews.llvm.org/D42051