[PATCH] D59535: [SelectionDAG] Compute known bits of CopyFromReg

Mon Apr 8 02:27:10 PDT 2019

dmgreen added a comment.

Hello

We are seeing a lot of codesize increases and performance changes from this patch, unfortunately. Consider this code, which I believe is fairly common in an embedded context:

  target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "thumbv6m-arm-none-eabi"

  ; Function Attrs: minsize nounwind optsize
  define dso_local i32 @C(i32 %x, i32* nocapture %y) local_unnamed_addr #0 {
  entry:
    br label %for.cond

  for.cond:                                         ; preds = %B.exit, %entry
    %i.0 = phi i32 [ 0, %entry ], [ %inc, %B.exit ]
    %exitcond = icmp eq i32 %i.0, 128
    br i1 %exitcond, label %for.end, label %for.body

  for.body:                                         ; preds = %for.cond
    %mul = shl i32 %i.0, 2
    %add = add i32 %mul, %x
    store volatile i32 0, i32* inttoptr (i32 1342226444 to i32*), align 4
    store volatile i32 %add, i32* inttoptr (i32 1342226436 to i32*), align 4
    store volatile i32 1, i32* inttoptr (i32 1342226448 to i32*), align 16
    tail call void @llvm.arm.isb(i32 15) #1
    br label %while.cond.i

  while.cond.i:                                     ; preds = %while.cond.i, %for.body
    %0 = load volatile i32, i32* inttoptr (i32 1342226448 to i32*), align 16
    %tobool.i = icmp eq i32 %0, 0
    br i1 %tobool.i, label %B.exit, label %while.cond.i

  B.exit:                                           ; preds = %while.cond.i
    %1 = load volatile i32, i32* inttoptr (i32 1342226440 to i32*), align 8
    %arrayidx = getelementptr inbounds i32, i32* %y, i32 %i.0
    store i32 %1, i32* %arrayidx, align 4
    %inc = add nuw nsw i32 %i.0, 1
    br label %for.cond

  for.end:                                          ; preds = %for.cond
    ret i32 0
  }

  ; Function Attrs: nounwind
  declare void @llvm.arm.isb(i32) #1

  attributes #0 = { minsize nounwind optsize "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-m0plus" "target-features"="+armv6-m,+strict-align,+thumb-mode,-crc,-dotprod,-dsp,-fp16fml,-hwdiv,-hwdiv-arm,-ras" "unsafe-fp-math"="false" "use-soft-float"="false" }
  attributes #1 = { nounwind }

Before this patch, constant hoist would take those constants addresses, which refer to memory mapped registers/IO, and turns them into a base constant plus a number of inttoptr casts. That way there is only one base, plus a number of offsets that are turned into STR rn, [base, offset].

With this new patch, it just sees straight through the CopyFromReg though! (Technically, it turns an ADD to an OR, to a constant). So we end up with multiple bases, each have to be materialised from a constant pool.

There are other things that constant hoist and CGP to in a similar manor, to create base+offset for architectures like Thumb where the offset range is fairly small and materialising constants is not always easy.

So although I like the idea of this patch, it seems to be breaking some optimisations that are relying on this not happening. Why we don't have tests that show any of that, I'm not sure..

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59535/new/

https://reviews.llvm.org/D59535