[PATCH] D59535: [SelectionDAG] Compute known bits of CopyFromReg
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 8 02:27:10 PDT 2019
dmgreen added a comment.
Hello
We are seeing a lot of codesize increases and performance changes from this patch, unfortunately. Consider this code, which I believe is fairly common in an embedded context:
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-arm-none-eabi"
; Function Attrs: minsize nounwind optsize
define dso_local i32 @C(i32 %x, i32* nocapture %y) local_unnamed_addr #0 {
entry:
br label %for.cond
for.cond: ; preds = %B.exit, %entry
%i.0 = phi i32 [ 0, %entry ], [ %inc, %B.exit ]
%exitcond = icmp eq i32 %i.0, 128
br i1 %exitcond, label %for.end, label %for.body
for.body: ; preds = %for.cond
%mul = shl i32 %i.0, 2
%add = add i32 %mul, %x
store volatile i32 0, i32* inttoptr (i32 1342226444 to i32*), align 4
store volatile i32 %add, i32* inttoptr (i32 1342226436 to i32*), align 4
store volatile i32 1, i32* inttoptr (i32 1342226448 to i32*), align 16
tail call void @llvm.arm.isb(i32 15) #1
br label %while.cond.i
while.cond.i: ; preds = %while.cond.i, %for.body
%0 = load volatile i32, i32* inttoptr (i32 1342226448 to i32*), align 16
%tobool.i = icmp eq i32 %0, 0
br i1 %tobool.i, label %B.exit, label %while.cond.i
B.exit: ; preds = %while.cond.i
%1 = load volatile i32, i32* inttoptr (i32 1342226440 to i32*), align 8
%arrayidx = getelementptr inbounds i32, i32* %y, i32 %i.0
store i32 %1, i32* %arrayidx, align 4
%inc = add nuw nsw i32 %i.0, 1
br label %for.cond
for.end: ; preds = %for.cond
ret i32 0
}
; Function Attrs: nounwind
declare void @llvm.arm.isb(i32) #1
attributes #0 = { minsize nounwind optsize "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-m0plus" "target-features"="+armv6-m,+strict-align,+thumb-mode,-crc,-dotprod,-dsp,-fp16fml,-hwdiv,-hwdiv-arm,-ras" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }
Before this patch, constant hoisting would take those constant addresses, which refer to memory-mapped registers/IO, and turn them into a single base constant plus a number of inttoptr casts. That way there is only one base to materialise, plus a number of small offsets that are turned into STR rn, [base, offset].
With this new patch, it just sees straight through the CopyFromReg though! (Technically, it turns an ADD into an OR, which then folds into a constant.) So we end up with multiple bases, each of which has to be materialised from the constant pool.
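A minimal sketch of the fold described above (not the actual SelectionDAG code): an ADD can be rewritten as an OR when the operands have no set bits in common, and an OR of two constants then folds to a single constant address. The values are taken from the IR above; the use of Python here is purely illustrative.

```python
# One of the MMIO base/offset pairs from the IR above.
base = 1342226436            # 0x5000C004
off = 8                      # small register offset from that base

# When no bits overlap, add and or are interchangeable, so the
# DAG can turn (add base, off) into (or base, off) ...
assert base & off == 0

# ... and an OR of two constants folds to a plain constant,
# which is exactly another one of the addresses in the IR.
folded = base | off
print(folded)                # 1342226444
```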
There are other things that constant hoisting and CGP do in a similar manner to create base+offset pairs for architectures like Thumb, where the offset range is fairly small and materialising constants is not always easy.
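A hedged sketch of why a single base works so well here: Thumb1's STR (immediate) encoding uses a 5-bit immediate scaled by 4, i.e. offsets 0..124 in steps of 4 (my reading of the ARMv6-M encoding), so all four MMIO addresses in the IR above fit as offsets from one materialised base.

```python
# The four memory-mapped register addresses from the IR above.
addrs = [1342226436, 1342226440, 1342226444, 1342226448]

# Rebase everything on the lowest address, as constant hoisting does.
base = min(addrs)
offsets = [a - base for a in addrs]
print(offsets)               # [0, 4, 8, 12]

# All offsets are word-aligned and within Thumb1's 0..124 range,
# so each store can be a single STR rn, [base, #imm].
print(all(o % 4 == 0 and 0 <= o <= 124 for o in offsets))  # True
```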
So although I like the idea of this patch, it seems to be breaking some optimisations that rely on this not happening. Why we don't have tests that show any of that, I'm not sure.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59535/new/
https://reviews.llvm.org/D59535