[PATCH] D14064: Convert Returned Constant i1 Values to i32 on PPC64
hfinkel@anl.gov via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 10 07:58:48 PST 2015
hfinkel added a comment.
In http://reviews.llvm.org/D14064#276116, @hfinkel wrote:
> ...
>
> As a general comment, I'd make this a bit more general. The core problem, as you know, if that the logic that takes care of eliminating unnecessary round-tripping through crbit regs is in PPCISelLowering, and thus is BB local. Using crbits makes a lot of sense only when some input comes from a comparison, or feeds a conditional branch. If none of the i1 values comes from a comparison, and none feeds a branch, then it seems likely we'll have unnecessary round-tripping through crbit regs.
>
> Thus, it seems like you could start with all i1 consumers that are not selects or conditional branches (returns, function arguments, [zs]exts), and build a tree of defines all the way back to the initial producers. Unless one of the producers is a comparison, then promote everything to i32.
I feel like you've not addressed this, and this pass is still much too focused on a particular special case. While I'm generally fine with incremental improvement, the comments in the code should at least explain why this particular special case is worth handling in isolation.
I think this is justifiable because handling more-complex cases might require additional logic to take them out of canonical form. To provide a quick example, consider this:
$ cat /tmp/f.cpp
bool test1(bool a, bool b, bool c, bool d) {
return a && b && c && d;
}
but as we canonicalize this, we get:
$ clang++ -O3 -S -emit-llvm -o - /tmp/f.cpp
; ModuleID = '/tmp/f.cpp'
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"
; Function Attrs: nounwind readnone
define zeroext i1 @_Z5test1bbbb(i1 zeroext %a, i1 zeroext %b, i1 zeroext %c, i1 zeroext %d) #0 {
entry:
%brmerge.demorgan = and i1 %a, %b
br i1 %brmerge.demorgan, label %land.lhs.true.5, label %land.end
land.lhs.true.5: ; preds = %entry
%d. = and i1 %c, %d
ret i1 %d.
land.end: ; preds = %entry
ret i1 false
}
and the code we generate, using crbits or not, is pretty bad:
$ clang++ -O3 -S -o - /tmp/f.cpp
...
.Lfunc_begin0:
# BB#0: # %entry
andi. 6, 6, 1
and 3, 3, 4
crmove 20, 1
andi. 5, 5, 1
crmove 21, 1
andi. 3, 3, 1
bc 4, 1, .LBB0_3
# BB#1: # %land.lhs.true.5
crand 20, 21, 20
li 3, 1
bclr 12, 20, 0
# BB#2: # %land.lhs.true.5
li 3, 0
blr
.LBB0_3: # %land.end
li 3, 0
blr
but this is still not good:
$ clang++ -O3 -S -o - /tmp/f.cpp -mno-crbits
...
.Lfunc_begin0:
# BB#0: # %entry
cmplwi 0, 3, 0
beq 0, .LBB0_3
# BB#1: # %entry
cmplwi 0, 4, 0
beq 0, .LBB0_3
# BB#2: # %land.lhs.true.5
and 3, 5, 6
clrldi 3, 3, 32
blr
.LBB0_3: # %land.end
li 3, 0
blr
so we have a lot of work to do on this front.
I will point out, however, that:
$ cat /tmp/f2.cpp
bool test1(bool a, bool b, bool c, bool d) {
return a & b & c & d;
}
$ clang++ -O3 -S -o - /tmp/f2.cpp
...
.Lfunc_begin0:
# BB#0: # %entry
and 3, 3, 4
and 3, 3, 5
and 3, 3, 6
blr
and that is what we want.
================
Comment at: lib/Target/PowerPC/PPCTargetMachine.cpp:286
@@ -285,1 +285,3 @@
+ if (TM->getOptLevel() != CodeGenOpt::None)
+ addPass(createBoolRetToIntPass());
addPass(createAtomicExpandPass(&getPPCTargetMachine()));
----------------
kbarton wrote:
> This placement will make this one of the first passes that gets runs, won't it? Is this where we want to run it? I admit I haven't thought through the different placements for this, but for some reason I was expecting it to run later (not sure why).
This seems like a reasonable place for it. We probably do want it before CodeGenPrep runs, and I see no particular reason for it to be later.
http://reviews.llvm.org/D14064
More information about the llvm-commits
mailing list