[PATCH] D15392: [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 10 10:12:48 PST 2015
spatel added inline comments.
================
Comment at: test/Transforms/InstCombine/bitcast-bigendian.ll:24
@@ -25,1 +23,3 @@
+; CHECK-NEXT: extractelement <2 x i32> %B, i32 1
+; CHECK-NEXT: bitcast i32 {{.*}} to float
; CHECK-NEXT: %add = fadd float %tmp24, %tmp4
----------------
hfinkel wrote:
> I'm a bit surprised by this change. We used to prefer the vector bitcast over the scalar one, and with this change, we prefer the scalar bitcast after the abstract. I can see benefits to this as a canonical form (even through some backends will need to work somewhat harder to retail good code quality: vector bitcasts are often cheaper than scalar ones). However, what happens when both elements are extracted? Do we end up with multiple scalar bitcasts?
>
I didn't think much of that difference for the existing test case, but for the case that I think you're asking about:
define float @test2(<2 x i32> %A) {
%tmp28 = bitcast <2 x i32> %A to i64
%tmp23 = trunc i64 %tmp28 to i32
%tmp24 = bitcast i32 %tmp23 to float
%tmp = bitcast <2 x i32> %A to i64
%lshr = lshr i64 %tmp, 32
%tmp2 = trunc i64 %lshr to i32
%tmp4 = bitcast i32 %tmp2 to float
%add = fadd float %tmp24, %tmp4
ret float %add
}
Yes, we'll get multiple scalar bitcasts:
define float @test2(<2 x i32> %A) {
%tmp23 = extractelement <2 x i32> %A, i32 0
%tmp24 = bitcast i32 %tmp23 to float
%tmp2 = extractelement <2 x i32> %A, i32 1
%tmp4 = bitcast i32 %tmp2 to float
%add = fadd float %tmp24, %tmp4
ret float %add
}
And the codegen for that is decidedly worse on all of PPC64+Altivec, AArch64, and x86-64 than what we used to get:
define float @test2(<2 x i32> %A) {
%1 = bitcast <2 x i32> %A to <2 x float>
%tmp24 = extractelement <2 x float> %1, i32 0
%2 = bitcast <2 x i32> %A to <2 x float>
%tmp4 = extractelement <2 x float> %2, i32 1
%add = fadd float %tmp24, %tmp4
ret float %add
}
Should I create a bitcast canonicalization instcombine to hoist bitcasts ahead of extracts?
http://reviews.llvm.org/D15392
More information about the llvm-commits
mailing list