[PATCH] Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]
Ahmed Bougacha
ahmed.bougacha at gmail.com
Thu Apr 23 15:34:29 PDT 2015
The combine seems sound, but ARM definitely needs to learn about special constants before this can go in.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11004
@@ +11003,3 @@
+ // Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]
+ if (InVec.getOpcode() == ISD::BUILD_VECTOR && ConstEltNo) {
+ auto Elt = InVec.getOperand(N->getConstantOperandVal(1));
----------------
Would it make sense to check that the BUILD_VECTOR .hasOneUse()?
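If we do add that, I'd expect the guarded combine to look something like this (untested sketch, just to illustrate what I mean; it reuses InVec/ConstEltNo/NVT from the patch context above):

  // Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX) -> Elt[CstX],
  // but only when the BUILD_VECTOR has no other users, so the combine
  // doesn't end up keeping both the vector and the extracted scalar alive.
  if (InVec.getOpcode() == ISD::BUILD_VECTOR && ConstEltNo &&
      InVec.hasOneUse()) {
    SDValue Elt = InVec.getOperand(N->getConstantOperandVal(1));
    // Only fold when the element type already matches the result type.
    if (Elt.getValueType() == NVT)
      return Elt;
  }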
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11010
@@ +11009,3 @@
+ // when folding this case.
+ if(Elt.getValueType() == NVT)
+ return Elt;
----------------
space before (
================
Comment at: test/CodeGen/AArch64/fold-constants.ll:6-8
@@ -5,5 +5,2 @@
; CHECK: movi d0, #0000000000000000
-; CHECK-NEXT: umov w8, v0.b[2]
-; CHECK-NEXT: sbfx w8, w8, #0, #1
-; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: fmov x0, d0
----------------
This looks fine to me, but we can do better, I think. The code looks like it's just shuffling zeros around, so I'd expect:
mov x0, xzr
================
Comment at: test/CodeGen/AArch64/fold-constants.ll:18-19
@@ -17,4 +14,4 @@
%vset_lane = insertelement <4 x i16> undef, i16 %vgetq_lane285, i32 0
%4 = bitcast <4 x i16> %vset_lane to <1 x i64>
%vget_lane = extractelement <1 x i64> %4, i32 0
ret i64 %vget_lane
----------------
This is the part where the combiners get confused. We could catch this with:
(v1i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef))
->
(v1i64 bitcast (i64 zext/sext (i16 C)))
It's not clear to me that's better though. This would be more legitimate:
(i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef))
->
(i64 zext/sext (i16 C))
But for this testcase, we'd also need:
(i64 extract_element (v1i64 bitcast N), i32 0)
->
(i64 bitcast N)
And that might be a whole 'nother mess on AArch64 (it might be very useful as well, I'm not sure).
I'll have a closer look if you don't beat me to it ;)
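Just to make that last fold concrete, I'd imagine something like this in visitEXTRACT_VECTOR_ELT terms (untested sketch, again reusing InVec/ConstEltNo/NVT from the patch):

  // (iN extract_element (v1iN bitcast X), 0) -> (iN bitcast X)
  if (ConstEltNo && N->getConstantOperandVal(1) == 0 &&
      InVec.getOpcode() == ISD::BITCAST &&
      InVec.getValueType().getVectorNumElements() == 1) {
    SDValue X = InVec.getOperand(0);
    // The result is just X reinterpreted as NVT, so only the sizes need to
    // match; bail out on anything like a truncating extract.
    if (X.getValueType().getSizeInBits() == NVT.getSizeInBits())
      return DAG.getNode(ISD::BITCAST, SDLoc(N), NVT, X);
  }

Whether that pulls its weight on AArch64 is exactly the open question above.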
================
Comment at: test/CodeGen/ARM/vmov.ll:3
@@ -2,1 +2,3 @@
+; XFAIL: *
+
----------------
These failures are caused by the logic in LowerBUILD_VECTOR (recognizing splats) not firing anymore. With the combine, the vector doesn't survive until lowering, so we're stuck with a magic f64 constant that happens to have the same bit pattern as a vector splat.
As you say, ARM probably should learn about recognizing special patterns in LowerConstantFP and whatnot. That would let it materialize stuff like:
double 0x0808080808080808
using MOVI instead of constant pools (judging by the failure, I don't think it does). Feel free to file a PR and/or give it a shot; I'll try to have a look myself.
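The recognition itself is cheap; just to illustrate (isByteSplatFPImm isn't an existing hook, it's a made-up example), something like:

  // Hypothetical helper: return true if the f64 bit pattern is a splat of a
  // single byte (e.g. 0x0808080808080808), i.e. something a vector MOVI/VMOV
  // immediate could materialize instead of a constant-pool load.
  static bool isByteSplatFPImm(const APFloat &FPImm, uint8_t &SplatByte) {
    uint64_t Bits = FPImm.bitcastToAPInt().getZExtValue();
    SplatByte = Bits & 0xff;
    return Bits == uint64_t(SplatByte) * 0x0101010101010101ULL;
  }

The real fix would presumably reuse the splat-recognition logic LowerBUILD_VECTOR already has rather than open-coding cases like this.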
================
Comment at: test/CodeGen/R600/ds_read2.ll:220-221
@@ -223,1 +219,4 @@
+; SI: ds_read2_b32
+; SI-NOT: ds_read_b32
+; SI-NOT: ds_read_b32
; SI: s_endpgm
----------------
A single -NOT is enough: a -NOT directive applies to the whole region between the surrounding positive matches, so one line already rules out any ds_read_b32 between the ds_read2_b32 and the s_endpgm.
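I.e., the hunk could just be:

+; SI: ds_read2_b32
+; SI-NOT: ds_read_b32
 ; SI: s_endpgm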
http://reviews.llvm.org/D9094