[PATCH] D45821: [AArch64] improve code generation of vectors smaller than 64 bit

Thu Apr 19 08:58:07 PDT 2018

sebpop created this revision.
sebpop added reviewers: eli.friedman, kristof.beyls, javed.absar, evandro.
Herald added subscribers: hiraditya, rengolin.

This changes the legalization of the small vector types v2i8, v4i8, and v2i16
from integer promotion (e.g., v4i8 -> v4i16) to vector widening (e.g.,
v4i8 -> v8i8). This lets the AArch64 backend select wider vector instructions
for middle-end vectors with fewer lanes.
In the example below, AArch64 has no add for v4i8; after widening, the
backend can match it with the add for v8i8.
The extra lanes are not used in the final result, and the backend
knows how to keep those lanes undef.
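
To make the difference between the two legalization strategies concrete, here
is a rough scalar model in C. It is only a sketch: the function names are made
up for illustration, and the extra lanes are zero-filled as a stand-in for
undef, which plain C cannot express. It shows that both strategies give the
same four result bytes, since the extra lanes never feed the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Promotion model (v4i8 -> v4i16): same lane count, each lane widened to
   16 bits, added, then truncated back to 8 bits. */
void add_v4i8_promoted(uint8_t out[4], const uint8_t a[4], const uint8_t b[4]) {
  uint16_t wa[4], wb[4];
  for (int i = 0; i < 4; ++i) {
    wa[i] = a[i];
    wb[i] = b[i];
  }
  for (int i = 0; i < 4; ++i)
    out[i] = (uint8_t)(wa[i] + wb[i]); /* truncate back to 8 bits */
}

/* Widening model (v4i8 -> v8i8): same lane width, four extra lanes.
   Lanes 4..7 are zero here only as a placeholder for undef. */
void add_v4i8_widened(uint8_t out[4], const uint8_t a[4], const uint8_t b[4]) {
  uint8_t wa[8] = {0}, wb[8] = {0}, wr[8];
  memcpy(wa, a, 4);
  memcpy(wb, b, 4);
  for (int i = 0; i < 8; ++i)
    wr[i] = (uint8_t)(wa[i] + wb[i]); /* one 8-lane add */
  memcpy(out, wr, 4); /* only the low four lanes are stored */
}

/* Asserts that both models produce the same four bytes. */
void check_models_agree(const uint8_t a[4], const uint8_t b[4]) {
  uint8_t p[4], w[4];
  add_v4i8_promoted(p, a, b);
  add_v4i8_widened(w, a, b);
  assert(memcmp(p, w, 4) == 0);
}
```

The widened form works on a single 64-bit register and needs no per-lane
extend or truncate, which is what lets the v8i8 add be selected directly.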

With this change we can drop the extra cost charged for loads and stores of
small i8 vectors, and lower the minimum vector register width used by SLP and
loop vectorization from 64 bits to 16 bits.
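
For a sense of scale, the arithmetic of the special case that this patch
deletes from AArch64TargetTransformInfo.cpp can be transcribed into a few
lines of C. This is only an illustration: the function name is hypothetical,
and the body mirrors the removed lines for i8 vectors with fewer than 8
elements.

```c
/* Hypothetical helper mirroring the removed cost computation for
   i8 vectors with fewer than 8 elements. */
unsigned old_small_i8_mem_cost(unsigned num_vec_elts) {
  /* The removed code amortized 2 instructions per vector element... */
  unsigned num_insts_to_amortize = num_vec_elts * 2;
  /* ...and charged that amount again per element, times 2. */
  return num_insts_to_amortize * num_vec_elts * 2;
}
```

Under this formula a v4i8 load or store was charged 64 and a v2i8 one 16. With
the special case gone, these types fall through to the final
`return LT.first;`.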

Here is an example of SLP vectorization:

void fun(char *restrict out, char *restrict in) {
  *out++ = *in++ + 1;
  *out++ = *in++ + 2;
  *out++ = *in++ + 3;
  *out++ = *in++ + 4;
}

With this patch, we now generate vector code:

fun:
	ldr	s0, [x1]
	adrp	x8, .LCPI0_0
	ldr	d1, [x8, :lo12:.LCPI0_0]
	add	v0.8b, v0.8b, v1.8b
	st1	{ v0.s }[0], [x0]
	ret

Before this patch, we generated scalar code:

fun:
	ldrb	w8, [x1]
	add	w8, w8, #1
	strb	w8, [x0]
	ldrb	w8, [x1, #2]
	add	w8, w8, #3
	ldrb	w9, [x1, #1]
	add	w9, w9, #2
	strb	w9, [x0, #1]
	strb	w8, [x0, #2]
	ldrb	w8, [x1, #3]
	add	w8, w8, #4
	strb	w8, [x0, #3]
	ret
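
The other newly widened types can be exercised in the same way. As a
hypothetical example, not taken from the patch or its tests, the function
below has two adjacent 16-bit lanes that form a v2i16 group (32 bits). That is
below the old 64-bit minimum but above the new 16-bit one. Whether the SLP
vectorizer actually picks it up still depends on the cost model, so treat this
as a candidate to try rather than a guaranteed result.

```c
#include <stdint.h>

/* Hypothetical v2i16 candidate: two adjacent 16-bit adds. */
void fun_v2i16(int16_t *restrict out, const int16_t *restrict in) {
  out[0] = (int16_t)(in[0] + 1);
  out[1] = (int16_t)(in[1] + 2);
}
```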


https://reviews.llvm.org/D45821

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp


Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -612,16 +612,6 @@
     return LT.first * 2 * AmortizationCost;
   }
 
-  if (Ty->isVectorTy() && Ty->getVectorElementType()->isIntegerTy(8) &&
-      Ty->getVectorNumElements() < 8) {
-    // We scalarize the loads/stores because there is not v.4b register and we
-    // have to promote the elements to v.4h.
-    unsigned NumVecElts = Ty->getVectorNumElements();
-    unsigned NumVectorizableInstsToAmortize = NumVecElts * 2;
-    // We generate 2 instructions per vector element.
-    return NumVectorizableInstsToAmortize * NumVecElts * 2;
-  }
-
   return LT.first;
 }
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===================================================================
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -97,7 +97,7 @@
   bool NegativeImmediates = true;
 
   // Enable 64-bit vectorization in SLP.
-  unsigned MinVectorRegisterBitWidth = 64;
+  unsigned MinVectorRegisterBitWidth = 16;
 
   bool UseAA = false;
   bool PredictableSelectIsExpensive = false;
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -11015,10 +11015,12 @@
 TargetLoweringBase::LegalizeTypeAction
 AArch64TargetLowering::getPreferredVectorAction(EVT VT) const {
   MVT SVT = VT.getSimpleVT();
-  // During type legalization, we prefer to widen v1i8, v1i16, v1i32  to v8i8,
-  // v4i16, v2i32 instead of to promote.
-  if (SVT == MVT::v1i8 || SVT == MVT::v1i16 || SVT == MVT::v1i32
-      || SVT == MVT::v1f32)
+  // During type legalization, we prefer to widen v1i8, v2i8, v4i8, v1i16,
+  // v2i16, v1i32, v1f32 to v8i8, v4i16, v2i32, v2f32 instead of to promote.
+  if (SVT == MVT::v1i8 || SVT == MVT::v1i16 || SVT == MVT::v1i32 ||
+      SVT == MVT::v1f32
+      || SVT == MVT::v2i8 || SVT == MVT::v4i8 || SVT == MVT::v2i16
+      )
     return TypeWidenVector;
 
   return TargetLoweringBase::getPreferredVectorAction(VT);

