[PATCH] D118979: [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth

Tue Apr 5 02:30:50 PDT 2022

dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks for the update. This LGTM.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:39

+static cl::opt<bool> AArch64MaximizeBandwidth(
+    "aarch64-vectorizer-maximize-bandwidth", cl::init(true), cl::Hidden,
----------------
jaykang10 wrote:
> paulwalker-arm wrote:
> > dmgreen wrote:
> > > There is a vectorizer-maximize-bandwidth option is the vectorizer that can override the target option for shouldMaximizeVectorBandwidth. I don't think adding an aarch64 option is necessary, can you remove it?
> > I agree.  My original ask was because I thought there were concerns about enabling this by default.  Given the flag still defaults to on and it seems we're happy to make this change for AArch64 I retract my previous ask.
> Yep, let me remove this option.
Yeah - I think the vectorizer-maximize-bandwidth option worked differently back when the comment was suggested too.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-illegal-type.ll:90
 ; CHECK: store i1 %[[EXTRACT1]], i1* %dst
-; CHECK: %[[EXTRACT2:.*]] = extractelement <2 x i1> %[[ICMP]], i32 1
+; CHECK: %[[EXTRACT2:.*]] = extractelement <64 x i1> %[[ICMP]], i32 1
 ; CHECK: store i1 %[[EXTRACT2]], i1* %dst
----------------
jaykang10 wrote:
> jaykang10 wrote:
> > dmgreen wrote:
> > > This is worrying - should it be vectorizing 64x for in i1 type! (and are there a lot of other extracts now)?
> > When I checked it, it looked the dagcombiner combines the 64 times i1 extract_vector_elt and store nodes to one 64 bit store node.
> > Let me check it again.
> um... in this test, the `%dst` is passed as parameter so it is not changed in the loop. Therefore, the last element of <64 x i1>vector needs to be stored. It looks dagcombiner catches it and optimizes the nodes well. The assembly output of `vector.body` block from llc is as below. It looks ok.
> ```
> .LBB0_3:                                // %vector.body
>                                         // =>This Inner Loop Header: Depth=1
> 	dup	v2.2d, x12
> 	add	x12, x12, #512
> 	subs	x11, x11, #64
> 	add	v2.2d, v2.2d, v0.2d
> 	cmeq	v2.2d, v2.2d, v1.2d
> 	xtn2	v2.4s, v2.2d
> 	xtn2	v2.8h, v2.4s
> 	xtn	v2.8b, v2.8h
> 	umov	w13, v2.b[7]
> 	and	w13, w13, #0x1
> 	strb	w13, [x0]
> 	b.ne	.LBB0_3
> ```
Ah I see, that makes sense that it would pick the higher factor then. It looks like if the address is varying it does not vectorize.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118979/new/

https://reviews.llvm.org/D118979