[PATCH] D31965: [SLP] Enable 64-bit wide vectorization for Cyclone

Wed Apr 12 07:19:24 PDT 2017

anemet added a comment.

Hi Renato,

In https://reviews.llvm.org/D31965#724619, @rengolin wrote:

> Hi Adam,
>
> Interesting results! But it doesn't sound like this is Cyclone specific.

Sure it's not, it is just a deployment strategy for this change.  See the FIXME in the code.

Rolling it out for Cyclone-only is just a way to get this going in a controllable manner.  Other subtargets can roll it this out as people find the time to benchmark and tune this.

As the results section shows I had and still doing some tuning on this.  This mostly allows 2-lane vectorization for 32-bit types so they benefit of vectorization is not so great thus the accuracy of the cost model is really tested by enabling this.

> @kristof.beyls Can you check on A57?

That would be great.  Thanks!

Adam

> cheers,
> --renato

================
Comment at: include/llvm/Analysis/TargetTransformInfoImpl.h:306

+  unsigned getMinVectorRegisterBitWidth() { return 128; }
+
----------------
sdardis wrote:
> rengolin wrote:
> > Is this value really the best default to all targets?
> My quick survey of vector register widths suggests this is double the minimum.
> 
> SPARC's VIS extension uses the double precision floating point register set (64 bits wide) , as does Intel's MMX, MIPS' MIPS-3D (though currently unimplemented in LLVM).
> 
> The S/390 vector registers appear to be 128 bits, like the basic Intel SSE, MIPS MSA, ARM NEON, PowerPC Altivec.
This does not change change the default from SLP.  It just brings it to TTI so that targets can change it as they see fit after careful benchmarking (it will need careful benchmarking!).  It is *not* the goal of this patch to find a new proper default across targets.

================
Comment at: lib/Target/AArch64/AArch64Subtarget.cpp:62
     MaxPrefetchIterationsAhead = 3;
+    // Enable 64-bit vectorization in SLP.
+    MinVectorRegisterBitWidth = 64;
----------------
rengolin wrote:
> Is this really Cyclone specific? ?Have you benchmarked on other cores?
See above. It's not Cyclone-specific, it is just a deployment strategy to only enable for Cyclone since I have no access to other cores.

https://reviews.llvm.org/D31965