Vectorization of pointer PHI nodes

Mon Oct 14 12:03:54 PDT 2013

Yes, that looks like strided-access loop vectorization (see "Auto-vectorization of interleaved data for SIMD", http://dl.acm.org/citation.cfm?id=1133997).
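
As a rough illustration, a stride-3 loop of this kind might look like the following in C. This is a hypothetical stand-in, not the attached test.c: the function name, parameter names, and the exact arithmetic are assumptions read off the vmvn/vsub.i8 sequence quoted below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, NOT the attached test.c.
 * Processes n_groups groups of three interleaved bytes (e.g. RGB).
 * For each byte: invert it (cf. vmvn), then subtract *delta (cf. vsub.i8).
 * The pointers are deliberately not 'restrict', so a vectorizer would
 * have to version the loop with runtime alias checks. */
void invert_sub_stride3(uint8_t *write, const uint8_t *read,
                        const uint8_t *delta, size_t n_groups)
{
    for (size_t i = 0; i < n_groups; ++i) {
        /* Three accesses per iteration, each with a stride of 3 bytes. */
        write[3 * i + 0] = (uint8_t)(~read[3 * i + 0] - *delta);
        write[3 * i + 1] = (uint8_t)(~read[3 * i + 1] - *delta);
        write[3 * i + 2] = (uint8_t)(~read[3 * i + 2] - *delta);
    }
}
```

Each iteration touches three adjacent bytes, so a vectorized version has to deinterleave the loads into three registers (what vld3.8 does) and re-interleave on the store (vst3.8), with *delta broadcast across all lanes once.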

On Oct 14, 2013, at 1:53 PM, Renato Golin <renato.golin at linaro.org> wrote:

> On 14 October 2013 19:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Renato, can you post the c code for the function and the assembly that gcc produces?
> 
> Attached.
> 
> 
> Your initial example could be handled well by vectorization of strided loops (and the mention of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4.
> 
> I think so. Ignore the last write; it was bogus. (But don't ignore the fact that GCC vectorized it anyway with vst4!)
> 
> By running GCC with -ftree-vectorizer-verbose=1 I got:
> 
> test.c:11: note: create runtime check for data references DELTA and *WRITE_30
> test.c:11: note: create runtime check for data references *READ_29 and *WRITE_30
> test.c:11: note: created 2 versioning for alias checks.
> test.c:11: note: === vect_do_peeling_for_loop_bound ===Setting upper bound of nb iterations for epilogue loop to 14
> test.c:11: note: LOOP VECTORIZED.
> 
> The result is very concise, dense code:
> 
> 	vld1.8	{d28[], d29[]}, [r5]
> 	vld3.8	{d16, d18, d20}, [r9]!
> 	vld3.8	{d17, d19, d21}, [r9]
> 	vmvn 	q3, q8
> 	vmvn 	q15, q9
> 	vmvn 	q8, q10
> 	vsub.i8	q11, q3, q14
> 	vsub.i8	q12, q15, q14
> 	vsub.i8	q13, q8, q14
> 	vst3.8	{d22, d24, d26}, [r8]!
> 	vst3.8	{d23, d25, d27}, [r8]
> 
> cheers,
> --renato
> <test.c>