[llvm] r200213 - [vectorizer] Teach the loop vectorizer's unroller to only unroll by

Mon Feb 3 07:55:54 PST 2014

On 01/27/2014 12:12 PM, Chandler Carruth wrote:
> Author: chandlerc
> Date: Mon Jan 27 05:12:24 2014
> New Revision: 200213
>
> URL: http://llvm.org/viewvc/llvm-project?rev=200213&view=rev
> Log:
> [vectorizer] Teach the loop vectorizer's unroller to only unroll by
> powers of two. This is essentially always the correct thing given the
> impact on alignment, scaling factors that can be used in addressing
> modes, etc. Also, fix the management of the unroll vs. small loop cost
> to more accurately model things with this world.
>
> Enhance a test case to actually exercise more of the unroll machinery if
> using synthetic constants rather than a specific target model. Before
> this change, with the added flags this test will unroll 3 times instead
> of either 2 or 4 (the two sensible answers).
>
> While I don't expect this to make a huge difference, if there are lots
> of loops sitting right on the edge of hitting the 'small unroll' factor,
> they might change behavior. However, I've benchmarked moving the small
> loop cost up and down in many various ways and by a huge factor (2x)
> without seeing more than 0.2% code size growth. Small adjustments such
> as the series that led up here have led to about 1% improvement on some
> benchmarks, but it is very close to the noise floor so I mostly checked
> that nothing regressed. Let me know if you see bad behavior on other
> targets but I don't expect this to be a sufficiently dramatic change to
> trigger anything.
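
For readers skimming the log, here is a minimal sketch of the idea in the
first paragraph: rounding a requested unroll factor down to a power of two,
so that a request of 3 becomes 2. The names floorPowerOfTwo and
roundUnrollFactor are hypothetical and are not taken from the patch.
Rounding down rather than up is also an assumption made for this example;
the log only says that 2 and 4 are the two sensible answers.

```cpp
#include <cassert>

// Hypothetical helper: largest power of two that is <= n.
// Precondition: n is nonzero.
unsigned floorPowerOfTwo(unsigned n) {
  assert(n != 0 && "zero has no power-of-two floor");
  unsigned p = 1;
  // Compare against n / 2 so that p * 2 can never overflow.
  while (p <= n / 2)
    p *= 2;
  return p;
}

// Hypothetical wrapper: turn any requested unroll factor into a
// power of two (assumption: round down, never up).
unsigned roundUnrollFactor(unsigned requested) {
  return floorPowerOfTwo(requested);
}
```

With this sketch, roundUnrollFactor(3) yields 2, while factors that are
already powers of two (1, 2, 4, 8) pass through unchanged.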

Just for info, this change caused the following performance changes on
X86_64. Each number is the median of 10 runs, so the noise should be
filtered out reliably.

Compile Time
==============

                                Δ          Previous   Current
fftbench                        2.73%      0.4400     0.4520
stepanov_abstraction            1.86%      0.6440     0.6560
simple_types_constant_folding   1.22%      2.7942     2.8282
loop_unroll                     1.16%      3.9642     4.0103

Execution Time
==============

                                Δ          Previous   Current
ControlFlow-dbl                 2.08%      4.5283     4.6223
fannkuch                        2.04%      3.2282     3.2942
ControlFlow-flt                 1.67%      4.0723     4.1403
pairlocalalign                  1.24%      25.8096    26.1296

                                Δ          Previous   Current
gramschmidt                     -12.92%    2.5402     2.2121
gcc-loops                       -7.11%     4.8403     4.4963
siod                            -6.23%     3.0162     2.8282
LinearDependence-flt            -2.39%     4.3603     4.2563

http://llvm.org/perf/db_default/v4/nts/21395?num_comparison_runs=10&test_filter=&test_min_value_filter=&aggregation_fn=median&compare_to=21392&submit=Update
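
As a rough guide to reading the Δ column: the values above are consistent
with a relative change of (Current - Previous) / Previous, where each of
Previous and Current is a median over the runs. The sketch below shows that
arithmetic. The function names median and percentChange are made up for
this example, and this is not a claim about how the reporting tool
implements it internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of timings; takes a copy so the caller's data
// is not reordered. Precondition: at least one sample.
double median(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  // Odd count: middle element. Even count: mean of the two middle ones.
  return (n % 2 != 0) ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Relative change in percent. Positive means the current value is
// larger (slower), negative means it is smaller (faster).
double percentChange(double previous, double current) {
  assert(previous != 0.0);
  return (current - previous) / previous * 100.0;
}
```

For example, percentChange(2.5402, 2.2121) gives roughly -12.92, matching
the gramschmidt row, and percentChange(0.4400, 0.4520) gives roughly 2.73,
matching the fftbench row.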

Cheers,
Tobias