[PATCH] D39575: [X86] Add subtarget features prefer-avx256 and prefer-avx128 and use them to limit vector width presented by TTI

Hal Finkel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 5 07:01:25 PST 2017


hfinkel added a comment.

In https://reviews.llvm.org/D39575#916063, @RKSimon wrote:

> Should we be looking at this from a cost model POV? Possibly introducing the concept of a cost for "first use" of a vector type (in this case for 512-bit vectors),  increasing the cost of 512-bit ops and making the cost models more 'state aware' (use of a 512-bit vector causes a cost multiplier effect on ALL vector ops).


It's not clear to me that this can be a local decision. It's not just the frequency effect on the core in question; it's also the power draw and how that affects the speed of everything else.

> Having said that, the vectorizers don't seem to do a good job of comparing costs of different vector widths - AVX1 targets (Jaguar/Bulldozer/SandyBridge etc.) often end up with 256-bit integer vector code despite the fact that the cost tables already flag the x4 cost compared to the 128-bit integer equivalents (causing nasty register spill issues).

Not to get too far off topic, but could you elaborate somewhere? Are there bug reports? If it's 4x the cost, and only 2x the width, I'm surprised that we'd get that wrong (assuming that's true for most of the instructions in the loop). I'm curious whether this is a deficiency in the register-pressure estimation heuristic in the vectorizer (which matters only for interleaving, but perhaps that's part of the problem?).


https://reviews.llvm.org/D39575




