[llvm-bugs] [Bug 35687] New: [x86, loop vectorizer] Smaller VF preferred when VFs have the same cost
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Dec 18 10:44:45 PST 2017
https://bugs.llvm.org/show_bug.cgi?id=35687
Bug ID: 35687
Summary: [x86, loop vectorizer] Smaller VF preferred when VFs
have the same cost
Product: new-bugs
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: dneilson at azul.com
CC: llvm-bugs at lists.llvm.org
Created attachment 19572
--> https://bugs.llvm.org/attachment.cgi?id=19572&action=edit
IR to demonstrate
The attached IR was distilled down from one of our internal tests that degraded
~50% with the landing of https://reviews.llvm.org/rL317576 (Fix default cost
model for cast op in X86). That change had the effect of calculating the cost
of a bitcast fed by a load as 0 (due to CodeGen/BasicTTIImpl.h lines 561-568 --
"If this is a zext/sext of a load, return 0 if the corresponding extending load
exists on target"). The result is that the vectorized loops in this IR end up
being 8 elements wide instead of 16, resulting in about half the throughput.
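To make the quoted rule concrete, here is a minimal sketch of the idea in
Python. It is not the code in CodeGen/BasicTTIImpl.h; the function name, its
parameters, and the default cost of 1 are assumptions made for illustration.

```python
def cast_cost(opcode, operand_is_load, ext_load_is_legal, default_cost=1):
    """Hypothetical cost of a single cast instruction (not LLVM's real API)."""
    # A zext/sext fed by a load can fold into an extending load, so the cast
    # itself is modeled as free when the target has such a load.
    if opcode in ("zext", "sext") and operand_is_load and ext_load_is_legal:
        return 0
    # Every other cast keeps the (assumed) default cost.
    return default_cost
```

Under this model, a cast that used to contribute to the cost of a wider VF can
drop to 0, which is enough to change which VFs tie on total cost.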
The obvious fix, changing the vectorizer to choose the larger VF when costs
are the same, does fix our issue but causes two tests to fail:
Transforms/LoopVectorize/X86/avx1.ll
Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll
I'm filing this bug so that someone more knowledgeable about loop vectorization
on x86 can chime in with a suggested way forward.
For avx1.ll, the loop in @read_mod_i64 has the same cost for VFs 2 and 4, so
the change would select VF 4 instead of 2. The test seems to indicate that
this is undesirable with slow-unaligned-mem-32.
For fp64_to_uint32-cost-model.ll, the loop again has the same cost at VFs 1,
2, and 4. However, the test indicates a preference for a scalar (VF 1) loop in
this case.
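The tie-breaking behaviour in question can be sketched as follows. This is a
simplified Python model, not the vectorizer's actual code: select_vf, the
prefer_wider_on_tie flag, and the cost numbers are all hypothetical, chosen
only so that the ties match the two cases described above.

```python
def select_vf(per_lane_cost, prefer_wider_on_tie=False):
    """Pick a VF from a {vf: per-lane cost} table; VF 1 is the scalar loop."""
    best_vf = 1
    best_cost = per_lane_cost[1]
    for vf in sorted(per_lane_cost):
        if vf == 1:
            continue
        cost = per_lane_cost[vf]
        # A strict '<' keeps the smaller VF on a tie; also accepting '=='
        # models the proposed change and moves to the wider VF instead.
        if cost < best_cost or (prefer_wider_on_tie and cost == best_cost):
            best_vf, best_cost = vf, cost
    return best_vf


# Hypothetical per-lane costs (not taken from the tests themselves).
read_mod_like = {1: 4.0, 2: 2.0, 4: 2.0}    # VFs 2 and 4 tie
fp_to_uint_like = {1: 3.0, 2: 3.0, 4: 3.0}  # VFs 1, 2, and 4 tie
```

With the strict comparison, select_vf(read_mod_like) picks 2 and
select_vf(fp_to_uint_like) picks 1, which is what the two tests expect. With
prefer_wider_on_tie=True both pick 4, which is why the tests fail under the
proposed change.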
I don't know the nuances of x86 vectorization heuristics well enough to know
whether these two failing tests are invariants that should be addressed by the
cost model. It does seem sensible to me to desire the widest possible vector,
so perhaps there are deficiencies in the cost model that would have to be
addressed?