[all-commits] [llvm/llvm-project] 8e37b5: [X86] Rewrite `getScalarizationOverhead()`
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Tue Nov 15 10:07:56 PST 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 8e37b53360484f07a24ba07cf61b7553ffcc42ae
https://github.com/llvm/llvm-project/commit/8e37b53360484f07a24ba07cf61b7553ffcc42ae
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2022-11-15 (Tue, 15 Nov 2022)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/arith-fp-codesize.ll
M llvm/test/Analysis/CostModel/X86/arith-fp-latency.ll
M llvm/test/Analysis/CostModel/X86/arith-fp-sizelatency.ll
M llvm/test/Analysis/CostModel/X86/arith-fp.ll
M llvm/test/Analysis/CostModel/X86/bitreverse-codesize.ll
M llvm/test/Analysis/CostModel/X86/bitreverse-latency.ll
M llvm/test/Analysis/CostModel/X86/bitreverse-sizelatency.ll
M llvm/test/Analysis/CostModel/X86/fmaxnum-size-latency.ll
M llvm/test/Analysis/CostModel/X86/fminnum-size-latency.ll
M llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
M llvm/test/Analysis/CostModel/X86/fptosi.ll
M llvm/test/Analysis/CostModel/X86/fptoui.ll
M llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll
M llvm/test/Analysis/CostModel/X86/gather-i32-with-i8-index.ll
M llvm/test/Analysis/CostModel/X86/gather-i64-with-i8-index.ll
M llvm/test/Analysis/CostModel/X86/gather-i8-with-i8-index.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-8.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-5.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-7.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-8.ll
M llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
M llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
M llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll
M llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
M llvm/test/Analysis/CostModel/X86/powi.ll
M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1.ll
M llvm/test/Analysis/CostModel/X86/shuffle-replication-i16.ll
M llvm/test/Analysis/CostModel/X86/shuffle-replication-i32.ll
M llvm/test/Analysis/CostModel/X86/shuffle-replication-i64.ll
M llvm/test/Analysis/CostModel/X86/shuffle-replication-i8.ll
M llvm/test/Analysis/CostModel/X86/sitofp.ll
M llvm/test/Analysis/CostModel/X86/trunc.ll
M llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll
M llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll
Log Message:
-----------
[X86] Rewrite `getScalarizationOverhead()`
All of our insert/extract ops work on 128-bit lanes.
For `Insert`, we need to extract the affected 128-bit lane
(unless it's being fully overwritten; FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform the insertions, and then insert the 128-bit lane back.
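The per-lane scheme for `Insert` can be sketched as a toy cost function. This is not the LLVM implementation. The name `toyInsertOverhead`, the `ToyCosts` struct and its unit costs are hypothetical; the real code asks the cost model for each step instead of using constants. For every touched 128-bit lane it charges one scalar insert per demanded element, a lane extract unless the lane is fully overwritten, and a lane insert to write it back. It deliberately does not skip any lane write-back.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical unit costs, for illustration only; the real X86 code queries
// the cost model rather than using fixed numbers.
struct ToyCosts {
  unsigned InsertElt = 1;   // insert one scalar into a 128-bit lane
  unsigned ExtractLane = 1; // pull a 128-bit lane out of a wider vector
  unsigned InsertLane = 1;  // put a 128-bit lane back into a wider vector
};

// Toy insert overhead for one legal vector: Demanded[i] says whether
// element i is written, EltBits is the element width in bits.
unsigned toyInsertOverhead(const std::vector<bool> &Demanded, unsigned EltBits,
                           const ToyCosts &C) {
  const unsigned EltsPerLane = 128 / EltBits;
  const unsigned NumElts = static_cast<unsigned>(Demanded.size());
  const unsigned NumLanes = (NumElts + EltsPerLane - 1) / EltsPerLane;
  unsigned Cost = 0;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    const unsigned Begin = Lane * EltsPerLane;
    const unsigned End = std::min(Begin + EltsPerLane, NumElts);
    const unsigned NumDemanded = static_cast<unsigned>(
        std::count(Demanded.begin() + Begin, Demanded.begin() + End, true));
    if (NumDemanded == 0)
      continue; // untouched lane: nothing to extract or insert
    Cost += NumDemanded * C.InsertElt;
    if (NumLanes == 1)
      continue; // a 128-bit vector is already a single lane
    if (NumDemanded != End - Begin)
      Cost += C.ExtractLane; // partially overwritten: keep the old elements
    Cost += C.InsertLane;    // write the updated lane back
  }
  return Cost;
}
```

With these unit costs, writing only element 5 of a `<8 x i32>` costs 3 (extract the upper lane, one element insert, insert the lane back), while writing all eight elements costs 10.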
But hold on. If we are operating on a 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting both of them,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.
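The saving described above only changes how many 128-bit subvector inserts are needed. The helper below is a hypothetical sketch that counts them from the number of elements written per lane. The two-lane (256-bit) case follows the commit message; treating wider vectors the same way (build lane 0 in the low part, insert all other lanes) is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: DemandedPerLane[i] is how many elements of 128-bit
// lane i are written, EltsPerLane is how many elements fit in one lane.
// Returns how many 128-bit subvector inserts rebuild the wide register.
unsigned toyNumLaneInserts(const std::vector<unsigned> &DemandedPerLane,
                           unsigned EltsPerLane) {
  const unsigned NumLanes = static_cast<unsigned>(DemandedPerLane.size());
  if (NumLanes <= 1)
    return 0; // a lone 128-bit lane is never inserted anywhere
  bool AllFullyOverwritten = true;
  unsigned NumTouched = 0;
  for (unsigned N : DemandedPerLane) {
    AllFullyOverwritten = AllFullyOverwritten && N == EltsPerLane;
    if (N != 0)
      ++NumTouched;
  }
  // Every lane is rebuilt from scratch: lane 0 is built directly in the low
  // 128 bits (the wide register is implicitly widened from it), so only the
  // remaining lanes need an insert. Beyond two lanes this is an assumption.
  if (AllFullyOverwritten)
    return NumLanes - 1;
  // Otherwise every touched lane is written back, lane 0 included.
  return NumTouched;
}
```

For a fully overwritten `<8 x i32>` (`{4, 4}` with four elements per lane) this returns 1 rather than 2, which is exactly the case the message calls out.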
Also, `Insert` wasn't actually querying the costs,
but was just assuming them to be `1`.
`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
// Note that in general, the insertion starting at the beginning of a vector
// isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.
I was hoping this would allow vectorization at a higher VF in one case I looked at,
but the subvector insertion cost still advises against that.
The change for `Extract` is NFC and is for consistency only:
I wanted to get rid of that weird explicit discounting of the insertion of the 0'th element,
since the general code should already deal with that.
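The idea that the general per-lane code should already handle element 0, with no separate discount bolted on afterwards, can be sketched on the extraction side. Everything here is hypothetical: `toyExtractOverhead` and the `EltCost` hook are illustrative names, not LLVM API. Modelling the lane-0 extract as free mirrors the X86 cost-model comment that subvector extractions starting at the beginning of a vector are free. Any special pricing of element 0 is left to `EltCost`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Toy extract overhead for one legal vector. LaneExtractCost is charged per
// touched 128-bit lane except lane 0, which is modelled as free because it is
// the low part of the register. EltCost(i) is a hypothetical hook pricing the
// extraction of element i; any cheap handling of element 0 lives there.
unsigned toyExtractOverhead(const std::vector<bool> &Demanded, unsigned EltBits,
                            unsigned LaneExtractCost,
                            const std::function<unsigned(unsigned)> &EltCost) {
  const unsigned EltsPerLane = 128 / EltBits;
  const unsigned NumElts = static_cast<unsigned>(Demanded.size());
  unsigned Cost = 0;
  for (unsigned Begin = 0; Begin < NumElts; Begin += EltsPerLane) {
    const unsigned End = std::min(Begin + EltsPerLane, NumElts);
    bool Touched = false;
    for (unsigned I = Begin; I != End; ++I) {
      if (!Demanded[I])
        continue;
      Touched = true;
      Cost += EltCost(I); // per-element price, element 0 included
    }
    if (Touched && Begin != 0)
      Cost += LaneExtractCost; // upper lanes must be extracted first
  }
  return Cost;
}
```

If `EltCost` makes element 0 of each lane free, the total drops accordingly (7 instead of 9 for a fully demanded `<8 x i32>` with unit costs) without any extra subtraction in the caller.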
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D137913