[PATCH] D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...
Bing Yu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 20 19:51:14 PDT 2020
yubing added inline comments.
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3088
+ // inserti128.
+ // Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.
+ unsigned Num128Lanes = LT.second.getSizeInBits() / 128 * LT.first;
----------------
pengfei wrote:
> Why inserting 5th in case 2 needs extracti128 but case 3 doesn't?
In case #3, we don't need to retain the other elements, since every index in that 128-bit lane is inserted. Thus, extracti128 is not needed.
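To make the distinction between case #2 and case #3 concrete, here is a minimal sketch of the per-lane counting. It is not the patch's code: `estimateInsertCost`, `Demanded`, and `EltsPerLane` are hypothetical names, every instruction is assumed to cost 1, and every 128-bit lane is treated uniformly, which may differ from how the patch handles the lowest lane.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from D89767): counts the instructions needed to
// insert the demanded elements, lane by lane. Demanded[I] is true when
// element I is inserted; EltsPerLane is the element count of a 128-bit lane.
unsigned estimateInsertCost(const std::vector<bool> &Demanded,
                            std::size_t EltsPerLane) {
  if (EltsPerLane == 0)
    return 0;
  unsigned Cost = 0;
  for (std::size_t Base = 0; Base < Demanded.size(); Base += EltsPerLane) {
    std::size_t End = std::min(Base + EltsPerLane, Demanded.size());
    std::size_t NumDemanded = 0;
    for (std::size_t I = Base; I < End; ++I)
      if (Demanded[I])
        ++NumDemanded;
    if (NumDemanded == 0)
      continue; // Untouched lane: nothing to do.
    // One vpinsrd-style insert per demanded element (assumed cost 1 each).
    Cost += static_cast<unsigned>(NumDemanded);
    // Partially demanded lane: extract it first so the other elements are kept.
    if (NumDemanded < End - Base)
      Cost += 1;
    // Put the updated 128-bit lane back into the wide vector.
    Cost += 1;
  }
  return Cost;
}
```

Under these assumptions, for an 8 x i32 vector with `EltsPerLane` = 4, demanding only index 5 gives extracti128 + vpinsrd + inserti128 = 3, while demanding indices 4,5,6,7 gives 4*vpinsrd + inserti128 = 5, with no extract.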
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3089
+ // Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.
+ unsigned Num128Lanes = LT.second.getSizeInBits() / 128 * LT.first;
+ unsigned NumElts = LT.second.getVectorNumElements() * LT.first;
----------------
pengfei wrote:
> Do we need to skip if LT.second.getSizeInBits() == 128?
No, we've already checked (LT.second.getSizeInBits() <= 128) at line 3073.
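As a rough illustration of the guard and the two counts discussed here, the following sketch mirrors the arithmetic of the quoted lines. `LegalizedType` and its fields are hypothetical stand-ins for `LT.first` and `LT.second`, not LLVM types, and the function names are made up for this example.

```cpp
// Hypothetical stand-in for the legalization result: NumParts plays the role
// of LT.first; BitsPerPart and EltsPerPart describe LT.second.
struct LegalizedType {
  unsigned NumParts;
  unsigned BitsPerPart;
  unsigned EltsPerPart;
};

// Per-lane costing only applies to legal types wider than 128 bits; narrower
// types are assumed to have been handled by the earlier check.
bool needsPerLaneCosting(const LegalizedType &LT) {
  return LT.BitsPerPart > 128;
}

// Same arithmetic as the quoted Num128Lanes line.
unsigned num128Lanes(const LegalizedType &LT) {
  return LT.BitsPerPart / 128 * LT.NumParts;
}

// Same arithmetic as the quoted NumElts line.
unsigned numElts(const LegalizedType &LT) {
  return LT.EltsPerPart * LT.NumParts;
}
```

For instance, assuming a 16 x i32 vector is split into two 256-bit parts of 8 elements each, this yields 4 lanes and 16 elements.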
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D89767/new/
https://reviews.llvm.org/D89767