[PATCH] D128302: [AArch64][CostModel] Detects that ExtractElement at index is not free in AArch64 when result is used as integer.
Mingming Liu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 6 21:59:54 PDT 2022
mingmingl added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1868
+ Instruction *UserI = dyn_cast<Instruction>(U.getUser());
+ // FIXME: Shall all users be of type Instruction?
+ if (!UserI)
----------------
mingmingl wrote:
> mingmingl wrote:
> > mingmingl wrote:
> > > mingmingl wrote:
> > > > RKSimon wrote:
> > > > > mingmingl wrote:
> > > > > > nikic wrote:
> > > > > > > Yes, all users of an instruction are instructions, it's safe to use `cast` here.
> > > > > > thanks for taking a look! I looked at a few existing cpp and they also assume so.
> > > > > >
> > > > > > Changing this to an assert then.
> > > > > Scanning the uses like this isn't going to be cheap - and getCost calls can be called a LOT.
> > > > >
> > > > > Why did you decide this was necessary? Do existing tests change without it? You don't appear to have added any test coverage for these cases.
> > > > Scanning cost is a good point; one mitigation (in the patch) is that, there is a `break` out of the loop once an integer use case if ound.
> > > >
> > > > May I measure the compile time increase with https://github.com/nikic/llvm-compile-time-tracker and report back? (leaving this open)
> > > Compared between HEAD~1 and this patch gives https://llvm-compile-time-tracker.com/compare.php?from=1e556f459b44dd0ca4073e932f66ecb6f40fe31a&to=63035f714c94d31698b5b4ab79b9912814c29345&stat=instructions
> > >
> > > Across different configs and benchmarks, the compile-time increases or decreases within +/- 0.1%. I wonder if this means it's fine in terms of the compile-time cost?
> > Actually I wasn't sure if AArch64 code path is executed at this point (without reading how tracker is implemented) even if the metrics (e.g., instructions) could be collected without running the code. Do you have other suggestions to measure the cost of iterating uses?
> >
> > > Why did you decide this was necessary?
> > In order to know if there is a scalar integer use (that results in additional operations to move / dup vector subregisters to general purpose registers), all uses of the extract-element instruction are enumerated here.
> >
> >
> > > Do existing tests change without it?
> > The reason to handle `store` is based on this example (https://godbolt.org/z/fn4jWr5eW).
> >
> > > You don't appear to have added any test coverage for these cases.
> > I could add https://godbolt.org/z/fn4jWr5eW as a test case if it sounds good.
> > Actually I wasn't sure if AArch64 code path is executed at this point (without reading how tracker is implemented) even if the metrics (e.g., instructions) could be collected without running the code. Do you have other suggestions to measure the cost of iterating uses?
> >
>
> Regarding compile-time change
> 1. The llvm-compile-time-tracker [[ https://github.com/nikic/llvm-compile-time-tracker/blob/29f77a0e5137e7fc70d480bd867cec4e253b6196/cmake_llvm_project.sh#L3 | measures ]] x86 code paths.
> 2. I did a similar comparison (by scraping scripts) for `-DLLVM_TARGETS_TO_BUILD="AArch64" `, and build `llvm-test-suite` [[ https://github.com/llvm/llvm-test-suite/blob/main/cmake/caches/O3.cmake | with O3 ]]
> 1. The diff of instruction counts are within +/- 0.1%, and the diff of cycles changes are within 2% (CTMark results uploaded {F23701903})
>
> Regarding the pattern of iterating `uses()` and its cycle change
> - I think it's not un-common for a pass to iterate `uses()` of instructions, and here only a subset of `extract-element` instructions uses are iterated.
> - On the other hand, "used-as-integer" fact might be pre-computed and maintained inside class `ExtractElementInst`, so AArch64TTI can query / re-use the result, if we do want to avoid calculating it repeatedly.
>
> Would like to hear some thoughts for the topics above, thanks!
Between iterating all uses and not looking at uses, another option for now is to look at `use_empty()`, since the instruction cost would ultimately be 0 (regardless of index value) if `use_empty()` returns true.
Together with this option, a FIXME might be added to point out if all uses are store operations that could be performed bitwise (i.e., without cross-register-bank copies), the extract-element operation is free.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128302/new/
https://reviews.llvm.org/D128302
More information about the llvm-commits
mailing list