[PATCH] D132784: [AArch64][TTI] Add cost table entry for trunc over vector of integers.

Mingming Liu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 30 00:47:47 PDT 2022


mingmingl added a comment.

In D132784#3757003 <https://reviews.llvm.org/D132784#3757003>, @mingmingl wrote:

> https://reviews.llvm.org/D132889 sweeps the tests in `llvm/test/Analysis/CostModel/AArch64` by adding the aarch64-le target layout and updating them; only `cast.ll` and `arith-overflow.ll` stand out.
>
> Because the split approach <https://github.com/llvm/llvm-project/blob/a11ec00afea327419ec1ab7c78ba6818d6c5bbf7/llvm/include/llvm/CodeGen/BasicTTIImpl.h#L1084-L1086> of BasicTTIImpl is used as a fallback when an entry is not present in the table, correcting only a few entries (as the current patch does) may still leave inaccuracies: those entries also feed the split approach and therefore affect the estimates for wider types.
>
> I plan to add more entries (using https://gcc.godbolt.org/z/q8qodd147 as a template to get a better idea of the codegen for different `trunc` instructions). I'm not sure whether there is a minimal set of table entries that works well with the split approach and thereby makes all combinations of `trunc` (at least all those in `cast.ll`) as accurate as possible.
>
> Meanwhile, I would appreciate feedback on this plan (assuming I'm on the right track in attributing cost to `trunc` for the typical AArch64 data layout :) ).

I ended up adding entries for `trunc` over vectors of integers rather than relying on the split approach of BasicTTIImpl <https://github.com/llvm/llvm-project/blob/a11ec00afea327419ec1ab7c78ba6818d6c5bbf7/llvm/include/llvm/CodeGen/BasicTTIImpl.h#L1084-L1086>.
Each cost is the number of {xtn, uzp1} instructions in the actual codegen (https://gist.github.com/minglotus-6/438e49494fe3d26876933141f889c2ac has godbolt links for many `trunc` cases). `arith-overflow.ll` is updated accordingly.

In addition, the codegen for `trunc <4 x i64> %var to <4 x i8>` seems suboptimal: https://gcc.godbolt.org/z/b36oEr11d gives `2 xtn + 1 uzp1`, while `1 uzp1 + 1 xtn` (4 x i64 -> 4 x i32, then 4 x i32 -> 4 x i8) should be sufficient IIUC. The table uses '3' (from the actual codegen), and I added a FIXME for it.
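
To illustrate why a single table entry also influences the estimates for wider types, here is a minimal standalone sketch of a table lookup with a split fallback. It is not the LLVM implementation: `TruncKey`, `truncCostTable`, `truncCost`, `SplitOverhead`, and `DefaultCost` are hypothetical names, the split rule is simplified to "halve the element count, cost both halves, add 1", and the only entry is the cost of 3 mentioned above.

```cpp
#include <map>
#include <tuple>

// Hypothetical key: (element count, source element bits, destination element bits).
using TruncKey = std::tuple<unsigned, unsigned, unsigned>;

// Assumed constants for this sketch; they are not taken from LLVM.
constexpr unsigned SplitOverhead = 1; // charged once per split
constexpr unsigned DefaultCost = 1;   // used when the type cannot be split further

// Toy cost table. The single entry mirrors the cost of 3 quoted above for
// `trunc <4 x i64> to <4 x i8>`; the real table has more entries.
inline const std::map<TruncKey, unsigned> &truncCostTable() {
  static const std::map<TruncKey, unsigned> Table = {
      {TruncKey{4u, 64u, 8u}, 3u},
  };
  return Table;
}

// Return the table entry if there is one; otherwise split the vector in
// half, cost each half recursively, and add the split overhead.
inline unsigned truncCost(unsigned NumElts, unsigned SrcBits, unsigned DstBits) {
  const auto &Table = truncCostTable();
  auto It = Table.find(TruncKey{NumElts, SrcBits, DstBits});
  if (It != Table.end())
    return It->second;
  if (NumElts <= 1 || NumElts % 2 != 0)
    return DefaultCost;
  return 2 * truncCost(NumElts / 2, SrcBits, DstBits) + SplitOverhead;
}
```

In this sketch `truncCost(8, 64, 8)` has no entry of its own, so it is derived from the 4-element entry as 2 * 3 + 1 = 7. Any error in a narrow entry is therefore doubled at each wider type that lacks its own entry.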


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132784/new/

https://reviews.llvm.org/D132784


