[llvm] [AArch64][GlobalISel] Add custom legalization for v4s8 = G_TRUNC v4s16 (PR #85610)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 18 02:44:06 PDT 2024
davemgreen wrote:
Hi. I had been looking at the v4i8 truncate again recently, and had assumed that we would moreElements them. That approach had some inefficiencies that were stopping me from putting up the patch, though, and my attempts to fix them had only led to more problems so far.
> We see a _lot_ of fallbacks these days due to <4 x s8> types appearing in truncates, and these seem to be commonly being used by the new load/store bitcasting -> s32 rule.
>
> We can keep that load/store rule if we make sure to handle the truncates properly, and we adopt a similar strategy for this custom action as in DAG lowering's LowerTruncateVectorStore(). That is, we first widen the input <4 x s16> to <8 x s16>, so we can generate a legal G_TRUNC to <8 x s8>, and from there extract the final 32 bit sized value.
My understanding was that until we fixed v4i8 load/store recently, these would have fallen back due to the load/store? Sounds like we are moving in the right direction.
Whatever we do, it should ideally handle other small types too: v2i8 and v2i16. I can put up the moreElements version if it is useful; I was hoping that the extra merge/unmerges introduced could all be removed. My worry with extra bitcasts is that they get in the way of optimizations (especially under BE), but with enough combines either approach can probably be made to work cleanly, I would hope.
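For reference, the lane-level effect of the widen-then-truncate strategy quoted above can be sketched in plain C++. This is only a model of the values involved, not LLVM code: it uses no GlobalISel API, and the type aliases and function names are hypothetical. It also assumes the extra lanes introduced by widening are zero, whereas in the real lowering they would be undefined.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the vector types discussed in the thread.
using V4S16 = std::array<uint16_t, 4>;
using V8S16 = std::array<uint16_t, 8>;
using V8S8 = std::array<uint8_t, 8>;
using V4S8 = std::array<uint8_t, 4>;

// Step 1: widen <4 x s16> to <8 x s16>. The upper four lanes are
// zero-filled here (assumption); their contents never reach the result.
V8S16 widenToV8S16(const V4S16 &In) {
  V8S16 Out{};
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = In[I];
  return Out;
}

// Step 2: the truncate on the widened type, <8 x s16> to <8 x s8>.
// Each lane keeps only its low 8 bits.
V8S8 truncToV8S8(const V8S16 &In) {
  V8S8 Out{};
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = static_cast<uint8_t>(In[I]);
  return Out;
}

// Step 3: extract the 32 bits that hold the four lanes we care about.
V4S8 extractLowLanes(const V8S8 &In) {
  return {In[0], In[1], In[2], In[3]};
}

// The three steps combined: a model of v4s8 = G_TRUNC v4s16.
V4S8 truncV4S16ToV4S8(const V4S16 &In) {
  return extractLowLanes(truncToV8S8(widenToV8S16(In)));
}
```

Because the padding lanes are dropped by the final extract, the result depends only on the four original lanes, which is why the widening is safe regardless of what the padding holds.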
https://github.com/llvm/llvm-project/pull/85610