[llvm] [NVPTX] Improve lowering of v4i8 (PR #67866)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 2 11:25:54 PDT 2023
================
@@ -3292,6 +3293,10 @@ let hasSideEffects = false in {
(ins Int16Regs:$s1, Int16Regs:$s2,
Int16Regs:$s3, Int16Regs:$s4),
"mov.b64 \t$d, {{$s1, $s2, $s3, $s4}};", []>;
+ def V4I8toI32 : NVPTXInst<(outs Int32Regs:$d),
+ (ins Int16Regs:$s1, Int16Regs:$s2,
+ Int16Regs:$s3, Int16Regs:$s4),
+ "mov.b32 \t$d, {{$s1, $s2, $s3, $s4}};", []>;
----------------
Artem-B wrote:
:-(
PTX inconsistencies are annoying. We can pack four elements into a register with a `mov.b64`, but not with a `mov.b32`. Go figure.
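
For reference, here is a minimal C++ sketch of the packing a 4-element `.b32` move would have to perform: four 8-bit lanes combined into one 32-bit word with shifts and ORs. The helper names `packV4I8` and `extractV4I8` are hypothetical, and the layout (element 0 in the least significant byte) is an assumption. This only illustrates the semantics and is not necessarily how the PR lowers `v4i8`.

```cpp
#include <cstdint>

// Hypothetical helper (not from the PR): pack four 8-bit lanes into one
// 32-bit word. Assumes element 0 lands in the least significant byte.
constexpr std::uint32_t packV4I8(std::uint8_t e0, std::uint8_t e1,
                                 std::uint8_t e2, std::uint8_t e3) {
  return static_cast<std::uint32_t>(e0) |
         (static_cast<std::uint32_t>(e1) << 8) |
         (static_cast<std::uint32_t>(e2) << 16) |
         (static_cast<std::uint32_t>(e3) << 24);
}

// Hypothetical helper: read lane `idx` (0..3) back out of the packed word.
constexpr std::uint8_t extractV4I8(std::uint32_t v, unsigned idx) {
  return static_cast<std::uint8_t>((v >> (8u * idx)) & 0xFFu);
}

// Round-trip check, evaluated at compile time.
static_assert(extractV4I8(packV4I8(1, 2, 3, 4), 2) == 3,
              "lane 2 should survive a pack/extract round trip");
```

Note that the operands in the quoted hunk are `Int16Regs`, and four 16-bit values do not fit in a 32-bit destination, so the packing has to operate on 8-bit lanes as sketched here.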
https://github.com/llvm/llvm-project/pull/67866