[llvm] [NVPTX] Improve lowering of v4i8 (PR #67866)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 2 12:20:23 PDT 2023


================
@@ -3292,6 +3293,10 @@ let hasSideEffects = false in {
                              (ins Int16Regs:$s1, Int16Regs:$s2,
                                   Int16Regs:$s3, Int16Regs:$s4),
                              "mov.b64 \t$d, {{$s1, $s2, $s3, $s4}};", []>;
+  def V4I8toI32  : NVPTXInst<(outs Int32Regs:$d),
+                             (ins Int16Regs:$s1, Int16Regs:$s2,
+                                  Int16Regs:$s3, Int16Regs:$s4),
+                             "mov.b32 \t$d, {{$s1, $s2, $s3, $s4}};", []>;
----------------
Artem-B wrote:

Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well just do that in PTX, too.

https://github.com/llvm/llvm-project/pull/67866


More information about the llvm-commits mailing list