[PATCH] D93963: [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO

Fri Jan 15 02:15:24 PST 2021

foad added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir:60
+    ; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH1]](s32), [[C]]
+    ; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
+    ; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
----------------
Why do we get ANYEXT followed by AND with 1? In the scalar s32 case we just get ZEXT which is nicer.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir:115-118
+    ; GFX8: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[MUL]], [[MUL1]]
+    ; GFX8: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO1]](s1)
+    ; GFX8: [[UADDO2:%[0-9]+]]:_(s32), [[UADDO3:%[0-9]+]]:_(s1) = G_UADDO [[UADDO]], [[UMULH]]
+    ; GFX8: [[ZEXT1:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO3]](s1)
----------------
Is there a good reason why this is using UADDO plus ZEXT plus a second ADD, instead of using UADDE?

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir:218
+    ; GFX8: [[C2:%[0-9]+]]:_(s16) = G_CONSTANT i16 0
+    ; GFX8: [[MUL1:%[0-9]+]]:_(s16) = G_MUL [[TRUNC]], [[TRUNC1]]
+    ; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[TRUNC2]](s16), [[C2]]
----------------
This expansion does an s32 multiply and an s16 multiply. It would be better to just do one s32 multiply -- you can extract all the information you need from the result of that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93963/new/

https://reviews.llvm.org/D93963