[PATCH] D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH

Sun Sep 20 21:14:08 PDT 2020

pdhaliwal added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:597
+    Mulh.clampMaxNumElements(0, S8, 2)
+      .fewerElementsIf(elementTypeIsNot(0, S8), scalarize(0));
+  } else {
----------------
arsenm wrote:
> pdhaliwal wrote:
> > arsenm wrote:
> > > This should be unnecessary
> > If I drop this, the <2 x s32> case starts generating worse code. This is due to lowering coming into the picture which promotes the 32-bit mulh to 64-bit mul and then legalizing 64-bit mul. I can use VOP3P instruction only for S8. For others, I need to specify the scalarization.
> This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used
Hmm, unconditional scalarize would remove the possibility of using  vector path for <2 x s8>. This is bit different from other operations like MUL, ADD where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of having the scalarization conditional is because <2 x s8> can easily use <2 x s16> MUL from lowering path. And as <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply make logic of using vector ops void.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85653/new/

https://reviews.llvm.org/D85653