[PATCH] D85653: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH

Tue Sep 22 06:19:01 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:597
+    Mulh.clampMaxNumElements(0, S8, 2)
+      .fewerElementsIf(elementTypeIsNot(0, S8), scalarize(0));
+  } else {
----------------
pdhaliwal wrote:
> arsenm wrote:
> > pdhaliwal wrote:
> > > arsenm wrote:
> > > > This should be unnecessary
> > > If I drop this, the <2 x s32> case starts generating worse code. This is due to lowering coming into the picture which promotes the 32-bit mulh to 64-bit mul and then legalizing 64-bit mul. I can use VOP3P instruction only for S8. For others, I need to specify the scalarization.
> > This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used
> Hmm, unconditional scalarize would remove the possibility of using  vector path for <2 x s8>. This is bit different from other operations like MUL, ADD where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of having the scalarization conditional is because <2 x s8> can easily use <2 x s16> MUL from lowering path. And as <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply make logic of using vector ops void.
You already handled this case with the first fewerElementsIf, the second one just handles everything else. It doesn't need to specify not -s8

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85653/new/

https://reviews.llvm.org/D85653