[llvm] [AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (PR #88512)

Leon Clark via llvm-commits llvm-commits at lists.llvm.org
Thu May 2 14:02:46 PDT 2024


================
@@ -1270,13 +1270,22 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
     .custom();
 
   // The 64-bit versions produce 32-bit results, but only on the SALU.
-  getActionDefinitionsBuilder({G_CTLZ_ZERO_UNDEF, G_CTTZ_ZERO_UNDEF})
-    .legalFor({{S32, S32}, {S32, S64}})
-    .clampScalar(0, S32, S32)
-    .clampScalar(1, S32, S64)
-    .scalarize(0)
-    .widenScalarToNextPow2(0, 32)
-    .widenScalarToNextPow2(1, 32);
+  getActionDefinitionsBuilder(G_CTLZ_ZERO_UNDEF)
+      .legalFor({{S32, S32}, {S32, S64}})
+      .customFor({{S32, S8}, {S32, S16}})
----------------
PeddleSpam wrote:

I think that would give the wrong result. The current lowering subtracts the number of extended bits from the CTLZ result, so the leading-zero count of the original value must be preserved.
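
To make the arithmetic concrete, here is a minimal standalone sketch of the 8-bit case, comparing the subtract-based form described above with the shift-based form named in the PR title. It is not the code in this patch: the helper names (ctlz32, ctlz8ViaSubtract, ctlz8ViaShift, checkAllNonZeroInputs) are hypothetical, and ctlz32 is a plain loop standing in for a 32-bit count-leading-zeros instruction. Zero inputs are excluded from the comparison because the result of the zero_undef variant is unspecified there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-bit count-leading-zeros operation.
// Returns 32 for a zero input.
static unsigned ctlz32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Subtract form: zero-extend to 32 bits, count, then subtract the 24 bits
// that the extension added on top of the original 8-bit value.
static unsigned ctlz8ViaSubtract(uint8_t X) {
  return ctlz32(static_cast<uint32_t>(X)) - 24u;
}

// Shift form: move the 8-bit value into the top byte so that its leading
// zeros are also the leading zeros of the 32-bit value; no subtraction needed.
static unsigned ctlz8ViaShift(uint8_t X) {
  return ctlz32(static_cast<uint32_t>(X) << 24);
}

// Checks that both forms agree for every non-zero 8-bit input.
static void checkAllNonZeroInputs() {
  for (unsigned V = 1; V <= 0xFFu; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    assert(ctlz8ViaSubtract(X) == ctlz8ViaShift(X));
  }
}
```

The two forms differ only for a zero input (8 versus 32 in this sketch), which is the case the zero_undef semantics leave unspecified.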

https://github.com/llvm/llvm-project/pull/88512

