[all-commits] [llvm/llvm-project] a4dd51: [mlir][ArithToAMDGPU] Use native packing support (...

Krzysztof Drewniak via All-commits all-commits at lists.llvm.org
Thu Jul 24 10:26:24 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: a4dd51d72f18df5ebc447e3c9070bc392fddb9b5
      https://github.com/llvm/llvm-project/commit/a4dd51d72f18df5ebc447e3c9070bc392fddb9b5
  Author: Krzysztof Drewniak <Krzysztof.Drewniak at amd.com>
  Date:   2025-07-24 (Thu, 24 Jul 2025)

  Changed paths:
    M mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
    M mlir/test/Conversion/ArithToAMDGPU/scaling-extf.mlir
    M mlir/test/Conversion/ArithToAMDGPU/scaling-truncf.mlir

  Log Message:
  -----------
  [mlir][ArithToAMDGPU] Use native packing support (#150342)

The current arith-to-amdgpu patterns for scaling_extf and scaling_truncf
don't take full advantage of the native packing ability of the
intrinsics being targeted. Scaling extension takes the location of the
two elements to be extended as a constant argument (byte for fp4, half
for fp8), and scaling truncation takes a 32-bit input register and a
byte or half to write the truncated values to.
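
As a rough illustration of the packing described above, the following
Python sketch models a 32-bit register as an integer and shows how a
constant index can select (or overwrite) one pair of elements: a byte for
two fp4 values, a half for two fp8 values. The function names
select_pair and insert_pair are hypothetical, and the element ordering
(lower element in the lower bits) is an assumption; this is not the
intrinsics' actual interface, only a model of the idea.

```python
def select_pair(reg32: int, index: int, elem_bits: int) -> tuple[int, int]:
    """Return the raw bits of the two elements in pair `index` of a 32-bit register.

    elem_bits=4 models fp4 (each pair occupies one byte, index in 0..3);
    elem_bits=8 models fp8 (each pair occupies one half, index in 0..1).
    Assumption: the first element of a pair sits in the lower bits.
    """
    pair_bits = 2 * elem_bits
    if not 0 <= index < 32 // pair_bits:
        raise ValueError("pair index out of range for this element width")
    chunk = (reg32 >> (index * pair_bits)) & ((1 << pair_bits) - 1)
    elem_mask = (1 << elem_bits) - 1
    return chunk & elem_mask, (chunk >> elem_bits) & elem_mask


def insert_pair(reg32: int, index: int, elem_bits: int, lo: int, hi: int) -> int:
    """Write two truncated elements into pair `index`, keeping all other bits."""
    pair_bits = 2 * elem_bits
    if not 0 <= index < 32 // pair_bits:
        raise ValueError("pair index out of range for this element width")
    elem_mask = (1 << elem_bits) - 1
    chunk = (lo & elem_mask) | ((hi & elem_mask) << elem_bits)
    shift = index * pair_bits
    cleared = reg32 & ~(((1 << pair_bits) - 1) << shift) & 0xFFFFFFFF
    return cleared | (chunk << shift)
```

In this model, eight fp4 values fit in one register and are addressed by
a byte index instead of being split into separate two-element vectors
first, which is the kind of extra register use the change aims to avoid.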

Not using these features causes excess register pressure.
This PR resolves the inefficiency.

It also adds a test for the expected use case of extending or
truncating a block of 32 values to/from fp4 with a uniform scale to
ensure that this usage has a minimal amount of vector shuffling.


