[PATCH] D93708: [AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.

Thomas Symalla via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 4 02:21:49 PST 2021


tsymalla marked 2 inline comments as done.
tsymalla added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:1-2
-//=== lib/CodeGen/GlobalISel/AMDGPUPostLegalizerCombiner.cpp ---------------===//
+//=== lib/CodeGen/GlobalISel/AMDGPUPostLegalizerCombiner.cpp
+//---------------===//
 //
----------------
Flakebi wrote:
> Unintended change?
I think this happened because of using clang-format.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll:11
+; GFX6789: v_med3_i32 [[A]], [[B]], [[A]], [[C]]
+; GCN: v_cvt_pk_i16_i32_e64 [[A:v[0-9]+]], [[A]], [[B:v[0-9]+]]
+; GCN: v_mov_b32_e32 [[C:v[0-9]+]], 0x7fff
----------------
nhaehnle wrote:
> Petar.Avramovic wrote:
> > This looks incorrect since some high bits are discarded (are you missing some known bits analysis?).
> > Think of positive signed input value that has 1 somewhere in high 16 bits, e.g. 0x4000000000000000
> > 'icmp select' sequence gives 32767 which is then truncated to i16 (no op in this case).
> > But  cvt_pk takes low 16 bits from 0x40000000 and 0x00000000 and packs them into 32 bit value 0. 
> > v_med3 with [[A]]=0 gives 0.
> > Does sdag have a similar combine?
> cvt_pk performs a clamp before packing into 16 bits. It's not very well documented.
The whole concept behind this, as Nicolai said, is, to have the two 32 bit-parts of the 64-bit number being clamped automatically against short boundaries by using v_cvt_pk_i16_i32. So the lower and higher 16 bits of the resulting 32 bit number should be clamped independently. If this fails for some case (very large numbers), the v_med3 instruction should make sure that the value actually gets clamped.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll:18-21
+  %2 = icmp slt i64 %1, 32767
+  %3 = select i1 %2, i64 %1, i64 32767
+  %4 = trunc i64 %3 to i16
+
----------------
arsenm wrote:
> I dislike anonymous values in tests, can you add names here
Hi, what do you mean by anonymous values in this context?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93708/new/

https://reviews.llvm.org/D93708



More information about the llvm-commits mailing list