[PATCH] D93708: [AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.

Petar Avramovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 23 05:04:34 PST 2020


Petar.Avramovic added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll:11
+; GFX6789: v_med3_i32 [[A]], [[B]], [[A]], [[C]]
+; GCN: v_cvt_pk_i16_i32_e64 [[A:v[0-9]+]], [[A]], [[B:v[0-9]+]]
+; GCN: v_mov_b32_e32 [[C:v[0-9]+]], 0x7fff
----------------
This looks incorrect since some high bits are discarded (are you missing some known bits analysis?).
Think of positive signed input value that has 1 somewhere in high 16 bits, e.g. 0x4000000000000000
'icmp select' sequence gives 32767 which is then truncated to i16 (no op in this case).
But  cvt_pk takes low 16 bits from 0x40000000 and 0x00000000 and packs them into 32 bit value 0. 
v_med3 with [[A]]=0 gives 0.
Does sdag have a similar combine?


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll:16-17
+entry:
+  %0 = icmp sgt i64 %in, -32768
+  %1 = select i1 %0, i64 %in, i64 -32768
+  %2 = icmp slt i64 %1, 32767
----------------
For awareness: generic llvm-ir  max and min intrinsics  @llvm.umax. and @llvm.umin. were added recently.
It is expected to get llvm-ir combine that will transform cmp+select into intrinsic at some point and then there would be no need to match cmp+select in combines. You should test with min and max intrinsics for now. However since this is i64 I guess it is fine to use cmp + select since intrinsic min and max have to be lowered for s64. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93708/new/

https://reviews.llvm.org/D93708



More information about the llvm-commits mailing list