[PATCH] D59506: [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 25 07:50:46 PDT 2019


arsenm added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/smed3.ll:48
+; GCN: v_mov_b32_e32 v{{[0-9]+}}, 12
 define amdgpu_kernel void @v_test_smed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
----------------
lebedev.ri wrote:
> nikic wrote:
> > lebedev.ri wrote:
> > > nikic wrote:
> > > > lebedev.ri wrote:
> > > > > nikic wrote:
> > > > > > arsenm wrote:
> > > > > > > The test should be fixed so this is still testing what it intends to
> > > > > > From what I understand, the test is here to ensure that a "clamp" pattern with swapped constants is not incorrectly compiled to a `med3` instruction. After this change this kind of pattern is always constant folded by InstSimplify. From the debug log, in this case the simplification is caused by an EarlyCSE pass added by the AMDGPU target. I haven't found a way to disable it.
> > > > > > 
> > > > > > Do you have any ideas/pointers on how this pattern can be preserved all the way down to isel?
> > > > > Presumably the *test* needs to be modified.
> > > > > I.e. just write another test that would still produce the "original output".
> > > > > 
> > > > Yes, I'm just not sure how to do that. What we need is to get something like
> > > > 
> > > >         t36: i32 = umax t44, Constant:i32<17>
> > > >       t40: i32 = umin t36, Constant:i32<12>
> > > > 
> > > > during isel without anything optimizing it away before that. For the unsigned case I can use something like
> > > > 
> > > > ```
> > > > define amdgpu_kernel void @v_test_umed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
> > > >   %tid = call i32 @llvm.amdgcn.workitem.id.x()
> > > >   %gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
> > > >   %outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
> > > >   %a = load i32, i32 addrspace(1)* %gep0
> > > > 
> > > >   %icmp0 = icmp ugt i32 %a, 17
> > > >   %i0 = select i1 %icmp0, i32 %a, i32 17
> > > > 
> > > >   %sat = call i32 @llvm.uadd.sat.i32(i32 %i0, i32 -13)
> > > > 
> > > >   store i32 %sat, i32 addrspace(1)* %outgep
> > > >   ret void
> > > > }
> > > > declare i32 @llvm.uadd.sat.i32(i32, i32)
> > > > ```
> > > > 
> > > > because on AMDGPU uaddsat x, -13 will be expanded with a umax x, 12 as the first instruction. I don't have any ideas for the signed case though. And relying on expansion details seems like a bad idea anyway.
> > > Could this perhaps be replaced with a MIR test? Then you wouldn't need to run the whole pipeline.
> > Not really familiar with MIR, but I think it is only created after isel, so it would be too late in the pipeline for this test.
> Hmm, I'm out of ideas for the moment then.
> @arsenm any suggestions as to *how* to harden the test?
I've used intrinsics that aren't constant folded by InstSimplify before for this purpose. llvm.amdgcn.groupstaticsize is expanded too late for this, and I'm not sure what other good candidates there are. Overall, I find it problematic for this very reason that llc runs InstSimplify.

Maybe you could try adding an extra use of something that's dead and will be deleted during lowering? llvm.amdgcn.icmp with an out-of-bounds last operand might work.


So something like

```
%use = call i64 @llvm.amdgcn.icmp.i64(i32 %select, i32 %0, i32 99999)
store i64 %use, ....
```
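Putting the suggestion together, a hardened version of the test might look roughly like the following. This is only a sketch: the swapped-constant clamp is reconstructed from the pattern discussed above, and the value names, the undef store target, and the out-of-bounds condition code 99999 are illustrative assumptions, not a verified test.

```llvm
; Sketch of a hardened test. The extra llvm.amdgcn.icmp use of %i1 is
; intended to keep the swapped-constant clamp alive through InstSimplify
; until the dead call is deleted during lowering. All names are illustrative.
define amdgpu_kernel void @v_test_smed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
  %outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
  %a = load i32, i32 addrspace(1)* %gep0

  ; Clamp with swapped constants: smax to 17, then smin to 12,
  ; which must not be matched as med3.
  %icmp0 = icmp sgt i32 %a, 17
  %i0 = select i1 %icmp0, i32 %a, i32 17
  %icmp1 = icmp slt i32 %i0, 12
  %i1 = select i1 %icmp1, i32 %i0, i32 12

  ; Dead use with an out-of-bounds condition code (assumed to survive
  ; InstSimplify and be dropped during lowering).
  %use = call i64 @llvm.amdgcn.icmp.i64(i32 %i1, i32 0, i32 99999)
  store volatile i64 %use, i64 addrspace(1)* undef

  store i32 %i1, i32 addrspace(1)* %outgep
  ret void
}

declare i32 @llvm.amdgcn.workitem.id.x()
declare i64 @llvm.amdgcn.icmp.i64(i32, i32, i32)
```

Whether this actually reaches isel unsimplified would still need to be checked against the debug output, as discussed above.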


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59506/new/

https://reviews.llvm.org/D59506




