[PATCH] D111754: Fixes for 'LOD bias' operand in ISelDAG path and GlobalISel path when A16-bit is 'ON'

Wed Oct 13 12:21:33 PDT 2021

Ravi created this revision.
Ravi added reviewers: nhaehnle, arsenm, cdevadas, critson, rampitec, dstuttard.
Herald added subscribers: foad, wenlei, kerbowa, hiraditya, jvesely.
Ravi requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

For MIMG instructions, the LOD bias operand is of type 'half' when A16 is 'ON'. The 'bias' value is only 16 bits wide but occupies a 32-bit slot whose upper 16 bits contain junk. This patch fixes both paths (ISelDAG and GlobalISel) so that the LOD bias operand is encoded properly. The fix could have been done in one of two ways.
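
To illustrate the layout described above, here is a minimal standalone C++ sketch. It is not code from the patch: the helper names `packBiasDword` and `biasFromDword` are hypothetical, and it assumes the consumer of the dword reads only the low 16 bits, so whatever sits in the upper half does not affect the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): place the raw bit pattern of a
// half-precision bias in the low 16 bits of a 32-bit dword. UpperFiller
// stands in for the "junk"/undefined upper half.
constexpr std::uint32_t packBiasDword(std::uint16_t BiasBits,
                                      std::uint16_t UpperFiller) {
  return (static_cast<std::uint32_t>(UpperFiller) << 16) | BiasBits;
}

// Hypothetical helper: recover the 16-bit bias, ignoring the upper half.
constexpr std::uint16_t biasFromDword(std::uint32_t Dword) {
  return static_cast<std::uint16_t>(Dword & 0xFFFFu);
}
```

For example, `biasFromDword(packBiasDword(0x3C00, 0xDEAD))` yields `0x3C00` (the half-precision encoding of 1.0) no matter which filler is used.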

Approaches:

1. Add a 'bias' operand index for each MIMG instruction to the TableGen-generated image intrinsic info, then use that index to identify the 'bias' operand and emit it as two packed 16-bit operands with the upper 16 bits undefined.
2. Rely on the fact that 'bias' is the only operand in the MIMG intrinsics that can arrive as a 16-bit operand below the index for gradients. The other image address operands that come before the gradients, 'offset' and 'z-compare', are always 32-bit irrespective of A16. The patch implements this approach.
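
The second approach can be sketched in standalone C++ as follows. This is an illustrative model only, not the code in AMDGPULegalizerInfo.cpp or SIISelLowering.cpp: the types `AddrOperand` and `Encoded`, the function `classifyPreGradientOperands`, and the operand order used in the usage note are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one image address operand: a name and a bit width.
struct AddrOperand {
  std::string Name;
  unsigned Bits;
};

// Hypothetical result record: whether the operand is emitted as a packed
// pair of 16-bit values whose upper half is left undefined.
struct Encoded {
  std::string Name;
  bool PackedWithUndefHigh;
};

// Walk only the operands below GradientStart. Under the assumption stated in
// approach 2, any 16-bit operand found there must be the bias, so it is the
// one that gets packed; 32-bit operands are passed through unchanged.
std::vector<Encoded>
classifyPreGradientOperands(const std::vector<AddrOperand> &Ops,
                            std::size_t GradientStart) {
  std::vector<Encoded> Out;
  for (std::size_t I = 0; I < GradientStart && I < Ops.size(); ++I) {
    const bool IsBias = Ops[I].Bits == 16;
    Out.push_back({Ops[I].Name, IsBias});
  }
  return Out;
}
```

Given the operands `{"offset", 32}, {"bias", 16}, {"zcompare", 32}, {"dsdh", 16}` with `GradientStart = 3`, only "bias" is marked as packed; the 16-bit gradient operand is never visited. The trade-off against the first approach is that no extra TableGen field is needed, at the cost of depending on 'bias' remaining the only 16-bit operand ahead of the gradients.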

Testing:
Multiple tests with A16 ON and OFF in both the GlobalISel and ISelDAG paths were checked and updated, especially the SAMPLE and GATHER4 instruction tests. A few SAMPLE tests with A16 'OFF' were missing in both paths, but the GATHER4 tests already cover this case, so no additional tests were added.
All the lit tests pass.

Observations:

1. The ISelDAG path generates the same code as before, with no added inefficiency. The GlobalISel path, however, adds explicit instructions to fill the upper 16 bits with junk. This should probably be analysed as a separate optimization issue, to identify the path that introduces them.
2. This occurs with the earlier code as well as with this patch: the image resource constant (a group of 8 registers) is copied from one set of contiguous registers to another set of contiguous registers to take care of alignment. These copies could be avoided with a custom lowering of formal arguments, with the specific register info reported back to the driver or encoded in the code objects.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111754

Files:
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.sample.a16.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111754.379480.patch
Type: text/x-patch
Size: 114587 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211013/5ba7ecf4/attachment-0001.bin>