[PATCH] D73482: [AMDGPU] Fix lowering a16 image intrinsics

Wed Jan 29 01:27:00 PST 2020

nhaehnle added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5446-5448
+        SDValue Vec = DAG.getNode(ISD::UNDEF, DL, VectorVT);
+        Vec = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VectorVT, Vec, AddrLo,
+                          DAG.getConstant(0, DL, MVT::i32));
----------------
rtaylor wrote:
> Couldn't this just be a SCALAR_TO_VECTOR to get the 0th element and create the vector? I think Nicolai mentioned this before?
Yes, please change this.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll:3
+; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
+
+define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %smpl, float %sf, float %tf) {
----------------
rtaylor wrote:
> sebastian-ne wrote:
> > arsenm wrote:
> > > if this is an encoding test, it should use -show-mc-encoding?
> > Maybe the name was chosen uncarefully, this test should ensure that all components are actually converted and packed into one register.
> > 
> > Before this patch, a16 is not working because the y component just gets ignored, the compiler only ensures that the x component is present.
> > Still, all other a16 tests are passing. So I tried to introduce this test as a regression test.
> > It should make sure that actually all components are "encoded" into the packed coordinates.
> > 
> > I guess renaming it to llvm.amdgcn.image.a16.pack.ll would make more sense?
> Should we be checking the encoding for all existing tests instead?
Yes, rename the file to avoid confusion.

Yes, we should add encoding tests, but that should happen in a separate patch that actually fixes the encoding for gfx1010 :)

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll:5
 ; GCN: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16{{$}}
-define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
+define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, <2 x half> %args) {
 main_body:
----------------
rtaylor wrote:
> What is the point of combining the halves into vectors? I think this makes the test code unnecessarily complex and less readable.
Ideally, this test should be changed into one that uses the update_llc_test_checks script; with that, having the inputs packed like this will highlight some inefficiencies in the codegen that we should fix.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73482/new/

https://reviews.llvm.org/D73482