[PATCH] D73482: [AMDGPU] Fix lowering a16 image intrinsics

Wed Jan 29 06:08:28 PST 2020

rtaylor added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll:3
+; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
+
+define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %smpl, float %sf, float %tf) {
----------------
nhaehnle wrote:
> rtaylor wrote:
> > sebastian-ne wrote:
> > > arsenm wrote:
> > > > if this is an encoding test, it should use -show-mc-encoding?
> > > Maybe the name was chosen uncarefully, this test should ensure that all components are actually converted and packed into one register.
> > > 
> > > Before this patch, a16 is not working because the y component just gets ignored, the compiler only ensures that the x component is present.
> > > Still, all other a16 tests are passing. So I tried to introduce this test as a regression test.
> > > It should make sure that actually all components are "encoded" into the packed coordinates.
> > > 
> > > I guess renaming it to llvm.amdgcn.image.a16.pack.ll would make more sense?
> > Should we be checking the encoding for all existing tests instead?
> Yes, rename the file to avoid confusion.
> 
> Yes, we should add encoding tests, but that should happen in a separate patch that actually fixes the encoding for gfx1010 :)
I'm not sure what this file is doing that the current tests don't do?

; GCN-LABEL: {{^}}sample_2d:
; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16{{$}}
define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
main_body:
  %v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
  ret <4 x float> %v
}

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll:5-16
+; GFX9-LABEL: load_3d:
+; GFX9:       ; %bb.0: ; %main_body
+; GFX9-NEXT:    s_mov_b64 s[12:13], exec
+; GFX9-NEXT:    s_wqm_b64 exec, exec
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v1, v1
+; GFX9-NEXT:    v_and_b32_e32 v0, 0xffff, v0
----------------
Do you need all this checking or can you just check the image_sample instruction? If so, it would be cleaner and easier to read and easier to tell what you are checking  with just the GFX9: image_sample... 

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll:5
 ; GCN: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16{{$}}
-define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
+define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, <2 x half> %args) {
 main_body:
----------------
nhaehnle wrote:
> rtaylor wrote:
> > What is the point of combining the halves into vectors? I think this makes the test code unnecessarily complex and less readable.
> Ideally, this test should be changed into one that uses the update_llc_test_checks script; with that, having the inputs packed like this will highlight some inefficiencies in the codegen that we should fix.
Nicolai, can you be more specific, what inefficiencies?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73482/new/

https://reviews.llvm.org/D73482