[PATCH] D39912: AMDGPU/SI: Implement d16 support for image intrinsics

Thu Nov 16 11:03:36 PST 2017

cfang added inline comments.

================
Comment at: lib/Target/AMDGPU/MIMGInstructions.td:39
        dmask:$dmask, unorm:$unorm, GLC:$glc, slc:$slc,
-       r128:$r128, tfe:$tfe, lwe:$lwe, da:$da),
-  asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da",
+       r128:$r128, tfe:$tfe, lwe:$lwe, da:$da, d16:$d16),
+  asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da$d16",
----------------
arsenm wrote:
> Since the register class is changing, there shouldn't be an operand for this. This is a separate MI opcode
For each register class, we have already had a MI opcode corresponds to it. For D16 support,
we are NOT introducing new register class that needs new MI Opcode. In stead we are using the existing ones.
Also, we could not use register class to identify whether d16 bit is set. This should come from the data types from
the intrinsics, and thus we need an operand bit.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:6903
+
+  if (Subtarget->getGeneration() >= SISubtarget::VOLCANIC_ISLANDS) {
+    // Don't adjust if d16 bit is set.
----------------
arsenm wrote:
> This should check something more specific than the generation
I don't know what is appropriate. It should be that the image instruction has d16 bit. We haven't defined this feature yet.
since it starts from VI, so I just use the generation here. Do you think I can use something like target has f16? Or define a new feature? Thanks.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:7131-7133
+    int D16Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::d16);
+    if (D16Idx > 0 && MI.getOperand(D16Idx).getImm() != 0)
+      return;
----------------
arsenm wrote:
> There shouldn't be a d16 operand, so this should be unnecessary
I think using a d16 operand might be the simpler approach to handle d16 support.

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.ll:10-11
+; UNPACKED: flat_store_short v[4:5], v0
+; GFX81: flat_store_short v[4:5], v0
+; GFX9: global_store_short v[4:5], v0, off
+define amdgpu_kernel void @image_sample_f16(<4 x float> %coords, <8 x i32> inreg %rsrc, <4 x i32> inreg %sample, half addrspace(1)* %out) {
----------------
arsenm wrote:
> Checking exact registers is probably likely to break here
I know that. But I need way to differentiate v[0:1] with v[0:3], i.e. 2 or 4 registers.

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.ll:78
+; GFX9: global_store_short v[4:5], v0, off
+define amdgpu_kernel void @image_sample_c_v4f16(<4 x float> %coords, <8 x i32> inreg %rsrc, <4 x i32> inreg %sample, half addrspace(1)* %out) {
+main_body:
----------------
arsenm wrote:
> inreg doesn't do anything with amdgpu_kernel, so you should remove them. 
will take a look and update accordingly. Thanks.

https://reviews.llvm.org/D39912