[llvm] r331633 - [AMDGPU] Don't force WQM for DS op

Mon May 7 06:21:26 PDT 2018

Author: tpr
Date: Mon May  7 06:21:26 2018
New Revision: 331633

URL: http://llvm.org/viewvc/llvm-project?rev=331633&view=rev
Log:
[AMDGPU] Don't force WQM for DS op

Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.

With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.

The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46051

Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81

Modified:
    llvm/trunk/lib/Target/AMDGPU/SIWholeQuadMode.cpp
    llvm/trunk/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll
    llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll

Modified: llvm/trunk/lib/Target/AMDGPU/SIWholeQuadMode.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIWholeQuadMode.cpp?rev=331633&r1=331632&r2=331633&view=diff
==============================================================================

--- llvm/trunk/lib/Target/AMDGPU/SIWholeQuadMode.cpp (original)
+++ llvm/trunk/lib/Target/AMDGPU/SIWholeQuadMode.cpp Mon May  7 06:21:26 2018
@@ -325,9 +325,7 @@ char SIWholeQuadMode::scanInstructions(M
       unsigned Opcode = MI.getOpcode();
       char Flags = 0;
 
-      if (TII->isDS(Opcode) && CallingConv == CallingConv::AMDGPU_PS) {
-        Flags = StateWQM;
-      } else if (TII->isWQM(Opcode)) {
+      if (TII->isWQM(Opcode)) {
         // Sampling instructions don't need to produce results for all pixels
         // in a quad, they just require all inputs of a quad to have been
         // computed for derivatives.

Modified: llvm/trunk/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll?rev=331633&r1=331632&r2=331633&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll (original)
+++ llvm/trunk/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll Mon May  7 06:21:26 2018
@@ -355,7 +355,7 @@ exit1:
 
 ; GCN: v_mov_b32_e32 v0, 2.0
 ; GCN: s_or_b64 exec, exec
-; GCN: s_and_b64 exec, exec
+; GCN-NOT: s_and_b64 exec, exec
 ; GCN: v_mov_b32_e32 v0, 1.0
 
 ; GCN: {{^BB[0-9]+_[0-9]+}}: ; %UnifiedReturnBlock

Modified: llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll?rev=331633&r1=331632&r2=331633&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll (original)
+++ llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll Mon May  7 06:21:26 2018
@@ -95,7 +95,8 @@ main_body:
 
 if:                                               ; preds = %main_body
   %lds_ptr = getelementptr [64 x float], [64 x float] addrspace(3)* @lds, i32 0, i32 0
-  %lds_data = load float, float addrspace(3)* %lds_ptr
+  %lds_data_ = load float, float addrspace(3)* %lds_ptr
+  %lds_data = call float @llvm.amdgcn.wqm.f32(float %lds_data_)
   br label %endif
 
 else:                                             ; preds = %main_body
@@ -208,6 +209,7 @@ declare float @llvm.amdgcn.interp.mov(i3
 declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
 declare void @llvm.amdgcn.exp.compr.v2f16(i32, i32, <2 x half>, <2 x half>, i1, i1) #0
 declare <2 x half> @llvm.amdgcn.cvt.pkrtz(float, float) #1
+declare float @llvm.amdgcn.wqm.f32(float) #1
 
 attributes #0 = { nounwind }
 attributes #1 = { nounwind readnone }