[PATCH] D20694: AMDGPU/SI: Enable load-store-opt by default.

Matt Arsenault via llvm-commits <llvm-commits at lists.llvm.org>
Thu May 26 11:23:57 PDT 2016


arsenm added inline comments.

================
Comment at: test/CodeGen/AMDGPU/fmin3.ll:15-17
@@ -14,5 +14,5 @@
 define void @test_fmin3_olt_0(float addrspace(1)* %out, float addrspace(1)* %aptr, float addrspace(1)* %bptr, float addrspace(1)* %cptr) nounwind {
-  %a = load float, float addrspace(1)* %aptr, align 4
-  %b = load float, float addrspace(1)* %bptr, align 4
-  %c = load float, float addrspace(1)* %cptr, align 4
+  %a = load volatile float, float addrspace(1)* %aptr, align 4
+  %b = load volatile float, float addrspace(1)* %bptr, align 4
+  %c = load volatile float, float addrspace(1)* %cptr, align 4
   %f0 = call float @llvm.minnum.f32(float %a, float %b) nounwind readnone
----------------
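
A note on the volatile change itself: for the purposes of merging, LLVM treats volatile accesses like ordered ones, so marking each load volatile guarantees that no later pass merges or reorders them and the test keeps one discrete load per value. A minimal illustrative guard of the kind a memory-merging pass applies, assuming the stock MachineInstr API (this is a sketch with a hypothetical helper name, not code quoted from SILoadStoreOptimizer):

  #include "llvm/CodeGen/MachineInstr.h"

  // hasOrderedMemoryRef() is conservatively true for volatile and atomic
  // accesses, so any pair involving a volatile load is rejected up front.
  static bool canPairMemOps(const llvm::MachineInstr &A,
                            const llvm::MachineInstr &B) {
    return !A.hasOrderedMemoryRef() && !B.hasOrderedMemoryRef();
  }
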
cfang wrote:
> arsenm wrote:
> > Why did this test change, given that it doesn't use local memory?
> if (getOptLevel() > CodeGenOpt::None && ST.loadStoreOptEnabled()) {
>   // Don't do this with no optimizations since it throws away debug info by
>   // merging nonadjacent loads.
>
>   // This should be run after scheduling, but before register allocation. It
>   // also needs the extra copies to the address operand to be eliminated.
>   insertPass(&MachineSchedulerID, &SILoadStoreOptimizerID);
>   insertPass(&MachineSchedulerID, &RegisterCoalescerID);
> }
>
> I think the RegisterCoalescer pass is what makes the difference. The ideal approach would be to add the coalescer pass only when the load-store optimization actually happens, but I think there is no harm in having this additional coalescer pass here.
Oh, OK. That extra run is a workaround anyway. We should fix the pass so it doesn't depend on the scheduler and can run in SSA form.
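
For readers following the pass-pipeline mechanics: insertPass records that the new pass should run immediately after the target pass, and the decision is made once, when the pass pipeline is constructed, from the opt level and the subtarget flag. It cannot see whether SILoadStoreOptimizer later merges anything, which is why the extra RegisterCoalescer run happens unconditionally. Below is a self-contained toy sketch of that insert-after pattern in plain C++; the pass names are just labels standing in for the real pass IDs, and none of this is LLVM's actual TargetPassConfig code.

  #include <algorithm>
  #include <iostream>
  #include <string>
  #include <vector>

  int main() {
    // Baseline pipeline; stands in for the scheduled machine passes.
    std::vector<std::string> pipeline = {"MachineScheduler",
                                         "RegisterAllocator"};

    bool optEnabled = true;   // stands in for getOptLevel() > CodeGenOpt::None
    bool loadStoreOpt = true; // stands in for ST.loadStoreOptEnabled()

    if (optEnabled && loadStoreOpt) {
      // Mirror insertPass(&MachineSchedulerID, ...): splice each new pass
      // in right after the scheduler, preserving the insertion order.
      auto it = std::find(pipeline.begin(), pipeline.end(),
                          "MachineScheduler");
      it = pipeline.insert(it + 1, "SILoadStoreOptimizer");
      pipeline.insert(it + 1, "RegisterCoalescer");
    }

    // Prints: MachineScheduler, SILoadStoreOptimizer,
    //         RegisterCoalescer, RegisterAllocator
    for (const std::string &p : pipeline)
      std::cout << p << '\n';
  }

If the optimizer were reworked to run in SSA form before the scheduler, as suggested above, the copies it introduces would presumably be cleaned up by the existing coalescing machinery, and the extra pass insertion could be dropped.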


http://reviews.llvm.org/D20694
