[PATCH] D52944: AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap

Mon Oct 8 07:55:26 PDT 2018

nhaehnle added inline comments.

================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:397-399
+  // M0 = {hi16:address, lo16:waveID}. Allow passing M0 as a pointer, so that
+  // the bit packing can be optimized at the IR level.
+  [LLVMQualPointerType<llvm_i32_ty, 2>, // IntToPtr(M0)
----------------
I don't like having this as a pointer. It really isn't, so we're really kidding ourselves here.

This could just be an i32. Alternatively, and honestly preferably, this should be just the pointer to the honest-to-goodness address, and the waveID a separate operand. It should be possible to add the DAG nodes to build the M0 value in LowerINTRINSIC_W_CHAIN. Unless the DAG combine is unable to optimize that? (Or possibly even then, and we should fix the DAG combines...)

================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:401-403
+   llvm_i32_ty, // ordering
+   llvm_i32_ty, // scope
+   llvm_i1_ty,  // isVolatile
----------------
The ordering / scope / isVolatile is a bit weird, but it's weird in the same way as the existing intrinsics, so fine by me.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:5175-5176
+
+    if (WaveDone && !WaveRelease)
+      llvm_unreachable("ds_ordered_count: wave_done requires wave_release");
+
----------------
Use `report_fatal_error` (to support the non-Mesa compiler case).

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:5193
+    default:
+      llvm_unreachable("ds_ordered_count unsupported for this calling conv");
+    }
----------------
Same here.

Repository:
  rL LLVM

https://reviews.llvm.org/D52944