[PATCH] D128839: [DirectX backend] Add createHandle BufferLoad/Store DXIL operation

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 4 05:26:07 PDT 2022


nhaehnle added inline comments.


================
Comment at: llvm/include/llvm/IR/IntrinsicsDXIL.td:22
+
+def int_dxil_buffer_load : Intrinsic<[ llvm_any_ty, LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty ],
+ [ llvm_i64_ty, llvm_i32_ty, llvm_i32_ty], [IntrReadMem, IntrWillReturn]>;
----------------
beanz wrote:
> python3kgae wrote:
> > beanz wrote:
> > > python3kgae wrote:
> > > > beanz wrote:
> > > > > Do you have a plan for taking LLVM load instructions and converting them to these intrinsics?
> > > > > 
> > > > > I think we need to think about how we want to translate LLVM gep/load/store instructions into DXIL ops, and I don't think we should add these intrinsics until we know what that is going to look like.
> > > > These intrinsics are trying to make the distance from HLSL to DXIL shorter.
> > > > They are just wrappers for the DXIL operation functions, so generating DXIL is easier.
> > > > 
> > > > I did experiment with generating DXIL directly from GEP/load/store, and found that introducing intrinsics might help the translation.
> > > I still don't know that these are the _right_ intrinsics. How are we going to map GEP/load/store to these intrinsics?
> > It will not be a simple map; we'll need a pass to translate GEP/load/store to these intrinsics.
> > These intrinsics are meant to make that pass easier to write and to leave details like the DXIL opcode and the DXIL struct types to the DXILOpLowering pass.
> > 
> > Maybe we can allow GEP/load/store in the final DXIL for a future DXIL version, but to generate earlier versions of DXIL these intrinsics will be helpful.
> I didn't mean to imply it would be a simple map (as in map data structure), but it is a mapping operation. GEPs get folded in with loads and stores to form load and store DXIL Ops.
> 
> Clang will generate GEPs, loads, and stores through known handle pointers. Unlike in DXC, we won't map those to "high level" intrinsics during codegen; instead, we'll emit the GEPs, loads, and stores. That will allow LLVM's optimization passes (like SROA) to run without needing to be taught about all of the special intrinsics for HLSL.
> 
> If the input to our backend is expected to be GEPs, loads and stores, I fail to see why we would translate those to an intrinsic that has an identical signature to the DXIL Op (minus the opcode) instead of just translating it to the DXIL Op directly.
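
To make that mapping concrete, here is a rough before/after sketch of how I understand the intended lowering. The op names, opcodes, struct types, and resource-binding operands below are from my reading of the DXIL documentation and are only illustrative (the address space on the handle pointer is likewise made up), so don't read them as what this patch series must emit:

  ; before: what clang would emit, plain IR through a handle pointer
  define float @read(float addrspace(1)* %buf, i32 %idx) {
    %p = getelementptr float, float addrspace(1)* %buf, i32 %idx
    %v = load float, float addrspace(1)* %p
    ret float %v
  }

  ; after: the gep+load folded into DXIL ops
  ; (createHandle/bufferLoad opcodes and operands are illustrative)
  %dx.types.Handle = type { i8* }
  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }

  declare %dx.types.Handle @dx.op.createHandle(i32, i8, i32, i32, i1)
  declare %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32, %dx.types.Handle, i32, i32)

  define float @read(i32 %idx) {
    ; opcode, resource class, range id, index, non-uniform
    %h = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 0, i32 0, i32 0, i1 false)
    ; opcode, handle, index, offset
    %r = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32 68, %dx.types.Handle %h, i32 %idx, i32 undef)
    %v = extractvalue %dx.types.ResRet.f32 %r, 0
    ret float %v
  }
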
FWIW, we have a similar issue in LLPC, our SPIR-V-to-AMDGPU-backend shader compiler. The backend has a family of `llvm.amdgcn.buffer.load/store` intrinsics that take a buffer descriptor and offset arguments. We generate those from load/store/atomic/gep on a "fat pointer" address space in this pass: https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/patch/PatchBufferOp.cpp
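
For reference, the shape of that lowering is roughly the following (heavily simplified: I'm using the raw.* variant of the buffer intrinsic and addrspace(7) for the fat pointer here, and eliding all of the descriptor plumbing, so treat it as a sketch rather than exactly what PatchBufferOp produces):

  ; before PatchBufferOp: ordinary IR on the buffer fat-pointer address space
  define float @load_elem(float addrspace(7)* %fatptr, i32 %idx) {
    %p = getelementptr float, float addrspace(7)* %fatptr, i32 %idx
    %v = load float, float addrspace(7)* %p
    ret float %v
  }

  ; after PatchBufferOp: the fat pointer is split into a <4 x i32> descriptor
  ; plus a byte offset that feed the buffer intrinsic directly
  declare float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32>, i32, i32, i32)

  define float @load_elem(<4 x i32> %desc, i32 %idx) {
    %off = shl i32 %idx, 2
    %v = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %desc, i32 %off, i32 0, i32 0)
    ret float %v
  }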

I don't really have more thoughts on the issue right now, but I believe it is a very similar problem and so a future exchange of thoughts may well be helpful.

For example, it is not clear to me what the "correct" place for lowering these loads and stores is. What AMDGPU and LLPC do has evolved historically. I'd say it's fairly reasonable; we did learn that pushing the lowering later is helpful in our case, but pushing it all the way to a MIR pass (which the DXIL backend doesn't use anyway) would have been a painful amount of work.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128839/new/

https://reviews.llvm.org/D128839


