[PATCH] D81675: SILoadStoreOptimizer: add support for GFX10 image instructions
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 12 06:59:01 PDT 2020
nhaehnle added a comment.
Mostly LGTM, with some nitpicks.
================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:108-113
+ bool SBase = 0;
+ bool SRsrc = 0;
+ bool SOffset = 0;
+ bool VAddr = 0;
+ bool Addr = 0;
+ bool SSamp = 0;
----------------
false instead of 0 for bools.
But how about keeping RegisterEnum and making this two unsigned chars, one for the RegisterEnum flags and one for the NumVAddrs?
================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:116
+// GFX10 image_sample instructions can have 16 vaddrs + 1 srsrc + 1 ssamp.
+const unsigned MAX_ADDRESS_REGS = 18;
----------------
rampitec wrote:
> arsenm wrote:
> > rampitec wrote:
> > > foad wrote:
> > > > Is this really true? I see opcodes like IMAGE_SAMPLE_D_CL_O_V4_V16 mentioned in some generated tables, but I'm not sure what the V16 there really means or whether these will ever occur in practice.
> > > I think that's because we do not have all register classes, so it uses VReg_512 for 10, 11, or 12 dword address for example. I believe a longest one has 12 vaddrs, like this:
> > >
> > >
> > > ```
> > > image_sample_c_d_cl_o v[16:19], [v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19], s[20:27], s[100:103] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x00]
> > > 0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x14
> > > ```
> > >
> > > You can check it in MIMGInstructions.td where MIMG_Sampler_AddrSizes is defined.
> > I think 12 is what I remember was the maximum. I was confused by the missing register classes. We probably should add the missing ones now that globalisel at least would handle them (I had to go back and break the code to round to the instruction constraints)
> It will probably require not only register classes, but value types as well.
I remember the value types being the part where I got stuck when I tried to do this a very, very long time ago. And yes, 12 is the maximum number of vaddrs.
================
Comment at: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:425
-static unsigned getRegs(unsigned Opc, const SIInstrInfo &TII) {
- if (TII.isMUBUF(Opc)) {
- unsigned result = 0;
+static struct AddressRegs getRegs(unsigned Opc, const SIInstrInfo &TII) {
+ struct AddressRegs Result;
----------------
Remove the `struct`
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D81675/new/
https://reviews.llvm.org/D81675
More information about the llvm-commits
mailing list