[PATCH] D23364: AMDGPU: Set sizes of spill pseudos

Thu Aug 11 12:04:34 PDT 2016

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:1962-1963
@@ -1959,1 +1961,4 @@
+
+      // (2 * 4) + (8 * num_subregs) bytes maximum
+      let Size = !add(!shl(!srl(vgpr_class.Size, 5), 3), 8);
     }
----------------
nhaehnle wrote:
> I just took a look, and for some reason the reloads tend to look like
> 
>         buffer_load_dword v3, off, s[72:75], s70 offset:1444 ; 16-byte Folded Reload
>                                         ; encoding: [0xa4,0x05,0x30,0xe0,0x00,0x03,0x12,0x46]
>         s_waitcnt vmcnt(0)              ; encoding: [0x70,0x0f,0x8c,0xbf]
>         buffer_load_dword v4, off, s[72:75], s70 offset:1448 ; 16-byte Folded Reload
>                                         ; encoding: [0xa8,0x05,0x30,0xe0,0x00,0x04,0x12,0x46]
>         s_waitcnt vmcnt(0)              ; encoding: [0x70,0x0f,0x8c,0xbf]
> 
> etc., so you actually get 12 bytes per dword. Not sure if that's a problem, especially since those waits are really wrong anyway (perhaps the wait insertion gets confused by the register/subregister relationship?).
I'm not really sure what to do about the waitcnts. It doesn't really matter for correctness, since the branch relaxation pass currently runs after these pseudos should already have been eliminated (they may be inserted during relaxation, but that isn't a concern yet).
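
For reference, here is a rough sketch of the arithmetic being discussed. It assumes vgpr_class.Size is a width in bits (so the shift right by 5 gives the number of 32-bit subregisters), and it takes the 8-byte buffer_load_dword and 4-byte s_waitcnt lengths from the encodings quoted above. The function names are made up for illustration and are not part of the patch.

```python
def spill_pseudo_size_bytes(reg_class_size_bits: int) -> int:
    """Mirror of !add(!shl(!srl(Size, 5), 3), 8) from the patch.

    Assumption: Size is the register class width in bits.
    """
    num_subregs = reg_class_size_bits >> 5  # !srl(Size, 5): bits -> dwords
    return (num_subregs << 3) + 8           # 8 bytes per subreg, plus 2 * 4


def observed_reload_bytes(num_subregs: int,
                          load_bytes: int = 8,
                          wait_bytes: int = 4) -> int:
    """Bytes emitted if every dword reload is followed by an s_waitcnt.

    The default lengths are read off the quoted encodings: 8 bytes for
    buffer_load_dword and 4 bytes for s_waitcnt, i.e. 12 bytes per dword.
    """
    return num_subregs * (load_bytes + wait_bytes)


if __name__ == "__main__":
    # Compare the declared maximum with the wait-per-load pattern.
    for bits in (32, 64, 128, 256, 512):
        dwords = bits >> 5
        print(bits, spill_pseudo_size_bytes(bits), observed_reload_bytes(dwords))
```

Under these assumptions, a 128-bit (16-byte) reload is declared as 40 bytes but would expand to 48 bytes with a wait after every load, so the declared Size stops being an upper bound from four dwords onward if the waits stay in.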


https://reviews.llvm.org/D23364