[PATCH] D28874: [AMDGPU] Add VGPR copies post regalloc fix pass
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 19 13:19:32 PST 2017
rampitec added a comment.
The problem is that RA inserts these copy instructions, so doing it before RA does not help. Here is the result:
s_and_saveexec_b64 s[4:5], vcc // 0000000170EC: BE84206A
v_mov_b32_e32 v149, v15 // 0000000170F0: 7F2A030F
v_mov_b32_e32 v148, v14 // 0000000170F4: 7F28030E
v_mov_b32_e32 v14, v16 // 0000000170F8: 7E1C0310
v_mov_b32_e32 v192, v21 // 0000000170FC: 7F800315
v_mov_b32_e32 v15, v17 // 000000017100: 7E1E0311
v_mov_b32_e32 v191, v20 // 000000017104: 7F7E0314
v_mov_b32_e32 v62, v11 // 000000017108: 7E7C030B
v_mov_b32_e32 v156, v13 // 00000001710C: 7F38030D
v_mov_b32_e32 v16, v86 // 000000017110: 7E200356
v_mov_b32_e32 v143, v230 // 000000017114: 7F1E03E6
v_mov_b32_e32 v20, v236 // 000000017118: 7E2803EC
s_xor_b64 s[4:5], exec, s[4:5] // 00000001711C: 8884047E
v_mov_b32_e32 v61, v10 // 000000017120: 7E7A030A
v_mov_b32_e32 v155, v12 // 000000017124: 7F36030C
v_mov_b32_e32 v17, v87 // 000000017128: 7E220357
v_mov_b32_e32 v144, v231 // 00000001712C: 7F2003E7
v_mov_b32_e32 v21, v237 // 000000017130: 7E2A03ED
s_cbranch_execz BB1_28 // 000000017134: BF880138
This is not a question of scheduling, where a scheduling barrier might help. The problem is that generic instructions do not have a dependency on exec.
To fix the above code I could add a special check inside SIOptimizeExecMasking, but that is less general and more of a hack than adding the required dependency.
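To illustrate the dependency argument above, here is a minimal sketch in Python, not LLVM code. It models each instruction as a set of defined and used registers, and treats two instructions as reorderable only when no register dependency links them. The names `Instr`, `can_swap` and `add_implicit_exec` are hypothetical and exist only for this illustration, and the operand sets given for `s_and_saveexec_b64` are an assumption based on the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction model: a name plus the registers it defines and uses."""
    name: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()


def can_swap(first, second):
    """Return True if no RAW, WAR or WAW register dependency orders the pair."""
    raw = first.defs & second.uses
    war = first.uses & second.defs
    waw = first.defs & second.defs
    return not (raw or war or waw)


def add_implicit_exec(instr):
    """Return a copy of instr that also reads exec (hypothetical helper)."""
    return Instr(instr.name, instr.defs, instr.uses | {"exec"})


# Assumed operand sets: the exec-mask update writes exec and s[4:5].
saveexec = Instr(
    "s_and_saveexec_b64 s[4:5], vcc",
    defs=frozenset({"s[4:5]", "exec"}),
    uses=frozenset({"vcc", "exec"}),
)

# A generic copy as inserted by RA: no mention of exec at all.
copy = Instr("COPY v149, v15", defs=frozenset({"v149"}), uses=frozenset({"v15"}))

# Without the exec use nothing pins the copy relative to the exec write.
print(can_swap(copy, saveexec))

# With an implicit exec use, the write to exec becomes an ordering constraint.
print(can_swap(add_implicit_exec(copy), saveexec))
```

In this toy model the plain copy is free to move across the exec update, while the copy carrying an implicit exec use is not, which is the property the added dependency is meant to provide.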
Repository:
rL LLVM
https://reviews.llvm.org/D28874