[PATCH] D28874: [AMDGPU] Add VGPR copies post regalloc fix pass

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 19 13:19:32 PST 2017

rampitec added a comment.

The problem is that RA inserts these copy instructions, so doing it before RA does not help. Here is the result:

  s_and_saveexec_b64 s[4:5], vcc                             // 0000000170EC: BE84206A
  v_mov_b32_e32 v149, v15                                    // 0000000170F0: 7F2A030F
  v_mov_b32_e32 v148, v14                                    // 0000000170F4: 7F28030E
  v_mov_b32_e32 v14, v16                                     // 0000000170F8: 7E1C0310
  v_mov_b32_e32 v192, v21                                    // 0000000170FC: 7F800315
  v_mov_b32_e32 v15, v17                                     // 000000017100: 7E1E0311
  v_mov_b32_e32 v191, v20                                    // 000000017104: 7F7E0314
  v_mov_b32_e32 v62, v11                                     // 000000017108: 7E7C030B
  v_mov_b32_e32 v156, v13                                    // 00000001710C: 7F38030D
  v_mov_b32_e32 v16, v86                                     // 000000017110: 7E200356
  v_mov_b32_e32 v143, v230                                   // 000000017114: 7F1E03E6
  v_mov_b32_e32 v20, v236                                    // 000000017118: 7E2803EC
  s_xor_b64 s[4:5], exec, s[4:5]                             // 00000001711C: 8884047E
  v_mov_b32_e32 v61, v10                                     // 000000017120: 7E7A030A
  v_mov_b32_e32 v155, v12                                    // 000000017124: 7F36030C
  v_mov_b32_e32 v17, v87                                     // 000000017128: 7E220357
  v_mov_b32_e32 v144, v231                                   // 00000001712C: 7F2003E7
  v_mov_b32_e32 v21, v237                                    // 000000017130: 7E2A03ED
  s_cbranch_execz BB1_28                                     // 000000017134: BF880138

This is not a question of scheduling, where scheduling barrier may help. The problem is that generic instructions do not have dependency on exec.
To fix the above code I can make a special check inside SIOptimizeExecMasking, but this is less general and more a hack than adding required dependency.



More information about the llvm-commits mailing list