[clang] [llvm] [RegisterCoalescer] Improve register allocation for return values by limiting rematerialization (PR #163047)

guan jian via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 5 21:15:48 PST 2025


rez5427 wrote:

I think the register coalescer is doing pretty much the same job here as GCC's early_remat pass. LLVM's register coalescer decides to rematerialize the value because the return register is in use, whereas GCC's early_remat decides not to rematerialize it. Here is part of the GCC log:

cast_and_load_1.c.319r.sched1:
```
;;   ======================================================
;;   -- basic block 2 from 5 to 13 -- before reload
;;   ======================================================

;; Pressure summary (bb 2): GR_REGS:3 FP_REGS:0 V_REGS:0

;;	  0--> b  0: i   5 r135=high(`bytes1')                     :alu:GR_REGS+1(1)FP_REGS+0(0)V_REGS+0(0):model 0
;;	  1--> b  0: i   6 r136=0x2a                               :alu:GR_REGS+1(1)FP_REGS+0(0)V_REGS+0(0):model 1
;;	  2--> b  0: i   7 [r135+low(`bytes1')]=r136#0             :alu:GR_REGS+0(-1)FP_REGS+0(0)V_REGS+0(0):model 2
;;	  3--> b  0: i  12 a0=r136                                 :alu:GR_REGS+1(0)FP_REGS+0(0)V_REGS+0(0):model 3
;;	  4--> b  0: i  13 use a0                                  :nothing:GR_REGS+0(0)FP_REGS+0(0)V_REGS+0(0):model 4
;;	Ready list (final):  
;;   total time = 4
;;   new head = 5
;;   new tail = 13
```

cast_and_load_1.c.321r.early_remat:
```

;; Function cast_and_load_1 (cast_and_load_1, funcdef_no=0, decl_uid=2297, cgraph_uid=1, symbol_order=0)

starting the processing of deferred insns
ending the processing of deferred insns


cast_and_load_1

Dataflow summary:
;;  fully invalidated by EH 	 0 [zero] 1 [ra] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [vxsat] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used 	 2 [sp] 64 [arg] 65 [frame] 68 [vxrm] 69 [frm]
;;  regular block artificial uses 	 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses 	 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs 	 1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 64 [arg] 65 [frame] 68 [vxrm] 69 [frm]
;;  exit block uses 	 1 [ra] 2 [sp] 8 [s0] 10 [a0] 65 [frame] 68 [vxrm] 69 [frm]
;;  regs ever live 	 10 [a0] 69 [frm]
;;  ref usage 	r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={2d,2u} r11={1d} r12={1d} r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} r68={1d,1u} r69={1d,1u} r135={1d,1u} r136={1d,2u} 
;;    total ref usage 41{26d,15u,0e} in 5{5 regular + 0 call} insns.
(note 1 0 15 NOTE_INSN_DELETED)
(note 15 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 15 17 2 NOTE_INSN_FUNCTION_BEG)
(note 17 2 5 2 NOTE_INSN_DELETED)
(insn 5 17 6 2 (set (reg/f:DI 135)
        (high:DI (symbol_ref:DI ("bytes1") [flags 0xc4]  <var_decl 0x7403341e32f8 bytes1>))) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 277 {*movdi_64bit}
     (nil))
(insn 6 5 7 2 (set (reg:DI 136)
        (const_int 42 [0x2a])) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 277 {*movdi_64bit}
     (nil))
(insn 7 6 12 2 (set (mem/c:SI (lo_sum:DI (reg/f:DI 135)
                (symbol_ref:DI ("bytes1") [flags 0xc4]  <var_decl 0x7403341e32f8 bytes1>)) [1 bytes1+0 S4 A32])
        (subreg:SI (reg:DI 136) 0)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":35:10 278 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:DI 135)
        (nil)))
(insn 12 7 13 2 (set (reg/i:DI 10 a0)
        (reg:DI 136)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":38:1 277 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 136)
        (expr_list:REG_EQUAL (const_int 42 [0x2a])
            (nil))))
(insn 13 12 20 2 (use (reg/i:DI 10 a0)) "out/value-simplify-pointer-info/cast_and_load_1/cast_and_load_1.c":38:1 -1
     (nil))
(note 20 13 0 NOTE_INSN_DELETED)

```

cast_and_load_1.c.322r.ira:
```
+++Allocating 16 bytes for conflict table (uncompressed size 16)
;; a0(r136,l0) conflicts: a1(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a1(r135,l0) conflicts: a0(r136,l0)
;;     total conflict hard regs:
;;     conflict hard regs:


  pref0:a0(r136)<-hr10 at 2000
  regions=1, blocks=3, points=2
    allocnos=2 (big 0), copies=0, conflicts=0, ranges=2
```

If you want the full log, you can reproduce it with GCC 15: `riscv64-unknown-linux-gnu-gcc -std=c11 -march=rv64gcv_zvl128b -O3 -fomit-frame-pointer -S -fdump-tree-all -fdump-rtl-all -dumpdir `
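
For anyone who wants something to feed to that command: the actual `cast_and_load_1.c` is not included in this message, so the following is only a hypothetical reconstruction based on what the RTL above shows (a 4-byte, 4-byte-aligned store of 42 to `bytes1`, followed by returning the same constant in `a0`). The type of `bytes1`, the return type, and the use of `memcpy` are my assumptions, not the real test source.

```c
/* Hypothetical stand-in for cast_and_load_1.c; the real file is not shown
 * in this thread. Names other than bytes1 and cast_and_load_1 are made up. */
#include <stdint.h>
#include <string.h>

/* Assumed 4-byte alignment, to match the "S4 A32" store in insn 7. */
_Alignas(4) unsigned char bytes1[4];

int32_t cast_and_load_1(void)
{
    int32_t value = 42;

    /* 4-byte store of the constant into bytes1. */
    memcpy(bytes1, &value, sizeof value);

    /* Load it back; an optimizing compiler can fold this to the constant,
     * so the same 42 may feed both the store and the return register. */
    int32_t loaded;
    memcpy(&loaded, bytes1, sizeof loaded);
    return loaded;
}
```

With a function of this shape, the interesting question is whether the constant is materialized once and shared between the store and the return value, or materialized a second time directly into the return register.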

Maybe it would be good to extract the remat-related logic out of the register coalescer?

https://github.com/llvm/llvm-project/pull/163047

