[llvm] [RegisterCoalescer] Fix SUBREG_TO_REG handling in the RegisterCoalescer. (PR #96839)

Mon Jul 15 08:47:38 PDT 2024

stefanp-ibm wrote:

> > %2 [32r,112r:0) 0 at 32r  L000000000000000F [16r,112r:0) 0 at 16r  weight:0.000000e+00
> 
> What were we generating previously?
> 
> Also, shouldn't it be `L000000000000000F [16r, 32r0) 0 at 16r`, i.e., `32r` instead of `112r`?

@qcolombet 

I'm sorry, the line above contains a typo. The range should be `16r` to `112r`.

Before we added the fix, we would have:
```
32B	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
	Considering merging to GR64_NOSP with %0 in %2:sub_32bit
		RHS = %0 [16r,32r:0) 0 at 16r  weight:0.000000e+00
		LHS = %2 [32r,112r:0) 0 at 32r  weight:0.000000e+00
		merge %2:0 at 32r into %0:0 at 16r --> @16r
		erased:	32r	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
AllocationOrder(GR64) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
AllocationOrder(GR64_NOSP) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
		updated: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
	Success: %0:sub_32bit -> %2
	Result = %2 [16r,112r:0) 0 at 16r  weight:0.000000e+00
```
So, basically, the result is `%2 [16r,112r:0) 0 at 16r  weight:0.000000e+00`, with no subrange even though `%2` is now defined through `sub_32bit`.
Running with `-verify-coalescing` then gives the following error:
```
*** Bad machine code: Live interval for subreg operand has no subranges ***
- function:    test1
- basic block: %bb.0  (0xf2913e5e220) [0B;208B)
- instruction: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
- operand 0:   undef %2.sub_32bit:gr64_nosp
```

After we add the fix, we get the following:
```
32B	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
	Considering merging to GR64_NOSP with %0 in %2:sub_32bit
		RHS = %0 [16r,32r:0) 0 at 16r  weight:0.000000e+00
		LHS = %2 [32r,112r:0) 0 at 32r  weight:0.000000e+00
		merge %2:0 at 32r into %0:0 at 16r --> @16r
		merge %2:0 at 32r into %0:0 at 16r --> @16r
		joined lanes: 000000000000000F [16r,112r:0) 0 at 16r
		Expecting instruction removal at 32r
		erased:	32r	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
AllocationOrder(GR64) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
AllocationOrder(GR64_NOSP) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
		updated: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
	Success: %0:sub_32bit -> %2
	Result = %2 [16r,112r:0) 0 at 16r  L000000000000000F [16r,112r:0) 0 at 16r  weight:0.000000e+00
```
So the result is `%2 [16r,112r:0) 0 at 16r  L000000000000000F [16r,112r:0) 0 at 16r  weight:0.000000e+00`, and in this case the verifier doesn't find any issues with it.
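
To make the difference between the two results easier to see, here is a small text-level sketch in Python that looks for lane-mask subranges (the `L<mask> [...)` part) in a printed live interval. It is only an illustration of the property the verifier complains about, not how the verifier itself is implemented; the helper names `subrange_lane_masks` and `has_subranges` are made up for this example, and the 16-hex-digit mask width is assumed from the dumps above.

```python
import re

# Assumption: a subrange is printed as "L" followed by a 16-digit hex lane
# mask and then the segment list, e.g. "L000000000000000F [16r,112r:0)".
SUBRANGE_RE = re.compile(r"\bL([0-9A-Fa-f]{16})\s+\[")


def subrange_lane_masks(interval_dump):
    """Return the lane masks of all subranges found in a printed interval."""
    return [int(m.group(1), 16) for m in SUBRANGE_RE.finditer(interval_dump)]


def has_subranges(interval_dump):
    """True if the printed interval carries at least one subrange."""
    return bool(subrange_lane_masks(interval_dump))


# The two "Result = ..." intervals quoted in this comment.
before_fix = "%2 [16r,112r:0) 0 at 16r  weight:0.000000e+00"
after_fix = (
    "%2 [16r,112r:0) 0 at 16r  "
    "L000000000000000F [16r,112r:0) 0 at 16r  weight:0.000000e+00"
)

if __name__ == "__main__":
    for name, dump in (("before", before_fix), ("after", after_fix)):
        masks = [hex(m) for m in subrange_lane_masks(dump)]
        print(f"{name}: has_subranges={has_subranges(dump)} masks={masks}")
```

With the pre-fix interval the helper finds no subranges, which matches the "has no subranges" complaint for the `%2.sub_32bit` def; with the post-fix interval it finds a single subrange with lane mask `0xF`.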

https://github.com/llvm/llvm-project/pull/96839