[llvm] [LLVM] [X86] Fix integer overflows in frame layout for huge frames (PR #101840)

Fri Jan 3 20:06:54 PST 2025

wesleywiser wrote:

Apologies for how long it's taken for me to get back to this, I've been looking at it off and on for the last few months in my spare time. 

I've reduced the `SystemInfo.cpp` reproducer into a self-contained 482 lines [here](https://gist.github.com/wesleywiser/f7bea9708557647c65cf303a36cdb054) and trimmed the compile flags a bit as well. It can probably be reduced even more but I haven't had much additional luck as the reproducer is very fragile. I think the problematic bit comes from libcxx: 

https://github.com/llvm/llvm-project/blob/7c86ab8a18897c434fdb1ee3cd5ff2a71e6aae5a/libcxx/include/sstream#L727-L730

This code is dead code in the reproducer. During instruction selection, I see we optimize an add which is very close to `min(int32_t)` into an `lea`:

```
Optimized legalized selection DAG: %bb.21 '?ParseAMDCatalystDriverVersion@@YA_NABVbasic_string at __Cr@std@@@Z:while.body.i.preheader'
SelectionDAG has 36 nodes:
  t0: ch,glue = EntryToken
    t18: i32,ch = CopyFromReg t0, Register:i32 %5
  t20: i32 = add t18, Constant:i32<-2147483647>
          t48: i32 = X86ISD::CMP t20, Constant:i32<0>
        t50: i32 = X86ISD::CMOV t20, Constant:i32<1>, TargetConstant:i8<8>, t48
      t26: ch = CopyToReg t0, Register:i32 %22, t50
            # this add becomes...
            t15: i32 = add FrameIndex:i32<0>, Constant:i32<-2147483625>   
              t10: i32,ch = CopyFromReg t0, Register:i32 %1
            t12: i32 = add nuw t10, Constant:i32<2147483647>
                  t2: i32,ch = CopyFromReg t0, Register:i32 %3
                t4: i32 = AssertSext t2, ValueType:ch:i8
              t5: i8 = truncate t4
            t44: i32 = X86ISD::CMP t5, Constant:i8<0>
          t47: i32 = X86ISD::CMOV t15, t12, TargetConstant:i8<8>, t44
            t35: i32 = sra t20, Constant:i8<31>
          t36: i32 = and t35, Constant:i32<2147483647>
        t28: i32 = add t47, t36
      t30: ch = CopyToReg t0, Register:i32 %23, t28
    t32: ch = TokenFactor t26, t30
  t33: ch = br t32, BasicBlock:ch<if.then23.i 0x569e4deeb3c8>


Selected selection DAG: %bb.21 '?ParseAMDCatalystDriverVersion@@YA_NABVbasic_string at __Cr@std@@@Z:while.body.i.preheader'
SelectionDAG has 41 nodes:
  t0: ch,glue = EntryToken
    t18: i32,ch = CopyFromReg t0, Register:i32 %5
  t20: i32,i32 = ADD32ri t18, TargetConstant:i32<-2147483647>
      t2: i32,ch = CopyFromReg t0, Register:i32 %3
    t58: i32 = COPY_TO_REGCLASS t2, TargetConstant:i32<48>
  t5: i8 = EXTRACT_SUBREG t58, TargetConstant:i32<1>
          t23: i32 = MOV32ri TargetConstant:i32<1>
            t48: i32 = TEST32rr t20, t20
          t55: ch,glue = CopyToReg t0, Register:i32 $eflags, t48
        t50: i32 = CMOV_GR32 t20, t23, TargetConstant:i8<8>, t55:1
      t26: ch = CopyToReg t0, Register:i32 %22, t50
            # ... this lea
            t15: i32 = LEA32r TargetFrameIndex:i32<0>, TargetConstant:i8<1>, Register:i32 $noreg, TargetConstant:i32<-2147483625>, Register:i16 $noreg    
              t10: i32,ch = CopyFromReg t0, Register:i32 %1
            t12: i32,i32 = ADD32ri nuw t10, TargetConstant:i32<2147483647>
              t44: i32 = TEST8rr t5, t5
            t53: ch,glue = CopyToReg t0, Register:i32 $eflags, t44
          t47: i32 = CMOV_GR32 t15, t12, TargetConstant:i8<8>, t53:1
            t35: i32,i32 = SAR32ri t20, TargetConstant:i8<31>
          t36: i32,i32 = AND32ri t35, TargetConstant:i32<2147483647>
        t28: i32,i32 = ADD32rr t47, t36
      t30: ch = CopyToReg t0, Register:i32 %23, t28
    t32: ch = TokenFactor t26, t30
  t33: ch = JMP_1 BasicBlock:ch<if.then23.i 0x569e4deeb3c8>, t32
```

and then during frame layout, when we adjust the offset to accommodate the frame layout and the frame pointer, we underflow `int32_t` and trigger the (corrected) offset assertion. 

It looks like there's already code that tries to handle this kind of thing in X86 ISel here:

https://github.com/llvm/llvm-project/blob/dc3cd2e95ee56cdb75f4d0d0742626f912b5c6f3/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp#L1803-L1811

If I add a check to [`X86DAGToDAGISel::foldOffsetIntoAddress`](https://github.com/llvm/llvm-project/blob/dc3cd2e95ee56cdb75f4d0d0742626f912b5c6f3/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp#L1813C6-L1813C44), then the reproducer no longer asserts and the add isn't converted into an `lea`:

```diff
@@ -1821,7 +1826,11 @@ bool X86DAGToDAGISel::foldOffsetIntoAddress(uint64_t Offset,
     if (Subtarget->isTarget64BitILP32() && !isUInt<31>(Val) &&
         !AM.hasBaseOrIndexReg())
       return true;
-  }
+  } 
+  else if (AM.hasBaseOrIndexReg() && ! isDispSafeForFrameIndex(Val))
+     // For 32-bit X86, make sure the displacement still isn't close to the
+     // expressible limit.
+     return true;
   AM.Disp = Val;
   return false;
 }
```

@arsenm does that seem like a reasonable thing to do? 

https://github.com/llvm/llvm-project/pull/101840