[clang] Lower _BitInt(129+) to a different type in LLVM IR (PR #91364)

Tue May 7 14:36:51 PDT 2024

rjmccall wrote:

Hmm.  I think this is actually pretty different from the `bool` pattern.  Suppose we're talking about `_BitInt(N)`.  Let `BYTES := ceil(N/8)`, and let `BITS := BYTES * 8`.

The problem being presented here is this:

1. For at least some values of `N`, we cannot use LLVM's `iN` for the type of struct elements, array elements, `alloca`s, global variables, and so on, because the LLVM layout for that type does not match the high-level layout of `_BitInt(N)`.  The only available type that does match the layout appears to be `[BYTES x i8]`.

However, it doesn't follow from the need to use `[BYTES x i8]` for memory layout that we have to use `[BYTES x i8]` for loads and stores.  IIUC, loads and stores of both `iN` and `iBITS` are in fact required to touch only `BYTES` bytes, and so should be valid.  It is near-certain that loads and stores of either of those types would produce far better code from the backends, and would be far more optimizable by IR passes, than loads and stores of `[BYTES x i8]`.

`bool` does run into problem (1) because of targets like PPC, where `sizeof(bool) == 4`.  However, we still use `i8` as the in-memory type for `bool` on other targets.  Partly, this is to discourage portability bugs where people write IR-gen code that doesn't handle the PPC pattern.  But IIRC the main reason is actually to solve this other problem:

2. LLVM doesn't guarantee any particular extension behavior for integer types whose width isn't a multiple of 8 (after a store, the bits between `N` and the next byte boundary are unspecified), but ABIs generally require objects of type `bool` to have all of their bits valid.

I expect that problem (2) also applies to `_BitInt`.  The upshot is that I think we need to emit code like this:

```
  ; _BitInt(129): BYTES = 17, BITS = 136
  %alloca = alloca [17 x i8]
  %storedv = zext i129 %v to i136
  store i136 %storedv, ptr %alloca
```

https://github.com/llvm/llvm-project/pull/91364