[LLVMdev] LLVM IR is a compiler IR

Wed Oct 5 04:32:45 PDT 2011

Hello Dan.

Duncan Sands <baldrick at free.fr> writes:

>> There are places where compatibility with the native C ABI is taken too
>> far. For instance, some time ago I noted that what the user sets through
>> Module::setDataLayout is simply ignored.
>
> it's not ignored, it's used by the IR level optimizers.  That way these
> optimizers can know stuff about the target without having to be linked
> to a target backend.

Well, it is used by one layer and ignored by another. Either way, LLVM is
not doing what the user expects.

>> LLVM uses the data layout
>> required by the native C ABI, which is hardcoded into LLVM's source
>> code. So I asked: pass the value set by Module::setDataLayout to the
>> layers that are interested in it, as any user would expect.
>
> There are two classes of information in datalayout: things which correspond
> to stuff hard-wired into the target processor (for example that x86 is little
> endian), and stuff which is not hard-wired in (for example the alignment of
> x86 long double, which is 4 or 8 bytes on x86-32 depending on whether you are
> on linux, darwin or windows).  Hoping to have code generators override the
> hard-wired stuff if they see something different in the data layout is just
> too much to ask for - eg the x86 code generators are never going to produce big
> endian code just because you set big-endianness in the datalayout.  Even the
> second class of "soft" parameters is not completely flexible: for example most
> processors enforce a minimum alignment for types, and trying to reduce it by
> giving types a lesser alignment in the datalayout just isn't going to work.
> So given that the ways in which codegen could adapt to various datalayout
> settings are quite limited and constrained by the target, does it really make
> sense to try to parametrize the codegenerators by the datalayout at all?
> In any case, it might be good if the code generators produced a warning if they
> see that the datalayout string doesn't correspond to what codegen thinks it
> should be (I thought someone added that already?).

You focus your reasoning on possible wrong uses of the data layout
setting (endianness) when, as you say, there are other uses which are
perfectly legit (using a specific alignment within the limits allowed by
the processor.)  So if I need to align my data differently from what
the C ABI requires, or generate code for a platform that LLVM does not
yet know about, my only solution is to patch LLVM, because the value
set through one of its APIs is ignored in key places, as LLVM assumes
that everybody wants full interoperability with C. This is the kind of
logic that tells me that LLVM is a C-obsessed project: any requirement
that falls outside the needs of a C compiler writer is seen as
superfluous even if it does not conflict with the rest of LLVM.
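
To make the distinction discussed above concrete, here is a minimal sketch of
the kind of check a code generator could run: compare the datalayout string
set on the module against the one the target expects, and separate the
hard-wired mismatches (endianness) from the "soft" ones (such as alignment).
This is not LLVM code. The helper names splitSpecs, layoutMismatches and
isHardWired are hypothetical, the layout strings used with it are abbreviated
for illustration, and treating only the endianness entry as hard-wired is an
assumption made for the example.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a datalayout string such as "e-p:32:32-f80:32" into its
// '-'-separated specifications, skipping empty entries.
std::vector<std::string> splitSpecs(const std::string &layout) {
    std::vector<std::string> specs;
    std::stringstream stream(layout);
    std::string spec;
    while (std::getline(stream, spec, '-')) {
        if (!spec.empty())
            specs.push_back(spec);
    }
    return specs;
}

// Hypothetical helper: return every specification in the module's layout
// that does not appear verbatim in the layout the target expects.
std::vector<std::string> layoutMismatches(const std::string &moduleLayout,
                                          const std::string &targetLayout) {
    const std::vector<std::string> target = splitSpecs(targetLayout);
    const std::set<std::string> expected(target.begin(), target.end());
    std::vector<std::string> mismatches;
    for (const std::string &spec : splitSpecs(moduleLayout)) {
        if (expected.count(spec) == 0)
            mismatches.push_back(spec);
    }
    return mismatches;
}

// Assumption for this sketch: only the endianness entry ("e" for little
// endian, "E" for big endian) is fixed by the processor. Everything else
// is treated as a soft parameter the target might be able to honour.
bool isHardWired(const std::string &spec) {
    return spec == "e" || spec == "E";
}
```

Calling layoutMismatches("E-p:32:32-f80:64", "e-p:32:32-f80:32") yields the
two entries "E" and "f80:64". Under the assumption above, the first is a
hard-wired mismatch that could only be diagnosed, while the second is the kind
of soft request that a target could honour when it stays within the limits the
processor allows.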