[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 9 22:01:41 PDT 2017

On Wed, Aug 9, 2017 at 9:51 PM Hal Finkel <hfinkel at anl.gov> wrote:

> On 08/09/2017 11:03 PM, Chandler Carruth wrote:
> Hal already answered much of this, just continuing this part of the
> discussion...
> On Wed, Aug 9, 2017 at 8:56 PM Xinliang David Li via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>> On Wed, Aug 9, 2017 at 8:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>> On 08/09/2017 10:14 PM, Xinliang David Li via llvm-commits wrote:
>>> Can you elaborate here too? If there were missed optimizations that
>>> later got fixed, there should be regression tests for them, right? And
>>> what information is missing?
>>> To make a general statement: if we load (a, i8) and (a+2, i16), for
>>> example, and these came from some structure, we've lost the information
>>> that the load (a+1, i8) would have been legal (i.e., is known to be
>>> dereferenceable). This is not specific to bit fields, but the fact that we
>>> lose information on the dereferenceable byte ranges around memory accesses
>>> turns into a problem when we later can't legally widen. There may be a
>>> better way to keep this information other than producing wide loads (which
>>> is an imperfect mechanism, especially the way we do it by restricting to
>>> legal integer types),
> I don't think we have such a restriction? Maybe I'm missing something.
> When I originally added this logic, it definitely was not restricted to
> legal integer types.
> I believe you're right for bitfields. For general structures, however, we
> certainly load individual fields instead of loading the whole structure
> with some wide integer in order to preserve dereferenceability information.

I don't believe structures provide that information. See below.
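
To make the (a, i8) / (a+2, i16) example above concrete, here is a minimal
C++ sketch (struct, field names, and values are invented for illustration):
the middle byte is never loaded directly, yet it is known dereferenceable
because it sits inside the same object -- information no longer visible once
the IR contains only the two narrow loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical layout matching the example: an i8 load at offset 0 and an
// i16 load at offset 2. The byte at offset 1 is never loaded, but it is
// known dereferenceable -- exactly the information the narrow loads lose.
struct Packed {
    uint8_t  lo;   // loaded as (a, i8)
    uint8_t  pad;  // never loaded, but dereferenceable
    uint16_t hi;   // loaded as (a+2, i16)
};

// The two narrow accesses the optimizer actually sees.
uint8_t  load_lo(const Packed* p) { return p->lo; }
uint16_t load_hi(const Packed* p) { return p->hi; }

// A widened access over the whole 4-byte object would be legal here
// (every byte is dereferenceable), but nothing in the IR records that.
uint32_t load_wide(const Packed* p) {
    uint32_t w;
    std::memcpy(&w, p, sizeof w);  // well-defined type punning
    return w;
}
```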

> but at the moment, we don't have anything better.
>> Ok, as you mentioned, widening looks like a workaround to paper over the
>> IR's inability to annotate this information. More importantly, my question
>> is whether this is just a theoretical concern.
> I really disagree with this being a workaround.
> I think it is very fundamentally the correct model -- the semantics are
> that this is a single, wide memory operation that a narrow data type is
> extracted from.
> That is one option. We do need to preserve this information (maybe we can
> do this with TBAA, or similar, or maybe using some other mechanism
> entirely). However, we do try harder to do this with bitfields than with
> other aggregates. If I have struct { int a, b, c, d; } S; and I load S.d,
> we don't do this by loading a 128-bit integer and then extracting some part
> of it. Should we? Probably not.

We cannot; it isn't allowed (I'm pretty sure...)

1) It violates C++ (and C) memory model -- another thread could be writing
to the other variables.

2) Related to #1, there are applications that rely on this memory model,
for example structures where entire regions of the structure live in
protected pages and cannot safely be accessed.

3) Again related to #1, there are applications that rely on the memory
model when doing memory-mapped IO to avoid reading or writing regions that
are being updated by the OS or other processes.

Bitfields are the only place where we have specific license to widen
accesses in the C++ memory model (that I'm aware of)....
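
A short C++ sketch of point #1 (the struct and values are invented for
illustration): two threads writing distinct non-bitfield members are
race-free under the C++11 memory model, because each member is its own
memory location -- and that guarantee is exactly what a widened store would
break. Adjacent bitfields, by contrast, share one memory location, which is
the "specific license to widen" mentioned above.

```cpp
#include <thread>

// Distinct non-bitfield members are separate memory locations, so these
// two unsynchronized writers do not race. A compiler that widened either
// store to cover the neighbouring field would introduce a data race the
// program does not contain.
struct S { int a; int b; };

S concurrent_writes() {
    S s{0, 0};
    std::thread t1([&] { s.a = 1; });  // touches only the bytes of s.a
    std::thread t2([&] { s.b = 2; });  // touches only the bytes of s.b
    t1.join();
    t2.join();
    return s;  // guaranteed {1, 2}: neither store may be widened
}
```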

> I suspect having better support for aggregate memory access would be a
> better solution. Or, as noted, using metadata or some other secondary
> mechanism.

FWIW, I actually agree that if we want to do more of this, we would be
better served by a different IR, but I strongly suspect it would look more
like first class aggregates rather than metadata so that we could reason
about it more fundamentally in terms of SSA.

But bitfields are (IMO) an importantly different problem in that they are
mergeable in interesting and important ways due to being integers, often
sub-byte integers. This is why a single large integer combined with late
narrowing seems like a particularly desirable way to represent the
semantic constraints of the program.
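
As a sketch of that model (the layout, masks, and names are illustrative,
not Clang's actual lowering): the whole bitfield storage unit is loaded as
one wide integer, and each field is recovered by a late shift-and-mask
narrowing, so extractions of adjacent fields can share the single wide
access.

```cpp
#include <cstdint>

// Hypothetical storage unit holding three sub-byte/byte bitfields:
//   bits [0,3) -> x, bits [3,8) -> y, bits [8,16) -> z
using Unit = uint16_t;

// One wide load of the whole storage unit...
Unit load_unit(const Unit* p) { return *p; }

// ...followed by late narrowing per field. Because all extractions read
// the same SSA value, they share a single memory access and can be merged.
unsigned get_x(Unit u) { return u & 0x7; }
unsigned get_y(Unit u) { return (u >> 3) & 0x1F; }
unsigned get_z(Unit u) { return (u >> 8) & 0xFF; }
```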

> Maybe more aggressively preserving this information for bit fields is the
> right answer, empirically. I can believe that's true. The more-general
> problem still exists, however.

For other languages / semantics, yes. Increasingly I think a (better
designed / integrated / spec'ed, etc) system like FCAs would work
particularly well at making this easy to express and reason about. But it
would be a pretty significant change.

> The thing that appeals to me about the IR-transformation approach is the
> ability to handle "hand coded" bit fields as effectively as language-level
> bit fields. I've certainly seen my share of these, and they're definitely
> important. Moreover, this is true regardless of what we think about the
> underlying optimal model for preserving aggregate dereferenceability in
> general.

Completely agree. Teaching LLVM to handle wide integer accesses will be
beneficial no matter what decisions are made here.
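
To illustrate the "hand coded" bit fields Hal mentions, a minimal sketch
(names, layout, and masks are invented): flags packed into one word by hand
present the IR with the same load/shift/mask pattern as a language-level
bitfield, so an IR-level transformation can optimize both uniformly.

```cpp
#include <cstdint>

// A hand-coded bitfield: two fields packed into a single 32-bit word with
// shifts and masks, the pattern an IR-level pass would recognize alongside
// language-level bitfields.
constexpr uint32_t kModeShift  = 0, kModeMask  = 0xF;   // bits [0,4)
constexpr uint32_t kLevelShift = 4, kLevelMask = 0xFF;  // bits [4,12)

uint32_t set_mode(uint32_t w, uint32_t m) {
    return (w & ~(kModeMask << kModeShift)) | ((m & kModeMask) << kModeShift);
}
uint32_t set_level(uint32_t w, uint32_t l) {
    return (w & ~(kLevelMask << kLevelShift)) | ((l & kLevelMask) << kLevelShift);
}
uint32_t get_mode(uint32_t w)  { return (w >> kModeShift) & kModeMask; }
uint32_t get_level(uint32_t w) { return (w >> kLevelShift) & kLevelMask; }
```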
