[llvm-commits] [cfe-commits] TableGen backend API refactoring review request

Sean Silva silvas at purdue.edu
Thu Jun 21 21:30:22 PDT 2012


> Just repeatedly generate the output on a system with ASLR?
My system has ASLR (a fairly vanilla Ubuntu Precise install), but it does
not appear to have a significant effect. My understanding is that ASLR
changes the absolute values of heap pointers but does little to their
*relative* ordering, which is what pointer-keyed containers depend on.

On Thu, Jun 21, 2012 at 1:04 AM, Chandler Carruth <chandlerc at google.com> wrote:
> On Wed, Jun 20, 2012 at 11:57 PM, Sean Silva <silvas at purdue.edu> wrote:
>>
>> The Records already have a member `unsigned ID` which is unique.
>> Specializing the less/hash traits classes to perform the less/hashing on
>> that `unsigned` will result in deterministic behavior (since the
>> nondeterminism arises entirely from the pointer values; both "<" and the
>> hash functions are deterministic (although the hash table order might change
>> more than is comfortable with small changes to the .td files or the backend)).
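
To make the ordered-container half of this concrete, here is a minimal
sketch of an ID-based comparator. The `Record` struct below is a
hypothetical stand-in that models only the `unsigned ID` member (plus a
name for printing); `LessRecordByID`, `RecordSet` and `joinNames` are
names made up for illustration, not existing TableGen API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for TableGen's Record; only the unique ID and a
// name are modeled here.
struct Record {
  unsigned ID;
  std::string Name;
};

// Orders Record pointers by their unique ID instead of by address, so the
// iteration order of a std::set no longer depends on where the allocator
// happened to place each Record.
struct LessRecordByID {
  bool operator()(const Record *LHS, const Record *RHS) const {
    assert(LHS && RHS && "comparing null Record pointers");
    return LHS->ID < RHS->ID;
  }
};

// The typedef documents the intent: a set of Records with a stable order.
typedef std::set<const Record *, LessRecordByID> RecordSet;

// Joins the names in iteration order, separated by commas.
inline std::string joinNames(const RecordSet &Records) {
  std::string Out;
  for (const Record *R : Records) {
    if (!Out.empty())
      Out += ",";
    Out += R->Name;
  }
  return Out;
}
```

Inserting the same Records in any order, at any addresses, should then
iterate in ID order. This only covers sorted containers; it says nothing
about hash table iteration order.
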
>
>
> The hash function should never be relied on to be deterministic. One of
> these days I'm going to have the stones to flip a switch and the hash
> function will produce different values on each execution with high
> probability.
>
> Even without this, the bucket resizing alone can produce highly surprising
> artifacts here. If you want deterministic output, sort them.
>
>> My question is mostly looking for ideas about how to perform the
>> migration. There are numerous places in the backends where this
>> nondeterminism is attacked in various "ad-hoc" ways, and it seems like a
>> more coherent "right by design" approach is needed.
>>
>> For example, I ran into all of the following:
>>
>> * Places where a custom comparator is passed to map/set (there are no
>> fewer than 3 custom comparators across the backends). With these, it is not
>> usually clear without really understanding the code whether the order
>> is actually important for the emission, or whether it is "just" to avoid the
>> nondeterminism.
>> * Don't use a custom comparator, but sort afterwards. Like the above, with
>> these it's not usually clear without really understanding the code whether
>> the sort is actually an important part of what is being emitted, or a
>> band-aid for the nondeterministic container order.
>> * Just use nondeterministic order (AsmMatcherEmitter.cpp is particularly
>> nasty). These are usually littered with bare std::map/set, with no typedefs
>> to show what the types are semantically, so without fully understanding the
>> code it is not possible to perform any maintenance that would result in any
>> appreciable unification/simplification of the handling of the
>> nondeterminism.
>
>
> I think you're muddying the waters here a bit.
>
> There are two sources of non-deterministic output that seem to be at issue:
>
> 1) Non-deterministic iteration order of data structures. All of these in the
> codebase to my knowledge are some variant of a hash table.
> 2) Non-deterministic "ID" used as the key to a data structure or algorithm.
>
> These are orthogonal, and can overlap.
>
> The solutions to #1 seem straightforward, but which one is best depends upon
> the context so it is hard to make blanket rules. Sometimes, build up a
> vector and sort it. Sometimes, use a SetVector. Sometimes, switch entirely
> from a hashing-based container to a sorted container. There are a lot of
> options here. We can make it harder to get wrong by using a wrapper around
> the hash table which does not support direct iteration, forcing the use of a
> sorted output buffer.
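
A rough sketch of what such a wrapper could look like, assuming
std::unordered_map as the underlying table (a real version would more
likely wrap DenseMap). `NonIterableMap`, `lookup` and `sortedEntries` are
hypothetical names, not anything in the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical wrapper around a hash table. It offers insertion and lookup
// but deliberately no begin()/end(), so callers cannot iterate in bucket
// order by accident.
template <typename KeyT, typename ValueT>
class NonIterableMap {
  std::unordered_map<KeyT, ValueT> Map;

public:
  typedef std::pair<KeyT, ValueT> Entry;

  ValueT &operator[](const KeyT &Key) { return Map[Key]; }
  bool contains(const KeyT &Key) const { return Map.count(Key) != 0; }
  std::size_t size() const { return Map.size(); }

  // Returns the value for a key that must already be present.
  const ValueT &lookup(const KeyT &Key) const {
    typename std::unordered_map<KeyT, ValueT>::const_iterator It =
        Map.find(Key);
    assert(It != Map.end() && "key not present");
    return It->second;
  }

  // The only way to walk the contents: copy everything into a vector and
  // sort it by key. KeyT must provide a deterministic operator<.
  std::vector<Entry> sortedEntries() const {
    std::vector<Entry> Entries(Map.begin(), Map.end());
    std::sort(Entries.begin(), Entries.end(),
              [](const Entry &L, const Entry &R) { return L.first < R.first; });
    return Entries;
  }
};
```

Note that this only addresses #1. If KeyT is a raw pointer, the sorted
order is still address-dependent, so the key needs a stable ordering of
its own.
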
>
> The solutions to #2 are often a bit trickier to come up with. Record has a
> good one, but do other entities? How do we establish them? What is the right
> technique to use to establish stable keys?
>
> An interesting benefit of SetVector is that it tends to solve both #1 and #2
> simultaneously. However, it has a high cost associated with it.
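
For readers who have not used SetVector, the idea can be sketched with
standard containers as below. This is a simplified illustration, not
llvm::SetVector itself; `InsertionOrderedSet` is a made-up name. Iteration
follows insertion order, so it is deterministic even for pointer keys as
long as the insertions happen in a deterministic order. Storing every
element twice is presumably part of the cost mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Simplified illustration of the SetVector idea: a hash set for membership
// tests plus a vector that remembers first-insertion order.
template <typename T>
class InsertionOrderedSet {
  std::unordered_set<T> Seen; // used for lookups only, never iterated
  std::vector<T> Order;       // elements in the order they were first added

public:
  typedef typename std::vector<T>::const_iterator const_iterator;

  // Returns true if the value was newly inserted, false if already present.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }

  bool contains(const T &Value) const { return Seen.count(Value) != 0; }
  std::size_t size() const { return Order.size(); }

  const T &operator[](std::size_t Index) const {
    assert(Index < Order.size() && "index out of range");
    return Order[Index];
  }

  // Iteration walks the vector, so it never exposes hash bucket order.
  const_iterator begin() const { return Order.begin(); }
  const_iterator end() const { return Order.end(); }
};
```
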
>
>
>> Maybe a good first step would be to instrument TableGen to force the
>> pointer values to be "really nondeterministic" so that problems will crop up
>> if they exist?
>
>
> Just repeatedly generate the output on a system with ASLR?
>
> How about we add a mode to the build system which re-generates each tablegen
> entry on each build, and asserts that it matches the one used for that
> build? Then the build bots will go red if we mess this up.
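
The comparison step of such a mode could be as small as the sketch below.
It assumes the build has already read the reference output and the
regenerated output into strings; how the build system wires that up is
left open, and `firstMismatchLine` and `assertDeterministic` are
hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns 0 if the two outputs are byte-identical, otherwise the 1-based
// number of the first line at which they differ.
inline unsigned firstMismatchLine(const std::string &Reference,
                                  const std::string &Regenerated) {
  if (Reference == Regenerated)
    return 0;

  std::istringstream RefStream(Reference), NewStream(Regenerated);
  std::string RefLine, NewLine;
  unsigned LineNo = 0;
  while (true) {
    bool HasRef = static_cast<bool>(std::getline(RefStream, RefLine));
    bool HasNew = static_cast<bool>(std::getline(NewStream, NewLine));
    ++LineNo;
    // Both inputs ran out together, yet the strings differ: the only
    // difference is a trailing newline, reported just past the last line.
    if (!HasRef && !HasNew)
      return LineNo;
    if (HasRef != HasNew || RefLine != NewLine)
      return LineNo;
  }
}

// Fails loudly (in builds with assertions enabled) when a regenerated
// output does not match the one used for the build.
inline void assertDeterministic(const std::string &Reference,
                                const std::string &Regenerated) {
  assert(firstMismatchLine(Reference, Regenerated) == 0 &&
         "TableGen output changed between runs");
  (void)Reference;
  (void)Regenerated;
}
```

Reporting the first differing line, rather than a bare pass/fail, should
make it easier to trace a red bot back to the emitter responsible.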



