[llvm-commits] [cfe-commits] TableGen backend API refactoring review request

Chandler Carruth chandlerc at google.com
Thu Jun 21 01:04:57 PDT 2012


On Wed, Jun 20, 2012 at 11:57 PM, Sean Silva <silvas at purdue.edu> wrote:

> The Records already have a member `unsigned ID` which is unique.
> Specializing the less/hash traits classes to perform the less/hashing on
> that `unsigned` will result in deterministic behavior, since the
> nondeterminism arises entirely from the pointer values; both "<" and the
> hash functions are deterministic (although the hash table order might
> change more than is comfortable with small changes to the .td files or the
> backend).


The hash function should never be relied on to be deterministic. One of
these days I'm going to have the stones to flip a switch and the hash
function will produce different values on each execution with high
probability.

Even without this, the bucket resizing alone can produce highly surprising
artifacts here. If you want deterministic output, sort them.
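A sketch of "sort them", assuming a std::unordered_map with string keys as the hash table (the function name is hypothetical). The entries are copied out and sorted by key before emission, so neither the hash function nor bucket resizing can affect the output order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Copies a hash table's entries into a vector sorted by key. The result is
// independent of hash values and bucket layout; keys in a map are unique, so
// sorting by key alone gives a total order.
std::vector<std::pair<std::string, int>>
sortedEntries(const std::unordered_map<std::string, int> &Table) {
  std::vector<std::pair<std::string, int>> Entries(Table.begin(), Table.end());
  std::sort(Entries.begin(), Entries.end(),
            [](const auto &L, const auto &R) { return L.first < R.first; });
  return Entries;
}
```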

> My question is mostly looking for ideas about how to perform the
> migration. There are numerous places in the backends where this
> nondeterminism is attacked in various "ad-hoc" ways, and it seems like a
> more coherent "right by design" approach is needed.
>
> For example, I ran into all of the following:
>
> * Places where a custom comparator is passed to map/set (there are no
> fewer than 3 custom comparators across the backends). With these, it is
> usually not clear, without really understanding the code, whether the
> order is actually important for the emission or whether it is "just" there
> to avoid the nondeterminism.
> * Places that don't use a custom comparator but sort afterwards. As with the
> above, it's usually not clear without really understanding the code whether
> the sort is an important part of what is being emitted or a
> band-aid for the nondeterministic container order.
> * Places that just use nondeterministic order (AsmMatcherEmitter.cpp is particularly
> nasty). These are usually littered with bare std::map/set, with no typedefs
> to show what the types are semantically, so without fully understanding the
> code it is not possible to perform any maintenance that would result in any
> appreciable unification/simplification of the handling of the
> nondeterminism.
>

I think you're muddying the waters here a bit.

There are two sources of non-deterministic output that seem to be at issue:

1) Non-deterministic iteration order of data structures. To my knowledge,
all of these in the codebase are some variant of a hash table.
2) Non-deterministic "ID" used as the key to a data structure or algorithm.

These are orthogonal, but both can show up in the same code.

The solutions to #1 seem straightforward, but which one is best depends
upon the context, so it is hard to make blanket rules. Sometimes, build up a
vector and sort it. Sometimes, use a SetVector. Sometimes, switch entirely
from a hashing-based container to a sorted container. There are a lot of
options here. We can make it harder to get wrong by using a wrapper around
the hash table which does not support direct iteration, forcing the use of
a sorted output buffer.

The solutions to #2 are often a bit trickier to come up with. Record has a
good one, but do other entities? How do we establish them? What is the
right technique to use to establish stable keys?
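One possible technique for stable keys, sketched under an explicit assumption: entities are created in a deterministic order (for example, the order they appear in the input). Each one then gets a sequential ID from a counter at creation time, similar in spirit to Record's ID. `Entity` and `EntityPool` are hypothetical names; the IDs are only as stable as the creation order they encode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entity that has no natural stable key of its own.
struct Entity {
  unsigned ID;      // stable key: position in creation order
  std::string Name;
};

// Assigns IDs from a counter as entities are created. Unlike addresses, the
// IDs repeat exactly from run to run if creation order is deterministic.
class EntityPool {
public:
  Entity &create(std::string Name) {
    Owned.push_back(std::make_unique<Entity>(Entity{NextID++, std::move(Name)}));
    return *Owned.back(); // unique_ptr keeps the reference valid on growth
  }

private:
  unsigned NextID = 0;
  std::vector<std::unique_ptr<Entity>> Owned;
};
```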

An interesting benefit of SetVector is that it tends to solve both #1 and
#2 simultaneously. However, it has a high cost associated with it.
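To see why SetVector addresses both problems at once, here is a minimal standard-library sketch of the idea behind llvm::SetVector. It is not LLVM's implementation, and `InsertionOrderedSet` is a hypothetical name. A hash set answers membership queries while a vector records insertion order, and iteration walks the vector. Even if the elements are pointers (a nondeterministic key), the output order depends only on the sequence of insert() calls. The cost is visible in the two members: every element is stored twice, and each insert pays for a hash lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Sketch of the SetVector idea: set for uniqueness, vector for order.
template <typename T> class InsertionOrderedSet {
public:
  // Returns true if Value was not already present.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }

  // Iteration follows insertion order, never hash or address order.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }

private:
  std::unordered_set<T> Seen; // membership tests only
  std::vector<T> Order;       // the order that gets emitted
};
```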


> Maybe a good first step would be to instrument TableGen to force the
> pointer values to be "really nondeterministic" so that problems will crop
> up if they exist?
>

Just repeatedly generate the output on a system with ASLR?

How about we add a mode to the build system which regenerates each
TableGen output on each build and asserts that it matches the one used for
that build? Then the build bots will go red if we mess this up.
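The check at the heart of that build mode can be sketched as follows. `Generate` is a hypothetical callback standing in for running the TableGen backend again, and `matchesBuildOutput`/`readFile` are illustrative names, not part of LLVM's build system. The regenerated text must match, byte for byte, the file the build actually consumed.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>

// Reads a whole file into Out; returns false if it cannot be opened.
bool readFile(const std::string &Path, std::string &Out) {
  std::ifstream In(Path, std::ios::binary);
  if (!In)
    return false;
  std::ostringstream SS;
  SS << In.rdbuf();
  Out = SS.str();
  return true;
}

// Regenerates the output via the hypothetical Generate callback and compares
// it with the file used by the build. A missing file counts as a mismatch.
bool matchesBuildOutput(const std::function<std::string()> &Generate,
                        const std::string &UsedOutputPath) {
  std::string Used;
  if (!readFile(UsedOutputPath, Used))
    return false;
  return Generate() == Used;
}
```

On a system with ASLR, the second run sees different heap addresses, so any output that still depends on pointer order will tend to show up as a mismatch.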