[LLVMdev] IR extension proposal: bitset constants

Wed Jan 28 19:04:50 PST 2015

On Thu, Jan 29, 2015 at 12:53:49AM +0000, Renato Golin wrote:
> So, bitset would be a property that means : globals with the same name will
> append on a string of bits in the order in which they appear, and it's the
> job of the front end to make sure the correct order is followed in every
> unit, so that linking does the right job in every corner case?

Not quite. Each entry in a bitset global (after deduplication) represents
a 1 in a bitset in a particular region of the object file. The entries may
appear in any order, and it is the job of LTO (or the linker) to build the
actual bitset.

> Could that be used for efficient global boolean flags, like LLVM's options?
> Even if you don't need the range check, you get the bit mask check for
> free.

I don't understand exactly what you mean.

Peter

> I'm trying to find other uses to help you in the case to make this into a
> first class citizen, not metadata.
> 
> Cheers,
> Renato
> On 28 Jan 2015 19:10, "Peter Collingbourne" <peter at pcc.me.uk> wrote:
> 
> > Introducing a new module-level metadata format is just another kind of IR
> > extension, with the disadvantage of being untyped. We should be cutting
> > down
> > on these types of things, not introducing a new one [1].
> >
> > I feel that a global is the right design for this, as it represents data
> > that is conceptually in the "object file".
> >
> > Peter
> >
> > [1] https://groups.google.com/forum/#!topic/llvm-dev/u4hZUWAXQuY
> >
> > On Wed, Jan 28, 2015 at 09:41:14AM -0800, Kostya Serebryany wrote:
> > > I would start from using module-level metadata.
> > > An IR extension might be a good idea once we show that
> > >   - the proposed indirect call protection mechanism is efficient and
> > > useful, and
> > >   - there are other use cases for the IR extension.
> > >
> > > --kcc
> > >
> > >
> > > On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com>
> > wrote:
> > >
> > > > Is there any way to accomplish this that doesn't require changes in
> > every
> > > > part of the toolchain?
> > > >
> > > > What you describe here sounds really efficient, but for a change this
> > > > pervasive throughout the toolchain, I would at least like to see an
> > attempt
> > > > done that does not require changing the toolchain. Then we can circle
> > back
> > > > to changing the toolchain if that proves inadequate and with concrete
> > > > experience informing what really is going to require a toolchain change
> > > > (Maybe you've already done this? Please share.).
> > > >
> > > > -- Sean Silva
> > > >
> > > >
> > > > On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I would like to propose a mechanism that allows IR modules to
> > > >> co-operatively
> > > >> build a pointer set corresponding to addresses within a given set of
> > > >> globals. The specific use case I have in mind is to provide a
> > mechanism
> > > >> for a C++ program to efficiently verify (at each call site) that a
> > vtable
> > > >> pointer is in the set of valid vtable pointers for the class or its
> > > >> derived
> > > >> classes. One way of doing this is for a toolchain component to build,
> > for
> > > >> each
> > > >> class, a bit set that maps to the memory region allocated for the
> > vtables,
> > > >> such that each 1 bit in the bit set maps to a valid vtable for that
> > class,
> > > >> and lay out the vtables next to each other, to minimize the total
> > size of
> > > >> the bit sets. Call sites would perform a range check followed by a
> > load of
> > > >> the appropriate bit from the bit set.
> > > >>
> > > >> To give a concrete example, suppose we have the following classes:
> > > >>
> > > >> struct A { virtual void f(); };
> > > >> struct B : A { virtual void f(), g(); };
> > > >> struct C : A { virtual void f(), h(); };
> > > >>
> > > >> with the following vtables:
> > > >>
> > > >> _ZTV1A = { &A::rtti, &A::f };
> > > >> _ZTV1B = { &B::rtti, &B::f, &B::g };
> > > >> _ZTV1C = { &C::rtti, &C::f, &C::h };
> > > >>
> > > >> The set of valid vtables for static class A is {&_ZTV1A[1],
> > &_ZTV1B[1],
> > > >> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The
> > > >> toolchain
> > > >> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate
> > bit
> > > >> sets
> > > >> like this:
> > > >>
> > > >> A = {0,1,0,1,0,0,1,0}
> > > >> B = {0,0,0,1,0,0,0,0}
> > > >> C = {0,0,0,0,0,0,1,0}
> > > >>
> > > >> A call site (say the static type of the call is A) will check whether
> > the
> > > >> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is
> > sizeof(void
> > > >> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th
> > entry
> > > >> in A's bitset, which will be 1 if the vtable is valid for A.
> > > >>
> > > >> We are now faced with a number of implementation questions: how do we
> > > >> represent
> > > >> these (potentially partial) bit sets in LLVM, and how do we arrange
> > to lay
> > > >> out the globals and build the complete bit sets? Remember that
> > classes B
> > > >> and C may be in different translation units, so we do not necessarily
> > have
> > > >> a complete picture of A's bit set at A's compile time.
> > > >>
> > > >> What I propose is that the parts of the bit sets be represented as
> > arrays
> > > >> of pointers, which will be resolved through some mechanism (either at
> > > >> LTO time, or at regular (non-LTO) link time) to bit sets
> > corresponding to
> > > >> those addresses. This mechanism would also be responsible for laying
> > out
> > > >> the referenced globals (i.e. the vtables) as close together as
> > possible.
> > > >> We
> > > >> introduce a new flag on globals, the so-called "bitset" flag, which
> > causes
> > > >> the global to be transformed using this mechanism. Such a global
> > cannot be
> > > >> accessed in the normal way. It may only be accessed using an intrinsic
> > > >> that
> > > >> tests whether a given pointer is in the set.
> > > >>
> > > >> To return to our concrete example, a translation unit defining the
> > vtable
> > > >> for A may also define the following global:
> > > >>
> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
> > > >>
> > > >> A translation unit defining B would define:
> > > >>
> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> > > >> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> > > >>
> > > >> And C:
> > > >>
> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> > > >> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> > > >>
> > > >> A later mechanism would combine the bitset globals, lay out the
> > referenced
> > > >> globals and build the bit sets. In the LTO case, the IR linker can
> > already
> > > >> do the combination step if the globals are given appending linkage. A
> > > >> later compiler pass (which we will call the "bitset lowering pass")
> > would
> > > >> lay out
> > > >> the globals (by combining them into a single large global, and
> > RAUWing the
> > > >> original globals with aliases that gep into the large global) and
> > build
> > > >> the
> > > >> bit sets. In the non-LTO case, we could also invent an
> > > >> object-file-specific
> > > >> mechanism for the backend to tell the linker to combine and build the
> > bit
> > > >> sets,
> > > >> for example by placing the bitset globals in a specifically named
> > section,
> > > >> and place the referenced globals in individual sections so that the
> > linker
> > > >> can rearrange them.
> > > >>
> > > >> As previously mentioned, an intrinsic would be used to test entries
> > in the
> > > >> bit set at call sites. For example, the relevant IR for the following
> > C++
> > > >> code:
> > > >>
> > > >> A *some_object = ...;
> > > >> some_object->f();
> > > >>
> > > >> might look like this:
> > > >>
> > > >> %vtable = load i8** %some_object
> > > >> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> > > >> br i1 %p, label %continue, label %trap
> > > >>
> > > >> In the LTO case, such intrinsic calls would be lowered by the bitset
> > > >> lowering pass
> > > >> into the appropriate range/bit set check code, which might look like
> > this:
> > > >> (pseudocode for 64-bit platform)
> > > >>
> > > >> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable
> > >
> > > >> global_end_addr(_ZBS1A) {
> > > >>   %p = 0
> > > >> } else {
> > > >>   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
> > > >> }
> > > >>
> > > >> In the non-LTO case, we would generate similar code, but with
> > appropriate
> > > >> relocations so that the required constants (global start/end address,
> > > >> bitset
> > > >> address, mask, shift amount) would receive appropriate
> > linker-determined
> > > >> values.
> > > >>
> > > >> The first step I propose to make in this direction is a patch that
> > adds
> > > >> support for the bitset flag on globals, and introduces the bitset
> > > >> lowering pass. A
> > > >> later step would involve building backend support, and linker support
> > in
> > > >> lld.
> > > >>
> > > >> Comments appreciated.
> > > >>
> > > >> Thanks,
> > > >> --
> > > >> Peter
> > > >> _______________________________________________
> > > >> LLVM Developers mailing list
> > > >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > >>
> > > >
> > > >
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > >
> > > >
> >
> > --
> > Peter
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >

-- 
Peter