[PATCH] Add a jumptable attribute and support for creating jump-instruction tables

Thu May 8 12:32:03 PDT 2014

Thanks for the comments.

On Wed, May 7, 2014 at 8:15 PM, Nick Lewycky <nlewycky at google.com> wrote:
> So first, a design comment. The 'jumptable' attribute itself has no IR
> meaning. It doesn't change anything about the execution of the IR, it
> doesn't change whether IR is valid, even. It appears to be an ABI attribute.
>
> What's the right behaviour if you link together two modules where a
> function is not "jumptable" in one but is in the other? Is it safe to apply
> "jumptable" to a module that didn't have it? It looks like not.

I believe it is safe to apply jumptable to any function that didn't
have it, since the resulting function acts like the original function
in all ways except having a different function pointer (that goes
through a single jump instruction in the jump table). And the code
rewrites all such function pointers at CodeGen time, so after the IR
modules are merged. I don't see any correctness problems in adding
this annotation unless the code is making assumptions about the
relative location of code wrt the function pointer. Are such
assumptions allowed for IR? I think the right behavior is for merges
to add the jumptable attribute if one of the functions has it, though
I don't think the code currently handles this case explicitly.

> I think the
> answer is that two modules needed to both have this attribute set the same
> way in the two modules before merging, and that this is reasonable because
> it's an ABI issue. Similarly, if you generated those two modules into .o
> files, they wouldn't be ABI compatible in machine code either, right?

I think they would be compatible, since the jumptable code adds a new
symbol for the jumptable entry (say, f_JT) but doesn't destroy the old
one; in fact, the jumptable itself uses this symbol. Calls to f in
f_jumptable.o would (have been rewritten to) pass through f_JT, and
calls to f in f_nojumptable.o would go directly to f. Even if both
modules independently set f as jumptable and are independently
compiled to machine code, JumpInstrTables will choose unique names for
their respective jumptable entries, so there won't be a name collision
when the two .o files are linked.

One of the main design goals is to preserve the ability to link
against object files that don't know anything about jumptables. E.g.,
a function that is only declared and never defined in IR should be
able to be annotated with jumptable. This will result in references to
that function in the IR being replaced with references to the
jumptable entry, but the original symbol will still be referenced in
the jumptable, and everything should still link properly. And the
jumptable function pointer should be able to be passed outside its
object file and used as a normal function pointer.

What would the implications be of this being an "ABI attribute" rather
than an IR attribute? I don't know what status such ABI features have
in LLVM.

>
> +    // Don't do the replacement if this use is a direct call to this
> function.
> +    if ((CS.isCall() || CS.isInvoke()) && !isIndirectCall(CS)) {
> +      // If the use is not the called value, then replace it.
> +      if (CS.getCalledValue() != GV) {
> +        U->set(V);
> +        return true;
> +      }
>
> I think you have a bug here when the value is used both as the callee and
> is also an argument. Instead you can use CS.isCallee(U); to determine
> whether this particular use of your value is a use as the callee operand or
> not.

I think you're right. I'll fix this.

>
> "CS.isCall() || CS.isInvoke()" simplifies to "CS" used in a bool context.
>
> +  // Go through all uses of this function and replace the uses of GV with
> the
> +  // jump-table version of the function. Get the uses as a vector before
> +  // replacing them, since replacing them changes the use list and
> invalidates
> +  // the iterator otherwise.
> +  std::vector<Use*> Uses;
> +  for (Use &U : GV->uses())
> +    Uses.push_back(&U);
> +
> +  for(Use *U : Uses)
> +    replaceGlobalValueIndirectUse(GV, F, U);
>
> Missing space after for.
>
> The common way to handle the invalidation is more like:
>
>   for (Value::use_iterator I = GV->use_begin(), E = GV->use_end(); I != E;)
> {
>     Use &U = *UI++;
>     replaceGlobalValueIndirectUse(GV, F, U);
>   }
>
> though that can probably be C++11ified now.

OK, will do

>
> + Type *Int32Ty = Type::getInt32Ty(getGlobalContext());
>
> Nope. Let me explain what a context is for.
>
> Suppose you have a game, and it uses a graphics library and it uses a 3d
> audio library. Both of those in turn use llvm under the hood, but they
> aren't aware of each other's existence. The purpose of the llvm context is
> to keep "global" (or at least, "outside the Module") issues out of the
> program-level global context, so that these two users can act independently
> and not interfere with each other in any way.
>
> The context for a module must match the context for everything inside it.
>
> Finally, they're used for supporting threading. Types are pointer
> comparable, if you construct the same type twice you get the same Type*.
> The mapping that holds these is not locked, it's assumed that you only do
> one operation on them at a time per-thread. If someone were to create two
> threads each with their own LLVMContext to keep things separate, they
> couldn't both call JumpInstrTables::transformType() since your function
> would cause racy access to the internal map inside the global context.

So the correct way here is to call getContext on the Module, then, right?

>
> + return Functions.size() > 0;
>
> DenseMap has an empty() method.

Tom