[PATCH] Implement function prefix data as an IR feature.

Peter Collingbourne peter at pcc.me.uk
Fri Jul 26 13:57:46 PDT 2013


Ping?

On Sat, Jul 20, 2013 at 06:40:52PM -0700, Peter Collingbourne wrote:
> Previous discussion:
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
> 
> http://llvm-reviews.chandlerc.com/D1191
> 
> Files:
>   docs/BitCodeFormat.rst
>   docs/LangRef.rst
>   include/llvm/IR/Function.h
>   lib/AsmParser/LLLexer.cpp
>   lib/AsmParser/LLParser.cpp
>   lib/AsmParser/LLToken.h
>   lib/Bitcode/Reader/BitcodeReader.cpp
>   lib/Bitcode/Reader/BitcodeReader.h
>   lib/Bitcode/Writer/BitcodeWriter.cpp
>   lib/Bitcode/Writer/ValueEnumerator.cpp
>   lib/CodeGen/AsmPrinter/AsmPrinter.cpp
>   lib/IR/AsmWriter.cpp
>   lib/IR/Function.cpp
>   lib/IR/TypeFinder.cpp
>   lib/Transforms/IPO/GlobalDCE.cpp
>   test/CodeGen/X86/prefixdata.ll
>   test/Feature/prefixdata.ll

> Index: docs/BitCodeFormat.rst
> ===================================================================
> --- docs/BitCodeFormat.rst
> +++ docs/BitCodeFormat.rst
> @@ -718,7 +718,7 @@
>  MODULE_CODE_FUNCTION Record
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  
> -``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]``
> +``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, unnamed_addr, prefix]``
>  
>  The ``FUNCTION`` record (code 8) marks the declaration or definition of a
>  function. The operand fields are:
> @@ -757,6 +757,9 @@
>  * *unnamed_addr*: If present and non-zero, indicates that the function has
>    ``unnamed_addr``
>  
> +* *prefix*: If non-zero, the value index of the prefix data for this function,
> +  plus 1.
> +
>  MODULE_CODE_ALIAS Record
>  ^^^^^^^^^^^^^^^^^^^^^^^^
>  
> Index: docs/LangRef.rst
> ===================================================================
> --- docs/LangRef.rst
> +++ docs/LangRef.rst
> @@ -552,16 +552,16 @@
>  name, a (possibly empty) argument list (each with optional :ref:`parameter
>  attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
>  an optional section, an optional alignment, an optional :ref:`garbage
> -collector name <gc>`, an opening curly brace, a list of basic blocks,
> -and a closing curly brace.
> +collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
> +curly brace, a list of basic blocks, and a closing curly brace.
>  
>  LLVM function declarations consist of the "``declare``" keyword, an
>  optional :ref:`linkage type <linkage>`, an optional :ref:`visibility
>  style <visibility>`, an optional :ref:`calling convention <callingconv>`,
>  an optional ``unnamed_addr`` attribute, a return type, an optional
>  :ref:`parameter attribute <paramattrs>` for the return type, a function
> -name, a possibly empty list of arguments, an optional alignment, and an
> -optional :ref:`garbage collector name <gc>`.
> +name, a possibly empty list of arguments, an optional alignment, an optional
> +:ref:`garbage collector name <gc>` and an optional :ref:`prefix <prefixdata>`.
>  
>  A function definition contains a list of basic blocks, forming the CFG
>  (Control Flow Graph) for the function. Each basic block may optionally
> @@ -598,7 +598,7 @@
>             [cconv] [ret attrs]
>             <ResultType> @<FunctionName> ([argument list])
>             [fn Attrs] [section "name"] [align N]
> -           [gc] { ... }
> +           [gc] [prefix Constant] { ... }
>  
>  .. _langref_aliases:
>  
> @@ -757,6 +757,50 @@
>  collector which will cause the compiler to alter its output in order to
>  support the named garbage collection algorithm.
>  
> +.. _prefixdata:
> +
> +Prefix Data
> +-----------
> +
> +Prefix data is data associated with a function which the code generator
> +will emit immediately before the function body.  The purpose of this feature
> +is to allow frontends to associate language-specific runtime metadata with
> +specific functions and make it available through the function pointer while
> +still allowing the function pointer to be called.  To access the data for a
> +given function, a program may bitcast the function pointer to a pointer to
> +the constant's type.  This implies that the IR symbol points to the start
> +of the prefix data.
> +
> +To maintain the semantics of ordinary function calls, the prefix data must
> +have a particular format.  Specifically, it must begin with a sequence of
> +bytes which decode to a sequence of machine instructions, valid for the
> +module's target, which transfer control to the point immediately succeeding
> +the prefix data, without performing any other visible action.  This allows
> +the inliner and other passes to reason about the semantics of the function
> +definition without needing to reason about the prefix data.  As a result,
> +the format of the prefix data is highly target dependent.
> +
> +A trivial example of valid prefix data for the x86 architecture is ``i8 144``,
> +which encodes the ``nop`` instruction:
> +
> +.. code-block:: llvm
> +
> +    define void @f() prefix i8 144 { ... }
> +
> +Generally prefix data can be formed by encoding a relative branch instruction
> +which skips the metadata, as in this example of valid prefix data for the
> +x86_64 architecture, where the first two bytes encode ``jmp .+10``:
> +
> +.. code-block:: llvm
> +
> +    %0 = type <{ i8, i8, i8* }>
> +
> +    define void @f() prefix %0 <{ i8 235, i8 8, i8* @md}> { ... }
> +
> +A function may have prefix data but no body.  This has similar semantics
> +to the ``available_externally`` linkage in that the data may be used by the
> +optimizers but will not be emitted in the object file.
> +
>  .. _attrgrp:
>  
>  Attribute Groups
> Index: include/llvm/IR/Function.h
> ===================================================================
> --- include/llvm/IR/Function.h
> +++ include/llvm/IR/Function.h
> @@ -23,6 +23,7 @@
>  #include "llvm/IR/BasicBlock.h"
>  #include "llvm/IR/CallingConv.h"
>  #include "llvm/IR/GlobalValue.h"
> +#include "llvm/IR/OperandTraits.h"
>  #include "llvm/Support/Compiler.h"
>  
>  namespace llvm {
> @@ -127,11 +128,14 @@
>  public:
>    static Function *Create(FunctionType *Ty, LinkageTypes Linkage,
>                            const Twine &N = "", Module *M = 0) {
> -    return new(0) Function(Ty, Linkage, N, M);
> +    return new(1) Function(Ty, Linkage, N, M);
>    }
>  
>    ~Function();
>  
> +  /// Provide fast operand accessors
> +  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
> +
>    Type *getReturnType() const;           // Return the type of the ret val
>    FunctionType *getFunctionType() const; // Return the FunctionType for me
>  
> @@ -419,6 +423,17 @@
>    size_t arg_size() const;
>    bool arg_empty() const;
>  
> +  bool hasPrefixData() const {
> +    return getNumOperands() != 0;
> +  }
> +
> +  Constant *getPrefixData() const {
> +    assert(hasPrefixData());
> +    return cast<Constant>(Op<0>());
> +  }
> +
> +  void setPrefixData(Constant *PrefixData);
> +
>    /// viewCFG - This function is meant for use from the debugger.  You can just
>    /// say 'call F->viewCFG()' and a ghostview window should pop up from the
>    /// program, displaying the CFG of the current function with the code for each
> @@ -487,6 +502,11 @@
>    return F ? &F->getValueSymbolTable() : 0;
>  }
>  
> +template <>
> +struct OperandTraits<Function> : public OptionalOperandTraits<Function> {};
> +
> +DEFINE_TRANSPARENT_OPERAND_ACCESSORS(Function, Value)
> +
>  } // End llvm namespace
>  
>  #endif
> Index: lib/AsmParser/LLLexer.cpp
> ===================================================================
> --- lib/AsmParser/LLLexer.cpp
> +++ lib/AsmParser/LLLexer.cpp
> @@ -540,6 +540,7 @@
>    KEYWORD(alignstack);
>    KEYWORD(inteldialect);
>    KEYWORD(gc);
> +  KEYWORD(prefix);
>  
>    KEYWORD(ccc);
>    KEYWORD(fastcc);
> Index: lib/AsmParser/LLParser.cpp
> ===================================================================
> --- lib/AsmParser/LLParser.cpp
> +++ lib/AsmParser/LLParser.cpp
> @@ -2919,7 +2919,7 @@
>  /// FunctionHeader
>  ///   ::= OptionalLinkage OptionalVisibility OptionalCallingConv OptRetAttrs
>  ///       OptUnnamedAddr Type GlobalName '(' ArgList ')' OptFuncAttrs OptSection
> -///       OptionalAlign OptGC
> +///       OptionalAlign OptGC OptionalPrefix
>  bool LLParser::ParseFunctionHeader(Function *&Fn, bool isDefine) {
>    // Parse the linkage.
>    LocTy LinkageLoc = Lex.getLoc();
> @@ -2998,6 +2998,7 @@
>    std::string GC;
>    bool UnnamedAddr;
>    LocTy UnnamedAddrLoc;
> +  Constant *Prefix = 0;
>  
>    if (ParseArgumentList(ArgList, isVarArg) ||
>        ParseOptionalToken(lltok::kw_unnamed_addr, UnnamedAddr,
> @@ -3008,7 +3009,9 @@
>         ParseStringConstant(Section)) ||
>        ParseOptionalAlignment(Alignment) ||
>        (EatIfPresent(lltok::kw_gc) &&
> -       ParseStringConstant(GC)))
> +       ParseStringConstant(GC)) ||
> +      (EatIfPresent(lltok::kw_prefix) &&
> +       ParseGlobalTypeAndValue(Prefix)))
>      return true;
>  
>    if (FuncAttrs.contains(Attribute::Builtin))
> @@ -3106,6 +3109,7 @@
>    Fn->setAlignment(Alignment);
>    Fn->setSection(Section);
>    if (!GC.empty()) Fn->setGC(GC.c_str());
> +  Fn->setPrefixData(Prefix);
>    ForwardRefAttrGroups[Fn] = FwdRefAttrGrps;
>  
>    // Add all of the arguments we parsed to the function.
> Index: lib/AsmParser/LLToken.h
> ===================================================================
> --- lib/AsmParser/LLToken.h
> +++ lib/AsmParser/LLToken.h
> @@ -81,6 +81,7 @@
>      kw_alignstack,
>      kw_inteldialect,
>      kw_gc,
> +    kw_prefix,
>      kw_c,
>  
>      kw_cc, kw_ccc, kw_fastcc, kw_coldcc,
> Index: lib/Bitcode/Reader/BitcodeReader.cpp
> ===================================================================
> --- lib/Bitcode/Reader/BitcodeReader.cpp
> +++ lib/Bitcode/Reader/BitcodeReader.cpp
> @@ -975,9 +975,11 @@
>  bool BitcodeReader::ResolveGlobalAndAliasInits() {
>    std::vector<std::pair<GlobalVariable*, unsigned> > GlobalInitWorklist;
>    std::vector<std::pair<GlobalAlias*, unsigned> > AliasInitWorklist;
> +  std::vector<std::pair<Function*, unsigned> > FunctionPrefixWorklist;
>  
>    GlobalInitWorklist.swap(GlobalInits);
>    AliasInitWorklist.swap(AliasInits);
> +  FunctionPrefixWorklist.swap(FunctionPrefixes);
>  
>    while (!GlobalInitWorklist.empty()) {
>      unsigned ValID = GlobalInitWorklist.back().second;
> @@ -1005,6 +1007,20 @@
>      }
>      AliasInitWorklist.pop_back();
>    }
> +
> +  while (!FunctionPrefixWorklist.empty()) {
> +    unsigned ValID = FunctionPrefixWorklist.back().second;
> +    if (ValID >= ValueList.size()) {
> +      FunctionPrefixes.push_back(FunctionPrefixWorklist.back());
> +    } else {
> +      if (Constant *C = dyn_cast<Constant>(ValueList[ValID]))
> +        FunctionPrefixWorklist.back().first->setPrefixData(C);
> +      else
> +        return Error("Function prefix is not a constant!");
> +    }
> +    FunctionPrefixWorklist.pop_back();
> +  }
> +
>    return false;
>  }
>  
> @@ -1741,6 +1757,8 @@
>        if (Record.size() > 9)
>          UnnamedAddr = Record[9];
>        Func->setUnnamedAddr(UnnamedAddr);
> +      if (Record.size() > 10 && Record[10] != 0)
> +        FunctionPrefixes.push_back(std::make_pair(Func, Record[10]-1));
>        ValueList.push_back(Func);
>  
>        // If this is a function with a body, remember the prototype we are
> Index: lib/Bitcode/Reader/BitcodeReader.h
> ===================================================================
> --- lib/Bitcode/Reader/BitcodeReader.h
> +++ lib/Bitcode/Reader/BitcodeReader.h
> @@ -142,6 +142,7 @@
>  
>    std::vector<std::pair<GlobalVariable*, unsigned> > GlobalInits;
>    std::vector<std::pair<GlobalAlias*, unsigned> > AliasInits;
> +  std::vector<std::pair<Function*, unsigned> > FunctionPrefixes;
>  
>    /// MAttributes - The set of attributes by index.  Index zero in the
>    /// file is for null, and is thus not represented here.  As such all indices
> Index: lib/Bitcode/Writer/BitcodeWriter.cpp
> ===================================================================
> --- lib/Bitcode/Writer/BitcodeWriter.cpp
> +++ lib/Bitcode/Writer/BitcodeWriter.cpp
> @@ -550,7 +550,7 @@
>    // Emit the function proto information.
>    for (Module::const_iterator F = M->begin(), E = M->end(); F != E; ++F) {
>      // FUNCTION:  [type, callingconv, isproto, linkage, paramattrs, alignment,
> -    //             section, visibility, gc, unnamed_addr]
> +    //             section, visibility, gc, unnamed_addr, prefix]
>      Vals.push_back(VE.getTypeID(F->getType()));
>      Vals.push_back(F->getCallingConv());
>      Vals.push_back(F->isDeclaration());
> @@ -561,6 +561,8 @@
>      Vals.push_back(getEncodedVisibility(F));
>      Vals.push_back(F->hasGC() ? GCMap[F->getGC()] : 0);
>      Vals.push_back(F->hasUnnamedAddr());
> +    Vals.push_back(F->hasPrefixData() ? (VE.getValueID(F->getPrefixData()) + 1)
> +                                      : 0);
>  
>      unsigned AbbrevToUse = 0;
>      Stream.EmitRecord(bitc::MODULE_CODE_FUNCTION, Vals, AbbrevToUse);
> @@ -1847,6 +1849,8 @@
>      WriteUseList(FI, VE, Stream);
>      if (!FI->isDeclaration())
>        WriteFunctionUseList(FI, VE, Stream);
> +    if (FI->hasPrefixData())
> +      WriteUseList(FI->getPrefixData(), VE, Stream);
>    }
>  
>    // Write the aliases.
> Index: lib/Bitcode/Writer/ValueEnumerator.cpp
> ===================================================================
> --- lib/Bitcode/Writer/ValueEnumerator.cpp
> +++ lib/Bitcode/Writer/ValueEnumerator.cpp
> @@ -60,6 +60,11 @@
>         I != E; ++I)
>      EnumerateValue(I->getAliasee());
>  
> +  // Enumerate the prefix data constants.
> +  for (Module::const_iterator I = M->begin(), E = M->end(); I != E; ++I)
> +    if (I->hasPrefixData())
> +      EnumerateValue(I->getPrefixData());
> +
>    // Insert constants and metadata that are named at module level into the slot
>    // pool so that the module symbol table can refer to them...
>    EnumerateValueSymbolTable(M->getValueSymbolTable());
> Index: lib/CodeGen/AsmPrinter/AsmPrinter.cpp
> ===================================================================
> --- lib/CodeGen/AsmPrinter/AsmPrinter.cpp
> +++ lib/CodeGen/AsmPrinter/AsmPrinter.cpp
> @@ -468,6 +468,10 @@
>      OutStreamer.EmitLabel(FakeStub);
>    }
>  
> +  // Emit the prefix data.
> +  if (F->hasPrefixData())
> +    EmitGlobalConstant(F->getPrefixData());
> +
>    // Emit pre-function debug and/or EH information.
>    if (DE) {
>      NamedRegionTimer T(EHTimerName, DWARFGroupName, TimePassesIsEnabled);
> Index: lib/IR/AsmWriter.cpp
> ===================================================================
> --- lib/IR/AsmWriter.cpp
> +++ lib/IR/AsmWriter.cpp
> @@ -1647,6 +1647,10 @@
>      Out << " align " << F->getAlignment();
>    if (F->hasGC())
>      Out << " gc \"" << F->getGC() << '"';
> +  if (F->hasPrefixData()) {
> +    Out << " prefix ";
> +    writeOperand(F->getPrefixData(), true);
> +  }
>    if (F->isDeclaration()) {
>      Out << '\n';
>    } else {
> Index: lib/IR/Function.cpp
> ===================================================================
> --- lib/IR/Function.cpp
> +++ lib/IR/Function.cpp
> @@ -195,7 +195,8 @@
>  Function::Function(FunctionType *Ty, LinkageTypes Linkage,
>                     const Twine &name, Module *ParentModule)
>    : GlobalValue(PointerType::getUnqual(Ty),
> -                Value::FunctionVal, 0, 0, Linkage, name) {
> +                Value::FunctionVal, OperandTraits<Function>::op_begin(this), 0,
> +                Linkage, name) {
>    assert(FunctionType::isValidReturnType(getReturnType()) &&
>           "invalid return type");
>    SymTab = new ValueSymbolTable();
> @@ -229,6 +230,8 @@
>    // Remove the intrinsicID from the Cache.
>    if (getValueName() && isIntrinsic())
>      getContext().pImpl->IntrinsicIDCache.erase(this);
> +
> +  NumOperands = 1; // FIXME: needed by operator delete
>  }
>  
>  void Function::BuildLazyArguments() const {
> @@ -276,6 +279,8 @@
>    // blockaddresses, but BasicBlock's destructor takes care of those.
>    while (!BasicBlocks.empty())
>      BasicBlocks.begin()->eraseFromParent();
> +
> +  setPrefixData(0);
>  }
>  
>  void Function::addAttribute(unsigned i, Attribute::AttrKind attr) {
> @@ -351,6 +356,10 @@
>      setGC(SrcF->getGC());
>    else
>      clearGC();
> +  if (SrcF->hasPrefixData())
> +    setPrefixData(SrcF->getPrefixData());
> +  else
> +    setPrefixData(0);
>  }
>  
>  /// getIntrinsicID - This method returns the ID number of the specified
> @@ -720,3 +729,11 @@
>  
>    return false;
>  }
> +
> +void Function::setPrefixData(Constant *PrefixData) {
> +  if (PrefixData)
> +    NumOperands = 1;
> +  Op<0>() = PrefixData;
> +  if (!PrefixData)
> +    NumOperands = 0;
> +}
> Index: lib/IR/TypeFinder.cpp
> ===================================================================
> --- lib/IR/TypeFinder.cpp
> +++ lib/IR/TypeFinder.cpp
> @@ -44,6 +44,9 @@
>    for (Module::const_iterator FI = M.begin(), E = M.end(); FI != E; ++FI) {
>      incorporateType(FI->getType());
>  
> +    if (FI->hasPrefixData())
> +      incorporateValue(FI->getPrefixData());
> +
>      // First incorporate the arguments.
>      for (Function::const_arg_iterator AI = FI->arg_begin(),
>             AE = FI->arg_end(); AI != AE; ++AI)
> Index: lib/Transforms/IPO/GlobalDCE.cpp
> ===================================================================
> --- lib/Transforms/IPO/GlobalDCE.cpp
> +++ lib/Transforms/IPO/GlobalDCE.cpp
> @@ -179,6 +179,9 @@
>      // any globals used will be marked as needed.
>      Function *F = cast<Function>(G);
>  
> +    if (F->hasPrefixData())
> +      MarkUsedGlobalsAsNeeded(F->getPrefixData());
> +
>      for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>        for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>          for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
> Index: test/CodeGen/X86/prefixdata.ll
> ===================================================================
> --- /dev/null
> +++ test/CodeGen/X86/prefixdata.ll
> @@ -0,0 +1,15 @@
> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
> +
> +@i = linkonce_odr global i32 1
> +
> +; CHECK: f:
> +; CHECK-NEXT: .long	1
> +define void @f() prefix i32 1 {
> +  ret void
> +}
> +
> +; CHECK: g:
> +; CHECK-NEXT: .quad	i
> +define void @g() prefix i32* @i {
> +  ret void
> +}
> Index: test/Feature/prefixdata.ll
> ===================================================================
> --- /dev/null
> +++ test/Feature/prefixdata.ll
> @@ -0,0 +1,18 @@
> +; RUN: llvm-as < %s | llvm-dis > %t1.ll
> +; RUN: FileCheck %s < %t1.ll
> +; RUN: llvm-as < %t1.ll | llvm-dis > %t2.ll
> +; RUN: diff %t1.ll %t2.ll
> +; RUN: opt -O3 -S < %t1.ll | FileCheck %s
> +
> +; CHECK: @i
> +@i = linkonce_odr global i32 1
> +
> +; CHECK: f(){{.*}}prefix i32 1
> +define void @f() prefix i32 1 {
> +  ret void
> +}
> +
> +; CHECK: g(){{.*}}prefix i32* @i
> +define void @g() prefix i32* @i {
> +  ret void
> +}

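As a sanity check on the x86_64 example in the LangRef text above, here is a
minimal standalone sketch of the arithmetic behind ``jmp .+10``.  It is not
part of the patch; the helper names are hypothetical and it assumes the
two-byte short-jump encoding (opcode 0xEB followed by a signed 8-bit
displacement measured from the end of the instruction).

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in
// the patch or in LLVM.

// Opcode of the x86 short jump (jmp rel8).  The LangRef example spells it
// as "i8 235".
constexpr std::uint8_t kJmpRel8Opcode = 0xEB;

// Size in bytes of the jmp rel8 instruction: one opcode byte plus one
// displacement byte.
constexpr int kJmpRel8Size = 2;

// Offset, from the start of the prefix data, at which execution resumes.
// The displacement is relative to the end of the jump instruction.
constexpr int jumpTargetOffset(std::int8_t rel8) {
  return kJmpRel8Size + rel8;
}

// Size in bytes of the packed struct <{ i8, i8, i8* }> for a given pointer
// width (8 on x86_64).
constexpr int packedPrefixSize(int pointerBytes) {
  return 1 + 1 + pointerBytes;
}

static_assert(kJmpRel8Opcode == 235, "0xEB is written as i8 235 in the IR");
static_assert(jumpTargetOffset(8) == 10, "i8 235, i8 8 encodes jmp .+10");
static_assert(jumpTargetOffset(8) == packedPrefixSize(8),
              "the jump lands immediately after the prefix data");
```

If the metadata grows, the displacement byte has to grow with it, and a
payload of more than 127 bytes would need a different jump encoding.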

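The "value index plus 1" convention for the new FUNCTION record field can be
sketched in isolation as follows.  This is only an illustration under stated
assumptions: the function names are hypothetical, and the field position (10)
is taken from the reader hunk above, which tests ``Record[10]``.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; the patch does this inline in
// BitcodeWriter.cpp and BitcodeReader.cpp.

// Position of the prefix field in the operand array seen by the reader.
const unsigned kPrefixFieldIndex = 10;

// Writer side: 0 means "no prefix data", otherwise the value ID plus 1.
std::uint64_t encodePrefixField(bool hasPrefix, unsigned valueID) {
  return hasPrefix ? static_cast<std::uint64_t>(valueID) + 1 : 0;
}

// Reader side: returns true and sets valueID if the record carries prefix
// data.  Records written before the field existed are shorter and are
// treated as having no prefix data.
bool decodePrefixField(const std::vector<std::uint64_t> &record,
                       unsigned &valueID) {
  if (record.size() <= kPrefixFieldIndex || record[kPrefixFieldIndex] == 0)
    return false;
  valueID = static_cast<unsigned>(record[kPrefixFieldIndex] - 1);
  return true;
}
```

Biasing the index by one lets value ID 0 be represented while keeping 0 free
as the "absent" marker, so older bitcode files remain readable.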
-- 
Peter


