[PATCH] Implement function prefix data as an IR feature.
Duncan Sands
duncan.sands at gmail.com
Mon Jul 29 05:44:09 PDT 2013
Hi Peter,
On 26/07/13 22:57, Peter Collingbourne wrote:
> Ping?
I think requiring the prefix data to start with a bunch of machine-specific
magic bytes is really horrible. Why can't codegen insert the magic code
sequence automatically? I don't think this patch should go in as it is.
Ciao, Duncan.
>
> On Sat, Jul 20, 2013 at 06:40:52PM -0700, Peter Collingbourne wrote:
>> Previous discussion:
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
>>
>> http://llvm-reviews.chandlerc.com/D1191
>>
>> Files:
>> docs/BitCodeFormat.rst
>> docs/LangRef.rst
>> include/llvm/IR/Function.h
>> lib/AsmParser/LLLexer.cpp
>> lib/AsmParser/LLParser.cpp
>> lib/AsmParser/LLToken.h
>> lib/Bitcode/Reader/BitcodeReader.cpp
>> lib/Bitcode/Reader/BitcodeReader.h
>> lib/Bitcode/Writer/BitcodeWriter.cpp
>> lib/Bitcode/Writer/ValueEnumerator.cpp
>> lib/CodeGen/AsmPrinter/AsmPrinter.cpp
>> lib/IR/AsmWriter.cpp
>> lib/IR/Function.cpp
>> lib/IR/TypeFinder.cpp
>> lib/Transforms/IPO/GlobalDCE.cpp
>> test/CodeGen/X86/prefixdata.ll
>> test/Feature/prefixdata.ll
>
>> Index: docs/BitCodeFormat.rst
>> ===================================================================
>> --- docs/BitCodeFormat.rst
>> +++ docs/BitCodeFormat.rst
>> @@ -718,7 +718,7 @@
>> MODULE_CODE_FUNCTION Record
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> -``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]``
>> +``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, unnamed_addr, prefix]``
>>
>> The ``FUNCTION`` record (code 8) marks the declaration or definition of a
>> function. The operand fields are:
>> @@ -757,6 +757,9 @@
>> * *unnamed_addr*: If present and non-zero, indicates that the function has
>> ``unnamed_addr``
>>
>> +* *prefix*: If non-zero, the value index of the prefix data for this function,
>> + plus 1.
>> +
>> MODULE_CODE_ALIAS Record
>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>
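A note on the *prefix* operand documented above: 0 means "no prefix data", and any other value N refers to value index N - 1. That is what the writer (`VE.getValueID(F->getPrefixData()) + 1`) and the reader (`Record[10]-1`) in this patch implement. The following is a minimal C++17 sketch of that round trip; `encodePrefixField` and `decodePrefixField` are hypothetical helper names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helpers illustrating the FUNCTION record's prefix field:
// 0 means "no prefix data"; any other value N means "value index N - 1".
uint64_t encodePrefixField(std::optional<uint64_t> ValueIndex) {
  return ValueIndex ? *ValueIndex + 1 : 0;
}

std::optional<uint64_t> decodePrefixField(uint64_t Field) {
  if (Field == 0)
    return std::nullopt;  // function has no prefix data
  return Field - 1;       // undo the +1 applied by the writer
}
```

Reserving 0 avoids a separate presence flag, and since the reader only consults the field when `Record.size() > 10`, records written without it still parse.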
>> Index: docs/LangRef.rst
>> ===================================================================
>> --- docs/LangRef.rst
>> +++ docs/LangRef.rst
>> @@ -552,16 +552,16 @@
>> name, a (possibly empty) argument list (each with optional :ref:`parameter
>> attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
>> an optional section, an optional alignment, an optional :ref:`garbage
>> -collector name <gc>`, an opening curly brace, a list of basic blocks,
>> -and a closing curly brace.
>> +collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
>> +curly brace, a list of basic blocks, and a closing curly brace.
>>
>> LLVM function declarations consist of the "``declare``" keyword, an
>> optional :ref:`linkage type <linkage>`, an optional :ref:`visibility
>> style <visibility>`, an optional :ref:`calling convention <callingconv>`,
>> an optional ``unnamed_addr`` attribute, a return type, an optional
>> :ref:`parameter attribute <paramattrs>` for the return type, a function
>> -name, a possibly empty list of arguments, an optional alignment, and an
>> -optional :ref:`garbage collector name <gc>`.
>> +name, a possibly empty list of arguments, an optional alignment, an optional
>> +:ref:`garbage collector name <gc>`, and an optional :ref:`prefix <prefixdata>`.
>>
>> A function definition contains a list of basic blocks, forming the CFG
>> (Control Flow Graph) for the function. Each basic block may optionally
>> @@ -598,7 +598,7 @@
>> [cconv] [ret attrs]
>> <ResultType> @<FunctionName> ([argument list])
>> [fn Attrs] [section "name"] [align N]
>> - [gc] { ... }
>> + [gc] [prefix Constant] { ... }
>>
>> .. _langref_aliases:
>>
>> @@ -757,6 +757,50 @@
>> collector which will cause the compiler to alter its output in order to
>> support the named garbage collection algorithm.
>>
>> +.. _prefixdata:
>> +
>> +Prefix Data
>> +-----------
>> +
>> +Prefix data is data associated with a function which the code generator
>> +will emit immediately before the function body. The purpose of this feature
>> +is to allow frontends to associate language-specific runtime metadata with
>> +specific functions and make it available through the function pointer while
>> +still allowing the function pointer to be called. To access the data for a
>> +given function, a program may bitcast the function pointer to a pointer to
>> +the constant's type. This implies that the IR symbol points to the start
>> +of the prefix data.
>> +
>> +To maintain the semantics of ordinary function calls, the prefix data must
>> +have a particular format. Specifically, it must begin with a sequence of
>> +bytes which decode to a sequence of machine instructions, valid for the
>> +module's target, which transfer control to the point immediately succeeding
>> +the prefix data, without performing any other visible action. This allows
>> +the inliner and other passes to reason about the semantics of the function
>> +definition without needing to reason about the prefix data. Obviously this
>> +makes the format of the prefix data highly target dependent.
>> +
>> +A trivial example of valid prefix data for the x86 architecture is ``i8 144``,
>> +which encodes the ``nop`` instruction:
>> +
>> +.. code-block:: llvm
>> +
>> + define void @f() prefix i8 144 { ... }
>> +
>> +Generally prefix data can be formed by encoding a relative branch instruction
>> +which skips the metadata, as in this example of valid prefix data for the
>> +x86_64 architecture, where the first two bytes encode ``jmp .+10``:
>> +
>> +.. code-block:: llvm
>> +
>> + %0 = type <{ i8, i8, i8* }>
>> +
>> + define void @f() prefix %0 <{ i8 235, i8 8, i8* @md }> { ... }
>> +
>> +A function may have prefix data but no body. This has similar semantics
>> +to the ``available_externally`` linkage in that the data may be used by the
>> +optimizers but will not be emitted in the object file.
>> +
>> .. _attrgrp:
>>
>> Attribute Groups
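To sanity-check the x86_64 example in the LangRef hunk: `i8 235` is 0xEB (`jmp rel8`), and the displacement is measured from the end of the two-byte instruction, so `i8 8` skips the 8-byte `i8* @md` and lands at offset 10, exactly where the prefix ends. The sketch below builds such a prefix as raw bytes and reads the metadata back from offset 2, mimicking what a program does after bitcasting the function pointer. It assumes 8-byte pointers (x86_64) and a payload of at most 127 bytes; `makeShortJmpPrefix` and `jmpTarget` are hypothetical helpers, not part of the patch. A helper along these lines is roughly what codegen would need if it inserted the jump itself, as suggested at the top of this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: builds <{ i8 0xEB, i8 disp, payload }>, a two-byte
// x86 short jump (jmp rel8) followed by PayloadSize bytes of metadata.
// The rel8 displacement counts from the end of the jmp instruction, so
// setting it to PayloadSize makes execution resume just past the prefix.
std::vector<uint8_t> makeShortJmpPrefix(const void *Payload,
                                        std::size_t PayloadSize) {
  assert(PayloadSize <= 127 && "displacement must fit in a signed byte");
  std::vector<uint8_t> Bytes(2 + PayloadSize);
  Bytes[0] = 0xEB;                               // opcode 235: jmp rel8
  Bytes[1] = static_cast<uint8_t>(PayloadSize);  // bytes to skip
  std::memcpy(Bytes.data() + 2, Payload, PayloadSize);
  return Bytes;
}

// Offset from the start of the prefix at which the jump lands.
std::size_t jmpTarget(const std::vector<uint8_t> &Bytes) {
  return 2 + static_cast<int8_t>(Bytes[1]);
}
```

For a single 8-byte payload this produces the same `235, 8, <pointer>` layout as the `%0` example, and `jmpTarget` equals the prefix size, i.e. the first byte of the function body.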
>> Index: include/llvm/IR/Function.h
>> ===================================================================
>> --- include/llvm/IR/Function.h
>> +++ include/llvm/IR/Function.h
>> @@ -23,6 +23,7 @@
>> #include "llvm/IR/BasicBlock.h"
>> #include "llvm/IR/CallingConv.h"
>> #include "llvm/IR/GlobalValue.h"
>> +#include "llvm/IR/OperandTraits.h"
>> #include "llvm/Support/Compiler.h"
>>
>> namespace llvm {
>> @@ -127,11 +128,14 @@
>> public:
>> static Function *Create(FunctionType *Ty, LinkageTypes Linkage,
>> const Twine &N = "", Module *M = 0) {
>> - return new(0) Function(Ty, Linkage, N, M);
>> + return new(1) Function(Ty, Linkage, N, M);
>> }
>>
>> ~Function();
>>
>> + /// Provide fast operand accessors
>> + DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
>> +
>> Type *getReturnType() const; // Return the type of the ret val
>> FunctionType *getFunctionType() const; // Return the FunctionType for me
>>
>> @@ -419,6 +423,17 @@
>> size_t arg_size() const;
>> bool arg_empty() const;
>>
>> + bool hasPrefixData() const {
>> + return getNumOperands() != 0;
>> + }
>> +
>> + Constant *getPrefixData() const {
>> + assert(hasPrefixData());
>> + return cast<Constant>(Op<0>());
>> + }
>> +
>> + void setPrefixData(Constant *PrefixData);
>> +
>> /// viewCFG - This function is meant for use from the debugger. You can just
>> /// say 'call F->viewCFG()' and a ghostview window should pop up from the
>> /// program, displaying the CFG of the current function with the code for each
>> @@ -487,6 +502,11 @@
>> return F ? &F->getValueSymbolTable() : 0;
>> }
>>
>> +template <>
>> +struct OperandTraits<Function> : public OptionalOperandTraits<Function> {};
>> +
>> +DEFINE_TRANSPARENT_OPERAND_ACCESSORS(Function, Value)
>> +
>> } // End llvm namespace
>>
>> #endif
>> Index: lib/AsmParser/LLLexer.cpp
>> ===================================================================
>> --- lib/AsmParser/LLLexer.cpp
>> +++ lib/AsmParser/LLLexer.cpp
>> @@ -540,6 +540,7 @@
>> KEYWORD(alignstack);
>> KEYWORD(inteldialect);
>> KEYWORD(gc);
>> + KEYWORD(prefix);
>>
>> KEYWORD(ccc);
>> KEYWORD(fastcc);
>> Index: lib/AsmParser/LLParser.cpp
>> ===================================================================
>> --- lib/AsmParser/LLParser.cpp
>> +++ lib/AsmParser/LLParser.cpp
>> @@ -2919,7 +2919,7 @@
>> /// FunctionHeader
>> /// ::= OptionalLinkage OptionalVisibility OptionalCallingConv OptRetAttrs
>> /// OptUnnamedAddr Type GlobalName '(' ArgList ')' OptFuncAttrs OptSection
>> -/// OptionalAlign OptGC
>> +/// OptionalAlign OptGC OptionalPrefix
>> bool LLParser::ParseFunctionHeader(Function *&Fn, bool isDefine) {
>> // Parse the linkage.
>> LocTy LinkageLoc = Lex.getLoc();
>> @@ -2998,6 +2998,7 @@
>> std::string GC;
>> bool UnnamedAddr;
>> LocTy UnnamedAddrLoc;
>> + Constant *Prefix = 0;
>>
>> if (ParseArgumentList(ArgList, isVarArg) ||
>> ParseOptionalToken(lltok::kw_unnamed_addr, UnnamedAddr,
>> @@ -3008,7 +3009,9 @@
>> ParseStringConstant(Section)) ||
>> ParseOptionalAlignment(Alignment) ||
>> (EatIfPresent(lltok::kw_gc) &&
>> - ParseStringConstant(GC)))
>> + ParseStringConstant(GC)) ||
>> + (EatIfPresent(lltok::kw_prefix) &&
>> + ParseGlobalTypeAndValue(Prefix)))
>> return true;
>>
>> if (FuncAttrs.contains(Attribute::Builtin))
>> @@ -3106,6 +3109,7 @@
>> Fn->setAlignment(Alignment);
>> Fn->setSection(Section);
>> if (!GC.empty()) Fn->setGC(GC.c_str());
>> + Fn->setPrefixData(Prefix);
>> ForwardRefAttrGroups[Fn] = FwdRefAttrGrps;
>>
>> // Add all of the arguments we parsed to the function.
>> Index: lib/AsmParser/LLToken.h
>> ===================================================================
>> --- lib/AsmParser/LLToken.h
>> +++ lib/AsmParser/LLToken.h
>> @@ -81,6 +81,7 @@
>> kw_alignstack,
>> kw_inteldialect,
>> kw_gc,
>> + kw_prefix,
>> kw_c,
>>
>> kw_cc, kw_ccc, kw_fastcc, kw_coldcc,
>> Index: lib/Bitcode/Reader/BitcodeReader.cpp
>> ===================================================================
>> --- lib/Bitcode/Reader/BitcodeReader.cpp
>> +++ lib/Bitcode/Reader/BitcodeReader.cpp
>> @@ -975,9 +975,11 @@
>> bool BitcodeReader::ResolveGlobalAndAliasInits() {
>> std::vector<std::pair<GlobalVariable*, unsigned> > GlobalInitWorklist;
>> std::vector<std::pair<GlobalAlias*, unsigned> > AliasInitWorklist;
>> + std::vector<std::pair<Function*, unsigned> > FunctionPrefixWorklist;
>>
>> GlobalInitWorklist.swap(GlobalInits);
>> AliasInitWorklist.swap(AliasInits);
>> + FunctionPrefixWorklist.swap(FunctionPrefixes);
>>
>> while (!GlobalInitWorklist.empty()) {
>> unsigned ValID = GlobalInitWorklist.back().second;
>> @@ -1005,6 +1007,20 @@
>> }
>> AliasInitWorklist.pop_back();
>> }
>> +
>> + while (!FunctionPrefixWorklist.empty()) {
>> + unsigned ValID = FunctionPrefixWorklist.back().second;
>> + if (ValID >= ValueList.size()) {
>> + FunctionPrefixes.push_back(FunctionPrefixWorklist.back());
>> + } else {
>> + if (Constant *C = dyn_cast<Constant>(ValueList[ValID]))
>> + FunctionPrefixWorklist.back().first->setPrefixData(C);
>> + else
>> + return Error("Function prefix is not a constant!");
>> + }
>> + FunctionPrefixWorklist.pop_back();
>> + }
>> +
>> return false;
>> }
>>
>> @@ -1741,6 +1757,8 @@
>> if (Record.size() > 9)
>> UnnamedAddr = Record[9];
>> Func->setUnnamedAddr(UnnamedAddr);
>> + if (Record.size() > 10 && Record[10] != 0)
>> + FunctionPrefixes.push_back(std::make_pair(Func, Record[10]-1));
>> ValueList.push_back(Func);
>>
>> // If this is a function with a body, remember the prototype we are
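The new loop in `ResolveGlobalAndAliasInits` follows the same pattern as the global and alias initializer loops: a prefix whose value ID has not been read yet is pushed back onto `FunctionPrefixes` for a later call, and anything that does resolve must be a `Constant`. A toy C++ model of that deferral, with strings standing in for values and hypothetical member names (`ToyReader`, `IsConstant`, `resolvePrefixes`):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model of the FunctionPrefixes handling in
// BitcodeReader::ResolveGlobalAndAliasInits. Values are strings; each
// pending entry pairs a destination slot (standing in for the Function)
// with a value ID. All names are hypothetical stand-ins.
struct ToyReader {
  std::vector<std::string> ValueList;  // values read so far
  std::vector<bool> IsConstant;        // parallel to ValueList
  std::vector<std::pair<std::string *, unsigned>> FunctionPrefixes;

  // Returns true on error, false on success, like the real reader.
  bool resolvePrefixes() {
    std::vector<std::pair<std::string *, unsigned>> Worklist;
    Worklist.swap(FunctionPrefixes);
    while (!Worklist.empty()) {
      unsigned ValID = Worklist.back().second;
      if (ValID >= ValueList.size()) {
        // Value not read yet (forward reference): retry on a later call.
        FunctionPrefixes.push_back(Worklist.back());
      } else {
        if (!IsConstant[ValID])
          return true;  // "Function prefix is not a constant!"
        *Worklist.back().first = ValueList[ValID];  // like setPrefixData(C)
      }
      Worklist.pop_back();
    }
    return false;
  }
};
```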
>> Index: lib/Bitcode/Reader/BitcodeReader.h
>> ===================================================================
>> --- lib/Bitcode/Reader/BitcodeReader.h
>> +++ lib/Bitcode/Reader/BitcodeReader.h
>> @@ -142,6 +142,7 @@
>>
>> std::vector<std::pair<GlobalVariable*, unsigned> > GlobalInits;
>> std::vector<std::pair<GlobalAlias*, unsigned> > AliasInits;
>> + std::vector<std::pair<Function*, unsigned> > FunctionPrefixes;
>>
>> /// MAttributes - The set of attributes by index. Index zero in the
>> /// file is for null, and is thus not represented here. As such all indices
>> Index: lib/Bitcode/Writer/BitcodeWriter.cpp
>> ===================================================================
>> --- lib/Bitcode/Writer/BitcodeWriter.cpp
>> +++ lib/Bitcode/Writer/BitcodeWriter.cpp
>> @@ -550,7 +550,7 @@
>> // Emit the function proto information.
>> for (Module::const_iterator F = M->begin(), E = M->end(); F != E; ++F) {
>> // FUNCTION: [type, callingconv, isproto, linkage, paramattrs, alignment,
>> - // section, visibility, gc, unnamed_addr]
>> + // section, visibility, gc, unnamed_addr, prefix]
>> Vals.push_back(VE.getTypeID(F->getType()));
>> Vals.push_back(F->getCallingConv());
>> Vals.push_back(F->isDeclaration());
>> @@ -561,6 +561,8 @@
>> Vals.push_back(getEncodedVisibility(F));
>> Vals.push_back(F->hasGC() ? GCMap[F->getGC()] : 0);
>> Vals.push_back(F->hasUnnamedAddr());
>> + Vals.push_back(F->hasPrefixData() ? (VE.getValueID(F->getPrefixData()) + 1)
>> + : 0);
>>
>> unsigned AbbrevToUse = 0;
>> Stream.EmitRecord(bitc::MODULE_CODE_FUNCTION, Vals, AbbrevToUse);
>> @@ -1847,6 +1849,8 @@
>> WriteUseList(FI, VE, Stream);
>> if (!FI->isDeclaration())
>> WriteFunctionUseList(FI, VE, Stream);
>> + if (FI->hasPrefixData())
>> + WriteUseList(FI->getPrefixData(), VE, Stream);
>> }
>>
>> // Write the aliases.
>> Index: lib/Bitcode/Writer/ValueEnumerator.cpp
>> ===================================================================
>> --- lib/Bitcode/Writer/ValueEnumerator.cpp
>> +++ lib/Bitcode/Writer/ValueEnumerator.cpp
>> @@ -60,6 +60,11 @@
>> I != E; ++I)
>> EnumerateValue(I->getAliasee());
>>
>> + // Enumerate the prefix data constants.
>> + for (Module::const_iterator I = M->begin(), E = M->end(); I != E; ++I)
>> + if (I->hasPrefixData())
>> + EnumerateValue(I->getPrefixData());
>> +
>> // Insert constants and metadata that are named at module level into the slot
>> // pool so that the module symbol table can refer to them...
>> EnumerateValueSymbolTable(M->getValueSymbolTable());
>> Index: lib/CodeGen/AsmPrinter/AsmPrinter.cpp
>> ===================================================================
>> --- lib/CodeGen/AsmPrinter/AsmPrinter.cpp
>> +++ lib/CodeGen/AsmPrinter/AsmPrinter.cpp
>> @@ -468,6 +468,10 @@
>> OutStreamer.EmitLabel(FakeStub);
>> }
>>
>> + // Emit the prefix data.
>> + if (F->hasPrefixData())
>> + EmitGlobalConstant(F->getPrefixData());
>> +
>> // Emit pre-function debug and/or EH information.
>> if (DE) {
>> NamedRegionTimer T(EHTimerName, DWARFGroupName, TimePassesIsEnabled);
>> Index: lib/IR/AsmWriter.cpp
>> ===================================================================
>> --- lib/IR/AsmWriter.cpp
>> +++ lib/IR/AsmWriter.cpp
>> @@ -1647,6 +1647,10 @@
>> Out << " align " << F->getAlignment();
>> if (F->hasGC())
>> Out << " gc \"" << F->getGC() << '"';
>> + if (F->hasPrefixData()) {
>> + Out << " prefix ";
>> + writeOperand(F->getPrefixData(), true);
>> + }
>> if (F->isDeclaration()) {
>> Out << '\n';
>> } else {
>> Index: lib/IR/Function.cpp
>> ===================================================================
>> --- lib/IR/Function.cpp
>> +++ lib/IR/Function.cpp
>> @@ -195,7 +195,8 @@
>> Function::Function(FunctionType *Ty, LinkageTypes Linkage,
>> const Twine &name, Module *ParentModule)
>> : GlobalValue(PointerType::getUnqual(Ty),
>> - Value::FunctionVal, 0, 0, Linkage, name) {
>> + Value::FunctionVal, OperandTraits<Function>::op_begin(this), 0,
>> + Linkage, name) {
>> assert(FunctionType::isValidReturnType(getReturnType()) &&
>> "invalid return type");
>> SymTab = new ValueSymbolTable();
>> @@ -229,6 +230,8 @@
>> // Remove the intrinsicID from the Cache.
>> if (getValueName() && isIntrinsic())
>> getContext().pImpl->IntrinsicIDCache.erase(this);
>> +
>> + NumOperands = 1; // FIXME: needed by operator delete
>> }
>>
>> void Function::BuildLazyArguments() const {
>> @@ -276,6 +279,8 @@
>> // blockaddresses, but BasicBlock's destructor takes care of those.
>> while (!BasicBlocks.empty())
>> BasicBlocks.begin()->eraseFromParent();
>> +
>> + setPrefixData(0);
>> }
>>
>> void Function::addAttribute(unsigned i, Attribute::AttrKind attr) {
>> @@ -351,6 +356,10 @@
>> setGC(SrcF->getGC());
>> else
>> clearGC();
>> + if (SrcF->hasPrefixData())
>> + setPrefixData(SrcF->getPrefixData());
>> + else
>> + setPrefixData(0);
>> }
>>
>> /// getIntrinsicID - This method returns the ID number of the specified
>> @@ -720,3 +729,11 @@
>>
>> return false;
>> }
>> +
>> +void Function::setPrefixData(Constant *PrefixData) {
>> + if (PrefixData)
>> + NumOperands = 1;
>> + Op<0>() = PrefixData;
>> + if (!PrefixData)
>> + NumOperands = 0;
>> +}
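The `Function` changes boil down to an optional operand: `Create` now always allocates one `Use` (`new(1)`), and `NumOperands` switches between 0 and 1 to record whether that slot is live, which is all `hasPrefixData()` checks. A stripped-down model of just that bookkeeping follows; `ToyFunction` and `ToyConstant` are hypothetical, and the real `Op<0>() = PrefixData` also updates the constant's use list, which this model omits.

```cpp
#include <cassert>

struct ToyConstant {
  int Value;
};

// Hypothetical model: one pre-allocated operand slot plus a count that
// says whether the slot currently holds prefix data.
class ToyFunction {
  const ToyConstant *Slot = nullptr;  // the single allocated operand slot
  unsigned NumOperands = 0;

public:
  bool hasPrefixData() const { return NumOperands != 0; }

  const ToyConstant *getPrefixData() const {
    assert(hasPrefixData());
    return Slot;
  }

  // Mirrors Function::setPrefixData: passing null clears the prefix.
  void setPrefixData(const ToyConstant *PrefixData) {
    if (PrefixData)
      NumOperands = 1;
    Slot = PrefixData;
    if (!PrefixData)
      NumOperands = 0;
  }
};
```

Because the allocation always holds one `Use` regardless of the current count, the destructor resets `NumOperands = 1` before `operator delete` runs, which is presumably what the FIXME in `~Function` refers to.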
>> Index: lib/IR/TypeFinder.cpp
>> ===================================================================
>> --- lib/IR/TypeFinder.cpp
>> +++ lib/IR/TypeFinder.cpp
>> @@ -44,6 +44,9 @@
>> for (Module::const_iterator FI = M.begin(), E = M.end(); FI != E; ++FI) {
>> incorporateType(FI->getType());
>>
>> + if (FI->hasPrefixData())
>> + incorporateValue(FI->getPrefixData());
>> +
>> // First incorporate the arguments.
>> for (Function::const_arg_iterator AI = FI->arg_begin(),
>> AE = FI->arg_end(); AI != AE; ++AI)
>> Index: lib/Transforms/IPO/GlobalDCE.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/GlobalDCE.cpp
>> +++ lib/Transforms/IPO/GlobalDCE.cpp
>> @@ -179,6 +179,9 @@
>> // any globals used will be marked as needed.
>> Function *F = cast<Function>(G);
>>
>> + if (F->hasPrefixData())
>> + MarkUsedGlobalsAsNeeded(F->getPrefixData());
>> +
>> for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>> for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>> for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
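The GlobalDCE hunk matters for cases like `@g() prefix i32* @i` in the tests: `@i` is `linkonce_odr`, so if its only reference were through prefix data that GlobalDCE never visited, it could be treated as dead. A toy reachability sketch showing liveness flowing through both body and prefix references; `ToyGlobal` and `computeLive` are hypothetical and are not the pass's real data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy module entry: which globals a function references from its
// instructions and from its prefix data constant (hypothetical model).
struct ToyGlobal {
  std::vector<std::string> BodyRefs;
  std::vector<std::string> PrefixRefs;
};

// Worklist reachability from the roots, in the spirit of GlobalDCE's
// "mark as needed" walk: a live function keeps alive everything its body
// and its prefix data refer to.
std::set<std::string> computeLive(const std::map<std::string, ToyGlobal> &M,
                                  const std::vector<std::string> &Roots) {
  std::set<std::string> Live;
  std::vector<std::string> Work(Roots);
  while (!Work.empty()) {
    std::string G = Work.back();
    Work.pop_back();
    if (!Live.insert(G).second)
      continue;  // already marked
    auto It = M.find(G);
    if (It == M.end())
      continue;
    for (const std::string &R : It->second.PrefixRefs)
      Work.push_back(R);  // the edge this patch adds
    for (const std::string &R : It->second.BodyRefs)
      Work.push_back(R);
  }
  return Live;
}
```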
>> Index: test/CodeGen/X86/prefixdata.ll
>> ===================================================================
>> --- /dev/null
>> +++ test/CodeGen/X86/prefixdata.ll
>> @@ -0,0 +1,15 @@
>> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
>> +
>> +@i = linkonce_odr global i32 1
>> +
>> +; CHECK: f:
>> +; CHECK-NEXT: .long 1
>> +define void @f() prefix i32 1 {
>> + ret void
>> +}
>> +
>> +; CHECK: g:
>> +; CHECK-NEXT: .quad i
>> +define void @g() prefix i32* @i {
>> + ret void
>> +}
>> Index: test/Feature/prefixdata.ll
>> ===================================================================
>> --- /dev/null
>> +++ test/Feature/prefixdata.ll
>> @@ -0,0 +1,18 @@
>> +; RUN: llvm-as < %s | llvm-dis > %t1.ll
>> +; RUN: FileCheck %s < %t1.ll
>> +; RUN: llvm-as < %t1.ll | llvm-dis > %t2.ll
>> +; RUN: diff %t1.ll %t2.ll
>> +; RUN: opt -O3 -S < %t1.ll | FileCheck %s
>> +
>> +; CHECK: @i
>> +@i = linkonce_odr global i32 1
>> +
>> +; CHECK: f(){{.*}}prefix i32 1
>> +define void @f() prefix i32 1 {
>> + ret void
>> +}
>> +
>> +; CHECK: g(){{.*}}prefix i32* @i
>> +define void @g() prefix i32* @i {
>> + ret void
>> +}
>
>
More information about the llvm-commits mailing list