[PATCH] D13107: Support for function summary index bitcode sections and files. Includes bitcode reader/writer support for these sections, and support for creation of combined index files.

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Sat Oct 3 08:20:34 PDT 2015


On Wed, Sep 30, 2015 at 10:33 PM, Mehdi AMINI <mehdi.amini at apple.com> wrote:
> joker.eph added a comment.
>
> Hi Teresa,
>
> Please find some inline comments below. I had some minor naming suggestion, but more important is the ownership of some object instantiation.

Thanks for the thorough review! I think I have addressed all of your
suggestions and concerns. Responses below. New patch being uploaded
shortly.

>
>
> ================
> Comment at: include/llvm/Bitcode/BitcodeWriterPass.h:54
> @@ -51,1 +53,3 @@
> +      : OS(OS), ShouldPreserveUseListOrder(ShouldPreserveUseListOrder),
> +        EmitThinLTOIndex(EmitThinLTOIndex) {}
>
> ----------------
> Rename //ThinLTO// with //FunctionSummary// to be coherent with the rest like //hasFunctionSummary()// for instance?
>
> I think there are multiple places (not to say everywhere) where //ThinLTO// could be replaced with something more generic like //FunctionSummary//.

Yeah, I had only left ThinLTO in a few contexts, mostly this one that
is set under -flto=thin. But I agree the flag doesn't
need to be ThinLTO-specific, and I have renamed it and the associated
llvm-as option accordingly.

The only places that now specifically mention ThinLTO are in some
comments that indicate how/why we are using the function summary
currently, and for the gold plugin option, since the behavior it
provokes is ThinLTO specific.

>
> ================
> Comment at: include/llvm/Bitcode/BitstreamWriter.h:24
> @@ -23,2 +23,3 @@
>  #include "llvm/Support/Endian.h"
> +#include "llvm/Support/raw_ostream.h"
>  #include <vector>
> ----------------
> Why?

Leftover from debugging, removed.

>
> ================
> Comment at: include/llvm/Bitcode/BitstreamWriter.h:528
> @@ -523,1 +527,3 @@
> +public:
> +
>    BlockInfo &getOrCreateBlockInfo(unsigned BlockID) {
> ----------------
> Shouldn't be needed?

Removed this change.

>
> ================
> Comment at: include/llvm/Bitcode/ReaderWriter.h:74
> @@ +73,3 @@
> +      DiagnosticHandlerFunction DiagnosticHandler,
> +      bool ParseFuncSummary = true);
> +
> ----------------
> Might be more explicit to rename //ParseFuncSummary// into //IsLazy//.
> (just a suggestion, do whatever seems better to you)

I like that suggestion, changed.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:45
> @@ +44,3 @@
> +  /// have module identifier appended before placing into the combined
> +  /// index, to disambiguiate from other functions with the same name.
> +  ///
> ----------------
> s/disambiguiate/disambiguate/ ?

Fixed

>
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:99
> @@ +98,3 @@
> +/// of the corresponding function block. For the combined index,
> +/// it, after parsing of the \a ValueSymbolTable, this initially
> +/// holds the offset of the corresponding function summary bitcode
> ----------------
> The "it" at the beginning of the line seems strange to me?

Fixed.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:107
> @@ +106,3 @@
> +  /// decisions.
> +  FunctionSummary *Summary;
> +  /// \brief The bitcode offset corresponding to either the associated
> ----------------
> Who owns it?
> (I may found out later)

Fixed, see below.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:108
> @@ +107,3 @@
> +  FunctionSummary *Summary;
> +  /// \brief The bitcode offset corresponding to either the associated
> +  /// function's function body record, or its function summary record,
> ----------------
> (Missing empty line)

Fixed.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:130
> @@ +129,3 @@
> +  FunctionInfo(uint64_t FuncOffset,
> +               FunctionSummary *FuncSummary) :
> +    Summary(FuncSummary),
> ----------------
> If this takes ownership, use a `unique_ptr` for the argument.

Good idea, done.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:136
> @@ +135,3 @@
> +    if (Summary)
> +      delete Summary;
> +  }
> ----------------
> Any reason not to use `unique_ptr` for the Summary member?

Done. Ditto for the FunctionInfo entries in the FunctionInfoList.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:143
> @@ +142,3 @@
> +    // destruction doesn't perform a double-delete.
> +    Summary = nullptr;
> +    BitcodeIndex = Other.BitcodeIndex;
> ----------------
> This sounds suspicious: a copy ctor that does not really copy...
> I have to see the use case first, I'm skeptical right now :)

This is gone now that I have made Summary a unique_ptr and the
combined index takes over ownership on the merge.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:166
> @@ +165,3 @@
> +  /// the index type.
> +  void bitcodeIndex(uint64_t FuncOffset) {
> +    BitcodeIndex = FuncOffset;
> ----------------
> The other setters name are beginning with //set//, why not this one?

Fixed.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:190
> @@ +189,3 @@
> +class FunctionInfoIndex {
> + public:
> +
> ----------------
> remove

Fixed.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:213
> @@ +212,3 @@
> +           FMI->getValue().begin(); FLI != FMI->getValue().end(); ++FLI) {
> +        delete *FLI;
> +      }
> ----------------
> Why not using `unique_ptr` and remote the destructor?

Done.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:223
> @@ +222,3 @@
> +
> +  const_funcinfo_iterator funcinfo_begin() const {
> +    return FunctionMap.begin();
> ----------------
> If it was called "begin()" you could use for-range loop. An alternative would be to add another method `functions()` that would return an `llvm::iterator_range`.

Changed to begin() and end() and updated all uses to for-range loops.

>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:232
> @@ +231,3 @@
> +  FunctionInfoList getFunctionInfoList(StringRef FuncName) const {
> +    return FunctionMap.lookup(FuncName);
> +  }
> ----------------
> Do you really intend to return a copy of the vector all the time?

Nope, fixed to return a const reference, which entailed changing
".lookup(FuncName)" to "[FuncName]"

>
> ================
> Comment at: include/llvm/Object/FunctionIndexObjectFile.h:84
> @@ +83,3 @@
> +  static bool hasFunctionSummaryInMemBuffer(MemoryBufferRef Object,
> +                                          LLVMContext &Context);
> +
> ----------------
> indentation

Fixed.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:410
> @@ +409,3 @@
> +
> +  /// \brief Used to indicate whether we are doing lazy parsing of summary data.
> +  ///
> ----------------
> What about a variable name that contains `Lazy`?

Fixed as described earlier.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:437
> @@ +436,3 @@
> +  /// summary record offset).
> +  DenseMap<uint64_t, FunctionSummary*> SummaryMap;
> +
> ----------------
> The ownership/lifetime of FunctionSummary is not clear here?

Changed to unique_ptr.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:3110
> @@ -3016,2 +3109,3 @@
>          }
> +        //assert(false); // Temp assert to make sure we don't fall back to old code.
>          break;
> ----------------
> remove

Fixed.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5215
> @@ +5214,3 @@
> +        if (Entry.ID == bitc::FUNCTION_SUMMARY_BLOCK_ID)
> +          SeenFuncSummary = true;
> +        if (Stream.SkipBlock())
> ----------------
> Should you just return now?

Good point, done.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5241
> @@ +5240,3 @@
> +        }
> +        else if (std::error_code EC = parseEntireSummary())
> +          return EC;
> ----------------
> I would expect clang-format to put the else on the same line as the `{`?

It did - looks like I hadn't run clang-format on this yet. I did and
it made quite a lot of changes that will be in the updated patch.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5313
> @@ +5312,3 @@
> +  }
> +  llvm_unreachable("Exit infinite loop");
> +}
> ----------------
> I don't see the expected return for this function?

I assumed that wasn't needed with the llvm_unreachable here (this is
cloned from a number of other BitcodeReader routines that behave the
same way). It should have returned earlier when the EndBlock was
found.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5343
> @@ +5342,3 @@
> +      break;
> +      case bitc::MST_CODE_ENTRY: {
> +        // MST_ENTRY: [modid, namechar x N]
> ----------------
> The indentation seems incorrect, is it clang-formatted?

Fixed.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5382
> @@ +5381,3 @@
> +      // We didn't really read a proper Module block.
> +      return error("Malformed ThihLTO IR file");
> +    }
> ----------------
> Typo with ThihLTO.
> (also I'm still not sure at which level where the "ThinLTO" concept should "leak").

I changed this to "Malformed block" which is consistent with the
messaging in many other locations in this file.

>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5684
> @@ +5683,3 @@
> +  std::unique_ptr<MemoryBuffer> Buf = MemoryBuffer::getMemBuffer(Buffer, false);
> +  FunctionIndexBitcodeReader *R =
> +      new FunctionIndexBitcodeReader(Buf.get(), Context, DiagnosticHandler);
> ----------------
> When is this memory released?

Good point. The regular BitcodeReader pointers that are created get
assigned to a unique_ptr in the Module and thus released when the
Module is destroyed. Since the lifetime of the
FunctionIndexBitcodeReader object in all 3 routines that create it is
local to that function, I made R a unique_ptr so it will get released
at the end of the routine.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2304
> @@ +2303,3 @@
> +         FII->getValue().begin(); FLI != FII->getValue().end(); ++FLI) {
> +      FunctionInfo *FI = *FLI;
> +
> ----------------
> For-range loops?

Done here and in other similar places.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2323
> @@ +2322,3 @@
> +           *E = FuncName.data()+FuncName.size(); P != E; ++P)
> +        NameVals.push_back((unsigned char)*P);
> +
> ----------------
> `for(auto P : FuncName)` ?

Done here and 2 similar cases.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2384
> @@ +2383,3 @@
> +
> +  FunctionIndex[&F] = FuncInfo;
> +}
> ----------------
> Ownership is not clear, what about having a `DenseMap<const Function *, std::unique_ptr<FunctionInfo>>` instead?

Done.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2693
> @@ +2692,3 @@
> +    for (const char *P = MPSE.getKey().data(),
> +         *E = MPSE.getKey().data()+MPSE.getKey().size(); P != E; ++P)
> +      NameVals.push_back((unsigned char)*P);
> ----------------
> `for (auto P : MPSE.getKey())` ?

Done.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2720
> @@ +2719,3 @@
> +  for (DenseMap<const Function *, FunctionInfo *>::iterator I =
> +       FunctionIndex.begin(); I != FunctionIndex.end(); I++) {
> +    // Skip anonymous functions. We will emit a function summary for
> ----------------
> For-range loop?

Done.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2757
> @@ +2756,3 @@
> +    NameVals.clear();
> +  }
> +
> ----------------
> Some refactoring seems possible between these two loops.

I pulled out the common code into a new helper function.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2780
> @@ +2779,3 @@
> +    for (std::vector<FunctionInfo*>::const_iterator FLI =
> +         FII->getValue().begin(); FLI != FII->getValue().end(); ++FLI) {
> +      FunctionInfo *FI = *FLI;
> ----------------
> for-range loop?

Done.

>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2994
> @@ +2993,3 @@
> +  Stream.Emit(0xE, 4);
> +  Stream.Emit(0xD, 4);
> +
> ----------------
> (Side note) I read this so many times, I'm surprised there is not a better refactoring for reading/writing bitcode.

I went ahead and pulled the header writing/reading code out into a
helper both here and in the reader.

>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:19
> @@ +18,3 @@
> +
> +uint64_t FunctionInfoIndex::NextModuleId = 0;
> +
> ----------------
> A mutable global is really to be avoided. I am really not convinced that it is desirable here.
> Should it be passed by argument to the `mergeFrom` method? Or a member variable?
>
> Maybe even `mergeFrom()` should not be a member function, but a separate object that would have a `NextModuleId` and you could add `FunctionInfoIndex` and get the merged result.

For now since this is only called when the combined module is created
from the gold plugin, I removed the
static member variable and made it a local that is incremented and
passed in from the caller of mergeFunc().
An existing assert in mergeFunc() ensures that module's have unique
IDs assigned to them.

>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:23
> @@ +22,3 @@
> +// per-module instances.
> +void FunctionInfoIndex::mergeFrom(FunctionInfoIndex *Other) {
> +
> ----------------
> What about taking a `unique_ptr` here and "stealing" its content instead of doing a bunch of redundant memory allocation and copy?

Done.

>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:29
> @@ +28,3 @@
> +  StringRef ModPath;
> +  for (; OtherFuncInfoLists != Other->funcinfo_end(); OtherFuncInfoLists++) {
> +    StringRef FuncName = OtherFuncInfoLists->getKey();
> ----------------
> This loop header is strange with the iterator declared outside. Turn into a for-range loop?

Done.

>
> ================
> Comment at: lib/Object/FunctionIndexObjectFile.cpp:89
> @@ +88,3 @@
> +ErrorOr<std::unique_ptr<FunctionIndexObjectFile>>
> +llvm::object::FunctionIndexObjectFile::create(MemoryBufferRef Object,
> +                                              LLVMContext &Context,
> ----------------
> I'm not sure why you would need to provide the fully qualified name for this function?

Hmm, blindly cloned this from "llvm::object::IRObjectFile::create(". Removed it.

>
> ================
> Comment at: test/Bitcode/thinlto-function-summary.ll:17
> @@ +16,3 @@
> +
> +; CHECK: define i32 @foo()
> +
> ----------------
> This line is ignored

Why? It should be checked by the second "RUN".

>
> ================
> Comment at: test/Bitcode/thinlto-function-summary.ll:25
> @@ +24,3 @@
> +
> +; CHECK: define i32 @bar(i32 %x)
> +
> ----------------
> This line is ignored as well

Same here. I confirmed that if I add garbage to the end of this line it fails.

>
> ================
> Comment at: test/tools/gold/X86/thinlto.ll:8
> @@ +7,3 @@
> +; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED
> +; RUN: not test -a %t3
> +
> ----------------
> What is this line doing? Isn't the `-a` for the `&&` operator?
> Do you check that the file %t3 is not generated by gold and only the combine index is?

Right, I am trying to make sure that %t3 is not actually generated
when we tell gold to create the combined index. I copied this from
test/tools/gold/X86/emit-llvm.ll and after looking at the man page for
'test' I am not sure why it uses -a. I've changed it to use -e for
thinlto.ll, which works as expected (-a works as well, and this fails
when I remove the "not", but doesn't make a lot of sense).

>
>
> http://reviews.llvm.org/D13107
>
>
>



-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413


More information about the llvm-commits mailing list