[PATCH] D13107: Support for function summary index bitcode sections and files. Includes bitcode reader/writer support for these sections, and support for creation of combined index files.
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 3 08:20:34 PDT 2015
On Wed, Sep 30, 2015 at 10:33 PM, Mehdi AMINI <mehdi.amini at apple.com> wrote:
> joker.eph added a comment.
>
> Hi Teresa,
>
> Please find some inline comments below. I had some minor naming suggestion, but more important is the ownership of some object instantiation.
Thanks for the thorough review! I think I have addressed all of your
suggestions and concerns. Responses below. New patch being uploaded
shortly.
>
>
> ================
> Comment at: include/llvm/Bitcode/BitcodeWriterPass.h:54
> @@ -51,1 +53,3 @@
> + : OS(OS), ShouldPreserveUseListOrder(ShouldPreserveUseListOrder),
> + EmitThinLTOIndex(EmitThinLTOIndex) {}
>
> ----------------
> Rename //ThinLTO// with //FunctionSummary// to be coherent with the rest like //hasFunctionSummary()// for instance?
>
> I think there are multiple places (not to say everywhere) where //ThinLTO// could be replaced with something more generic like //FunctionSummary//.
Yeah, I had only left ThinLTO in a few contexts, mostly this one that
is set under -flto=thin. But I agree the flag doesn't
need to be ThinLTO-specific, and I have renamed it and the associated
llvm-as option accordingly.
The only places that now specifically mention ThinLTO are in some
comments that indicate how/why we are using the function summary
currently, and for the gold plugin option, since the behavior it
provokes is ThinLTO specific.
>
> ================
> Comment at: include/llvm/Bitcode/BitstreamWriter.h:24
> @@ -23,2 +23,3 @@
> #include "llvm/Support/Endian.h"
> +#include "llvm/Support/raw_ostream.h"
> #include <vector>
> ----------------
> Why?
Leftover from debugging, removed.
>
> ================
> Comment at: include/llvm/Bitcode/BitstreamWriter.h:528
> @@ -523,1 +527,3 @@
> +public:
> +
> BlockInfo &getOrCreateBlockInfo(unsigned BlockID) {
> ----------------
> Shouldn't be needed?
Removed this change.
>
> ================
> Comment at: include/llvm/Bitcode/ReaderWriter.h:74
> @@ +73,3 @@
> + DiagnosticHandlerFunction DiagnosticHandler,
> + bool ParseFuncSummary = true);
> +
> ----------------
> Might be more explicit to rename //ParseFuncSummary// into //IsLazy//.
> (just a suggestion, do whatever seems better to you)
I like that suggestion, changed.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:45
> @@ +44,3 @@
> + /// have module identifier appended before placing into the combined
> + /// index, to disambiguiate from other functions with the same name.
> + ///
> ----------------
> s/disambiguiate/disambiguate/ ?
Fixed
>
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:99
> @@ +98,3 @@
> +/// of the corresponding function block. For the combined index,
> +/// it, after parsing of the \a ValueSymbolTable, this initially
> +/// holds the offset of the corresponding function summary bitcode
> ----------------
> The "it" at the beginning of the line seems strange to me?
Fixed.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:107
> @@ +106,3 @@
> + /// decisions.
> + FunctionSummary *Summary;
> + /// \brief The bitcode offset corresponding to either the associated
> ----------------
> Who owns it?
> (I may found out later)
Fixed, see below.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:108
> @@ +107,3 @@
> + FunctionSummary *Summary;
> + /// \brief The bitcode offset corresponding to either the associated
> + /// function's function body record, or its function summary record,
> ----------------
> (Missing empty line)
Fixed.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:130
> @@ +129,3 @@
> + FunctionInfo(uint64_t FuncOffset,
> + FunctionSummary *FuncSummary) :
> + Summary(FuncSummary),
> ----------------
> If this takes ownership, use a `unique_ptr` for the argument.
Good idea, done.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:136
> @@ +135,3 @@
> + if (Summary)
> + delete Summary;
> + }
> ----------------
> Any reason not to use `unique_ptr` for the Summary member?
Done. Ditto for the FunctionInfo entries in the FunctionInfoList.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:143
> @@ +142,3 @@
> + // destruction doesn't perform a double-delete.
> + Summary = nullptr;
> + BitcodeIndex = Other.BitcodeIndex;
> ----------------
> This sounds suspicious: a copy ctor that does not really copy...
> I have to see the use case first, I'm skeptical right now :)
This is gone now that I have made Summary a unique_ptr and the
combined index takes over ownership on the merge.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:166
> @@ +165,3 @@
> + /// the index type.
> + void bitcodeIndex(uint64_t FuncOffset) {
> + BitcodeIndex = FuncOffset;
> ----------------
> The other setters name are beginning with //set//, why not this one?
Fixed.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:190
> @@ +189,3 @@
> +class FunctionInfoIndex {
> + public:
> +
> ----------------
> remove
Fixed.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:213
> @@ +212,3 @@
> + FMI->getValue().begin(); FLI != FMI->getValue().end(); ++FLI) {
> + delete *FLI;
> + }
> ----------------
> Why not using `unique_ptr` and remote the destructor?
Done.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:223
> @@ +222,3 @@
> +
> + const_funcinfo_iterator funcinfo_begin() const {
> + return FunctionMap.begin();
> ----------------
> If it was called "begin()" you could use for-range loop. An alternative would be to add another method `functions()` that would return an `llvm::iterator_range`.
Changed to begin() and end() and updated all uses to for-range loops.
>
> ================
> Comment at: include/llvm/IR/FunctionInfo.h:232
> @@ +231,3 @@
> + FunctionInfoList getFunctionInfoList(StringRef FuncName) const {
> + return FunctionMap.lookup(FuncName);
> + }
> ----------------
> Do you really intend to return a copy of the vector all the time?
Nope, fixed to return a const reference, which entailed changing
".lookup(FuncName)" to "[FuncName]"
>
> ================
> Comment at: include/llvm/Object/FunctionIndexObjectFile.h:84
> @@ +83,3 @@
> + static bool hasFunctionSummaryInMemBuffer(MemoryBufferRef Object,
> + LLVMContext &Context);
> +
> ----------------
> indentation
Fixed.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:410
> @@ +409,3 @@
> +
> + /// \brief Used to indicate whether we are doing lazy parsing of summary data.
> + ///
> ----------------
> What about a variable name that contains `Lazy`?
Fixed as described earlier.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:437
> @@ +436,3 @@
> + /// summary record offset).
> + DenseMap<uint64_t, FunctionSummary*> SummaryMap;
> +
> ----------------
> The ownership/lifetime of FunctionSummary is not clear here?
Changed to unique_ptr.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:3110
> @@ -3016,2 +3109,3 @@
> }
> + //assert(false); // Temp assert to make sure we don't fall back to old code.
> break;
> ----------------
> remove
Fixed.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5215
> @@ +5214,3 @@
> + if (Entry.ID == bitc::FUNCTION_SUMMARY_BLOCK_ID)
> + SeenFuncSummary = true;
> + if (Stream.SkipBlock())
> ----------------
> Should you just return now?
Good point, done.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5241
> @@ +5240,3 @@
> + }
> + else if (std::error_code EC = parseEntireSummary())
> + return EC;
> ----------------
> I would expect clang-format to put the else on the same line as the `{`?
It did - looks like I hadn't run clang-format on this yet. I did and
it made quite a lot of changes that will be in the updated patch.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5313
> @@ +5312,3 @@
> + }
> + llvm_unreachable("Exit infinite loop");
> +}
> ----------------
> I don't see the expected return for this function?
I assumed that wasn't needed with the llvm_unreachable here (this is
cloned from a number of other BitcodeReader routines that behave the
same way). It should have returned earlier when the EndBlock was
found.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5343
> @@ +5342,3 @@
> + break;
> + case bitc::MST_CODE_ENTRY: {
> + // MST_ENTRY: [modid, namechar x N]
> ----------------
> The indentation seems incorrect, is it clang-formatted?
Fixed.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5382
> @@ +5381,3 @@
> + // We didn't really read a proper Module block.
> + return error("Malformed ThihLTO IR file");
> + }
> ----------------
> Typo with ThihLTO.
> (also I'm still not sure at which level where the "ThinLTO" concept should "leak").
I changed this to "Malformed block" which is consistent with the
messaging in many other locations in this file.
>
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5684
> @@ +5683,3 @@
> + std::unique_ptr<MemoryBuffer> Buf = MemoryBuffer::getMemBuffer(Buffer, false);
> + FunctionIndexBitcodeReader *R =
> + new FunctionIndexBitcodeReader(Buf.get(), Context, DiagnosticHandler);
> ----------------
> When is this memory released?
Good point. The regular BitcodeReader pointers that are created get
assigned to a unique_ptr in the Module and thus released when the
Module is destroyed. Since the lifetime of the
FunctionIndexBitcodeReader object in all 3 routines that create it is
local to that function, I made R a unique_ptr so it will get released
at the end of the routine.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2304
> @@ +2303,3 @@
> + FII->getValue().begin(); FLI != FII->getValue().end(); ++FLI) {
> + FunctionInfo *FI = *FLI;
> +
> ----------------
> For-range loops?
Done here and in other similar places.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2323
> @@ +2322,3 @@
> + *E = FuncName.data()+FuncName.size(); P != E; ++P)
> + NameVals.push_back((unsigned char)*P);
> +
> ----------------
> `for(auto P : FuncName)` ?
Done here and 2 similar cases.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2384
> @@ +2383,3 @@
> +
> + FunctionIndex[&F] = FuncInfo;
> +}
> ----------------
> Ownership is not clear, what about having a `DenseMap<const Function *, std::unique_ptr<FunctionInfo>>` instead?
Done.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2693
> @@ +2692,3 @@
> + for (const char *P = MPSE.getKey().data(),
> + *E = MPSE.getKey().data()+MPSE.getKey().size(); P != E; ++P)
> + NameVals.push_back((unsigned char)*P);
> ----------------
> `for (auto P : MPSE.getKey())` ?
Done.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2720
> @@ +2719,3 @@
> + for (DenseMap<const Function *, FunctionInfo *>::iterator I =
> + FunctionIndex.begin(); I != FunctionIndex.end(); I++) {
> + // Skip anonymous functions. We will emit a function summary for
> ----------------
> For-range loop?
Done.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2757
> @@ +2756,3 @@
> + NameVals.clear();
> + }
> +
> ----------------
> Some refactoring seems possible between these two loops.
I pulled out the common code into a new helper function.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2780
> @@ +2779,3 @@
> + for (std::vector<FunctionInfo*>::const_iterator FLI =
> + FII->getValue().begin(); FLI != FII->getValue().end(); ++FLI) {
> + FunctionInfo *FI = *FLI;
> ----------------
> for-range loop?
Done.
>
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2994
> @@ +2993,3 @@
> + Stream.Emit(0xE, 4);
> + Stream.Emit(0xD, 4);
> +
> ----------------
> (Side note) I read this so many times, I'm surprised there is not a better refactoring for reading/writing bitcode.
I went ahead and pulled the header writing/reading code out into a
helper both here and in the reader.
>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:19
> @@ +18,3 @@
> +
> +uint64_t FunctionInfoIndex::NextModuleId = 0;
> +
> ----------------
> A mutable global is really to be avoided. I am really not convinced that it is desirable here.
> Should it be passed by argument to the `mergeFrom` method? Or a member variable?
>
> Maybe even `mergeFrom()` should not be a member function, but a separate object that would have a `NextModuleId` and you could add `FunctionInfoIndex` and get the merged result.
For now since this is only called when the combined module is created
from the gold plugin, I removed the
static member variable and made it a local that is incremented and
passed in from the caller of mergeFunc().
An existing assert in mergeFunc() ensures that module's have unique
IDs assigned to them.
>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:23
> @@ +22,3 @@
> +// per-module instances.
> +void FunctionInfoIndex::mergeFrom(FunctionInfoIndex *Other) {
> +
> ----------------
> What about taking a `unique_ptr` here and "stealing" its content instead of doing a bunch of redundant memory allocation and copy?
Done.
>
> ================
> Comment at: lib/IR/FunctionInfo.cpp:29
> @@ +28,3 @@
> + StringRef ModPath;
> + for (; OtherFuncInfoLists != Other->funcinfo_end(); OtherFuncInfoLists++) {
> + StringRef FuncName = OtherFuncInfoLists->getKey();
> ----------------
> This loop header is strange with the iterator declared outside. Turn into a for-range loop?
Done.
>
> ================
> Comment at: lib/Object/FunctionIndexObjectFile.cpp:89
> @@ +88,3 @@
> +ErrorOr<std::unique_ptr<FunctionIndexObjectFile>>
> +llvm::object::FunctionIndexObjectFile::create(MemoryBufferRef Object,
> + LLVMContext &Context,
> ----------------
> I'm not sure why you would need to provide the fully qualified name for this function?
Hmm, blindly cloned this from "llvm::object::IRObjectFile::create(". Removed it.
>
> ================
> Comment at: test/Bitcode/thinlto-function-summary.ll:17
> @@ +16,3 @@
> +
> +; CHECK: define i32 @foo()
> +
> ----------------
> This line is ignored
Why? It should be checked by the second "RUN".
>
> ================
> Comment at: test/Bitcode/thinlto-function-summary.ll:25
> @@ +24,3 @@
> +
> +; CHECK: define i32 @bar(i32 %x)
> +
> ----------------
> This line is ignored as well
Same here. I confirmed that if I add garbage to the end of this line it fails.
>
> ================
> Comment at: test/tools/gold/X86/thinlto.ll:8
> @@ +7,3 @@
> +; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED
> +; RUN: not test -a %t3
> +
> ----------------
> What is this line doing? Isn't the `-a` for the `&&` operator?
> Do you check that the file %t3 is not generated by gold and only the combine index is?
Right, I am trying to make sure that %t3 is not actually generated
when we tell gold to create the combined index. I copied this from
test/tools/gold/X86/emit-llvm.ll and after looking at the man page for
'test' I am not sure why it uses -a. I've changed it to use -e for
thinlto.ll, which works as expected (-a works as well, and this fails
when I remove the "not", but doesn't make a lot of sense).
>
>
> http://reviews.llvm.org/D13107
>
>
>
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
More information about the llvm-commits
mailing list