[PATCH] D97128: [ThinLTO] Make cloneUsedGlobalVariables deterministic

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 22 10:07:02 PST 2021


MaskRay added inline comments.


================
Comment at: llvm/lib/IR/Module.cpp:662
+GlobalVariable *llvm::collectUsedGlobalVariables(
+    const Module &M, SmallVectorImpl<GlobalValue *> &Set, bool CompilerUsed) {
+  const char *Name = CompilerUsed ? "llvm.compiler.used" : "llvm.used";
----------------
tejohnson wrote:
> MaskRay wrote:
> > MaskRay wrote:
> > > tejohnson wrote:
> > > > Instead of having a second very similar interface, which could get confusing, how about just change it to use a set with a deterministic iterator, like std::set? Or if it is just that one callsite that needs the deterministic ordering, perhaps just copy the SmallPtrSet into a std::set there. Otherwise, the interfaces need to be carefully documented to avoid confusion. I'd lean towards just coping into a std::set in this instance for simplicity. WDYT?
> > > We probably should delete the SmallPtrSet overload. I've checked several other call sites - The call sites don't seem to benefit from a SmallPtrSet as well.  Some call sites actually have similar non-deterministic issues. (I have confirmed that if I add a `std::reverse` I can get different bitcode files.) 
> > > 
> > > Using `std::set` will require a comparator on name. Let's use SmallVectorImpl here and migrate SmallPtrSet separately?
> > Hmm. Some call sites benefit from a SmallPtrSet (`count`), while some are better off with `SmallVectorImpl`. It looks to me that both overloads are useful.
> That's true about needing to ensure std::set sorted by name not pointer value. I saw you sent a follow on patch to fix LowerTypeTests. I think there are other places that would have the same issue (e.g. in the BitcodeWriter, in EmbedBitcodeInModule - and that was just the first one I looked at).
> 
> My concern is that with 2 similar interfaces it will be confusing and error prone which one gets used.
> 
> How about using a SetVector? From https://llvm.org/docs/ProgrammersManual.html#llvm-adt-setvector-h:
> "The difference between SetVector and other sets is that the order of iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers."
> 
A SetVector is by default a composition of a SmallVector and a SmallDenseSet. Its space consumption is the two data structures combined.

Another idea is `collectUsedGlobalVariables(const Module &M, function_ref<void(GlobalValue*)>, bool CompilerUsed)`: let the call site pass in a callback. In every call site we'll do `[&](GlobalValue *X) { Set.insert(X); }` or `push_back`. That would make every call site use a bit more code.

How about adding `collectUsedGlobalVariablesImpl` for function_ref and still keep the SmallVectorImpl+SmallPtrSet interfaces. The internal implementation is not duplicated.

In terms of interface confusion... I think this scheme is no more confusing than the template solution (moving the Module.cpp implementation to Module.h and making the function a template).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97128/new/

https://reviews.llvm.org/D97128



More information about the llvm-commits mailing list