[PATCH] D12132: TransformUtils: Introduce module splitter.

Tue Aug 18 22:52:21 PDT 2015

joker.eph added a comment.

Hi Peter,

I wonder if we can't make the split really more efficient, especially memory wise. Ideally we should seek to:

- Limit looping over the module content
- Move content instead of duplicating it.

This is especially important since LTO deals with very large module. 
So reusing the cloneModule() implementation is clever and nicely done, but on the long term I'm not sure it is the way to go. I can imagine many speed/memory improvements that could be done by splitting a Module in N partitions in one pass.
So I wonder: if we want to really productize this, I'm not sure that the current code sharing/intrusion in cloneModule() is the right way to go. What do you think?

If you still want to move forward with this patch as is for now, you may find some thoughts about the patch inline.


================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:32
@@ +31,3 @@
+/// This function returns once all ModuleCallback threads have finished
+/// execution. M is in an indeterminate state after this function is called.
+void SplitModule(Module *M, unsigned N,
----------------
What about adding a warning/todo/fixme about "some subtle symbol visibility issues" that have to be fixed?
(I notices you mention it'll be fixed //later//)

================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:34
@@ +33,3 @@
+void SplitModule(Module *M, unsigned N,
+                 std::function<void(unsigned I, StringRef BC)> ModuleCallback);
+
----------------
A function that "Split a Module in N partition" is very valuable. I'm not necessarily convinced that the interface doesn't intermix two different things:

- Splitting the module
- What to do with the partitions (including issuing threads, etc.)

What about providing the API that only cares about the actual split here?
(I understand that it does not provide some pipeline where the first thread starts before the other partitions are created, but still)

```
std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N);

std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N, std::function<unsigned(GlobalValue &GV)> PartitionSelection);
```

Where the callback allows the caller to control the partition logic.
The first one could call the second one with your current modulo-hash partition method as a callback.

Also if the Module is left in an undetermined state, the caller won't be able to do anything with it but destroying it, so we might better take ownership of it anyway. It might be possible to re-use it for partition 0 and save one clone?

================
Comment at: lib/Transforms/Utils/CloneModule.cpp:41
@@ -36,1 +40,3 @@
+llvm::CloneModule(const Module *M, ValueToValueMapTy &VMap,
+                  std::function<bool(const GlobalValue *)> CloneDefinition) {
   // First off, we need to create the new module.
----------------
I would found the callback call sites more readable with a name like "ShouldCloneDefinition", because "CloneDefinition" makes me think it will actually Clone something.
(just an opinion, feel free to do whatever seems best for you)

================
Comment at: lib/Transforms/Utils/CloneModule.cpp:94
@@ +93,3 @@
+      // kinds of globals is forbidden), but this is generally not required for
+      // correctness.
+      continue;
----------------
Function attributes are not required for correctness? I'm unsure about that.

================
Comment at: lib/Transforms/Utils/SplitModule.cpp:37
@@ +36,3 @@
+  if (!GV->hasName())
+    GV->setName("__llvmlto_unnamed");
+}
----------------
Minor comment: the file is //lib/Transforms/Utils/SplitModule.cpp// but the name includes "lto" 
Some context is leaking ;-)

================
Comment at: tools/llvm-split/llvm-split.cpp:51
@@ +50,3 @@
+  std::string OutName = OutputFilename;
+  SplitModule(M.get(), NumOutputs, [=](unsigned I, StringRef BC) {
+    std::error_code EC;
----------------
Not sure why the lambda needs to capture by copy?


http://reviews.llvm.org/D12132