[PATCH] D12132: TransformUtils: Introduce module splitter.
Peter Collingbourne via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 20 12:14:40 PDT 2015
pcc added a comment.
In http://reviews.llvm.org/D12132#227433, @joker.eph wrote:
> Hi Peter,
>
> I wonder if we can't make the split really more efficient, especially memory wise. Ideally we should seek to:
>
> - Limit looping over the module content
> - Move content instead of duplicating it.
>
> This is especially important since LTO deals with very large module. So reusing the cloneModule() implementation is clever and nicely done, but on the long term I'm not sure it is the way to go. I can imagine many speed/memory improvements that could be done by splitting a Module in N partitions in one pass.
I agree with all of this, and I'd be interested in exploring any ideas around making this more efficient, but at the very least I think it would be useful to have this in the tree as a benchmark of comparison against more efficient approaches. I suppose the question is what the best way of structuring this code would be given that we intend to replace parts of it with a better design later.
> So I wonder: if we want to really productize this, I'm not sure that the current code sharing/intrusion in cloneModule() is the right way to go. What do you think?
Whichever component we replace cloneModule() with will basically need to solve the same problem of providing some way to partition the module. It should be simple to revert the change to cloneModule() later once we have something more reasonable; it should mostly be a matter of constant folding the calls to ShouldCloneDefinition.
================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:32
@@ +31,3 @@
+/// This function returns once all ModuleCallback threads have finished
+/// execution. M is in an indeterminate state after this function is called.
+void SplitModule(Module *M, unsigned N,
----------------
joker.eph wrote:
> What about adding a warning/todo/fixme about "some subtle symbol visibility issues" that have to be fixed?
> (I notices you mention it'll be fixed //later//)
Done
================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:34
@@ +33,3 @@
+void SplitModule(Module *M, unsigned N,
+ std::function<void(unsigned I, StringRef BC)> ModuleCallback);
+
----------------
joker.eph wrote:
> A function that "Split a Module in N partition" is very valuable. I'm not necessarily convinced that the interface doesn't intermix two different things:
>
> - Splitting the module
> - What to do with the partitions (including issuing threads, etc.)
>
> What about providing the API that only cares about the actual split here?
> (I understand that it does not provide some pipeline where the first thread starts before the other partitions are created, but still)
>
> ```
> std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N);
>
> std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N, std::function<unsigned(GlobalValue &GV)> PartitionSelection);
> ```
>
> Where the callback allows the caller to control the partition logic.
> The first one could call the second one with your current modulo-hash partition method as a callback.
>
> Also if the Module is left in an undetermined state, the caller won't be able to do anything with it but destroying it, so we might better take ownership of it anyway. It might be possible to re-use it for partition 0 and save one clone?
I found that pipelining can give a significant performance boost -- on my machine it can take around 40 seconds to create each partition for Chromium -- so I'd like to find a way to keep it if possible. I came up with a new design for the API that avoids this function having to deal with threads, which I've implemented in the new patch.
Regarding the partition selection callback, I would suggest that we hold back on that until there is a use case for it.
> Also if the Module is left in an undetermined state, the caller won't be able to do anything with it but destroying it, so we might better take ownership of it anyway.
Done, but D12205 will be needed later for the code generator.
> It might be possible to re-use it for partition 0 and save one clone?
Yes, good point, that should be possible. I've added a FIXME.
================
Comment at: lib/Transforms/Utils/CloneModule.cpp:41
@@ -36,1 +40,3 @@
+llvm::CloneModule(const Module *M, ValueToValueMapTy &VMap,
+ std::function<bool(const GlobalValue *)> CloneDefinition) {
// First off, we need to create the new module.
----------------
joker.eph wrote:
> I would found the callback call sites more readable with a name like "ShouldCloneDefinition", because "CloneDefinition" makes me think it will actually Clone something.
> (just an opinion, feel free to do whatever seems best for you)
Done
================
Comment at: lib/Transforms/Utils/CloneModule.cpp:94
@@ +93,3 @@
+ // kinds of globals is forbidden), but this is generally not required for
+ // correctness.
+ continue;
----------------
joker.eph wrote:
> Function attributes are not required for correctness? I'm unsure about that.
Yes, hence "generally". Note that this is for external references only. Thread-local variables are the only case that springs to mind where attributes matter. But it turns out that aliases of thread-local variables are broken anyway:
$ cat tl1.ll
@x = thread_local global i32 0
@y = alias i32* @x
define void @f() {
store i32 42, i32* @x
store i32 42, i32* @y
ret void
}
$ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-pc-win32
.text
.def f;
.scl 2;
.type 32;
.endef
.globl f
.align 16, 0x90
f: # @f
.Ltmp0:
.seh_proc f
# BB#0:
.Ltmp1:
.seh_endprologue
movl _tls_index(%rip), %eax
movq %gs:88, %rcx
movq (%rcx,%rax,8), %rax
movl $42, x at SECREL32(%rax)
movl $42, y(%rip)
retq
.Ltmp2:
.seh_endproc
.section .tls$,"dw"
.globl x # @x
.align 4
x:
.long 0 # 0x0
.globl y
y = x
$ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-unknown-linux
.text
.file "tl1.ll"
.globl f
.align 16, 0x90
.type f, at function
f: # @f
.cfi_startproc
# BB#0:
movl $42, %fs:x at TPOFF
movl $42, y(%rip)
retq
.Lfunc_end0:
.size f, .Lfunc_end0-f
.cfi_endproc
.type x, at object # @x
.section .tbss,"awT", at nobits
.globl x
.align 4
x:
.long 0 # 0x0
.size x, 4
.globl y
y = x
.section ".note.GNU-stack","", at progbits
$ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-apple-darwin
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
.cfi_startproc
## BB#0:
pushq %rax
Ltmp0:
.cfi_def_cfa_offset 16
movq _x at TLVP(%rip), %rdi
callq *(%rdi)
movl $42, (%rax)
movl $42, _y(%rip)
popq %rax
retq
.cfi_endproc
.tbss _x$tlv$init, 4, 2 ## @x
.section __DATA,__thread_vars,thread_local_variables
.globl _x
_x:
.quad __tlv_bootstrap
.quad 0
.quad _x$tlv$init
.globl _y
_y = _x
.subsections_via_symbols
================
Comment at: lib/Transforms/Utils/SplitModule.cpp:37
@@ +36,3 @@
+ if (!GV->hasName())
+ GV->setName("__llvmlto_unnamed");
+}
----------------
joker.eph wrote:
> Minor comment: the file is //lib/Transforms/Utils/SplitModule.cpp// but the name includes "lto"
> Some context is leaking ;-)
Fixed
================
Comment at: tools/llvm-split/llvm-split.cpp:51
@@ +50,3 @@
+ std::string OutName = OutputFilename;
+ SplitModule(M.get(), NumOutputs, [=](unsigned I, StringRef BC) {
+ std::error_code EC;
----------------
joker.eph wrote:
> Not sure why the lambda needs to capture by copy?
This protected against the possibility of `std::string` (OutName) not being thread safe, but with the API change this is no longer needed.
http://reviews.llvm.org/D12132
More information about the llvm-commits
mailing list