[PATCH] D12132: TransformUtils: Introduce module splitter.

Thu Aug 20 12:14:40 PDT 2015

pcc added a comment.

In http://reviews.llvm.org/D12132#227433, @joker.eph wrote:

> Hi Peter,
>
> I wonder if we can't make the split really more efficient, especially memory wise. Ideally we should seek to:
>
> - Limit looping over the module content
> - Move content instead of duplicating it.
>
>   This is especially important since LTO deals with very large module. So reusing the cloneModule() implementation is clever and nicely done, but on the long term I'm not sure it is the way to go. I can imagine many speed/memory improvements that could be done by splitting a Module in N partitions in one pass.


I agree with all of this, and I'd be interested in exploring any ideas around making this more efficient, but at the very least I think it would be useful to have this in the tree as a benchmark of comparison against more efficient approaches. I suppose the question is what the best way of structuring this code would be given that we intend to replace parts of it with a better design later.

> So I wonder: if we want to really productize this, I'm not sure that the current code sharing/intrusion in cloneModule() is the right way to go. What do you think?


Whichever component we replace cloneModule() with will basically need to solve the same problem of providing some way to partition the module. It should be simple to revert the change to cloneModule() later once we have something more reasonable; it should mostly be a matter of constant folding the calls to ShouldCloneDefinition.


================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:32
@@ +31,3 @@
+/// This function returns once all ModuleCallback threads have finished
+/// execution. M is in an indeterminate state after this function is called.
+void SplitModule(Module *M, unsigned N,
----------------
joker.eph wrote:
> What about adding a warning/todo/fixme about "some subtle symbol visibility issues" that have to be fixed?
> (I notices you mention it'll be fixed //later//)
Done

================
Comment at: include/llvm/Transforms/Utils/SplitModule.h:34
@@ +33,3 @@
+void SplitModule(Module *M, unsigned N,
+                 std::function<void(unsigned I, StringRef BC)> ModuleCallback);
+
----------------
joker.eph wrote:
> A function that "Split a Module in N partition" is very valuable. I'm not necessarily convinced that the interface doesn't intermix two different things:
> 
> - Splitting the module
> - What to do with the partitions (including issuing threads, etc.)
> 
> What about providing the API that only cares about the actual split here?
> (I understand that it does not provide some pipeline where the first thread starts before the other partitions are created, but still)
> 
> ```
> std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N);
> 
> std::vector<std::unique_ptr<Module>> SplitModule(std::unique_ptr<Module> M, unsigned N, std::function<unsigned(GlobalValue &GV)> PartitionSelection);
> ```
> 
> Where the callback allows the caller to control the partition logic.
> The first one could call the second one with your current modulo-hash partition method as a callback.
> 
> Also if the Module is left in an undetermined state, the caller won't be able to do anything with it but destroying it, so we might better take ownership of it anyway. It might be possible to re-use it for partition 0 and save one clone?
I found that pipelining can give a significant performance boost -- on my machine it can take around 40 seconds to create each partition for Chromium -- so I'd like to find a way to keep it if possible. I came up with a new design for the API that avoids this function having to deal with threads, which I've implemented in the new patch.

Regarding the partition selection callback, I would suggest that we hold back on that until there is a use case for it.

> Also if the Module is left in an undetermined state, the caller won't be able to do anything with it but destroying it, so we might better take ownership of it anyway.

Done, but D12205 will be needed later for the code generator.

> It might be possible to re-use it for partition 0 and save one clone?

Yes, good point, that should be possible. I've added a FIXME.

================
Comment at: lib/Transforms/Utils/CloneModule.cpp:41
@@ -36,1 +40,3 @@
+llvm::CloneModule(const Module *M, ValueToValueMapTy &VMap,
+                  std::function<bool(const GlobalValue *)> CloneDefinition) {
   // First off, we need to create the new module.
----------------
joker.eph wrote:
> I would found the callback call sites more readable with a name like "ShouldCloneDefinition", because "CloneDefinition" makes me think it will actually Clone something.
> (just an opinion, feel free to do whatever seems best for you)
Done

================
Comment at: lib/Transforms/Utils/CloneModule.cpp:94
@@ +93,3 @@
+      // kinds of globals is forbidden), but this is generally not required for
+      // correctness.
+      continue;
----------------
joker.eph wrote:
> Function attributes are not required for correctness? I'm unsure about that.
Yes, hence "generally". Note that this is for external references only. Thread-local variables are the only case that springs to mind where attributes matter. But it turns out that aliases of thread-local variables are broken anyway:

  $ cat tl1.ll
  @x = thread_local global i32 0
  @y = alias i32* @x
  
  define void @f() {
    store i32 42, i32* @x
    store i32 42, i32* @y
    ret void
  }
  $ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-pc-win32
  	.text
  	.def	 f;
  	.scl	2;
  	.type	32;
  	.endef
  	.globl	f
  	.align	16, 0x90
  f:                                      # @f
  .Ltmp0:
  .seh_proc f
  # BB#0:
  .Ltmp1:
  	.seh_endprologue
  	movl	_tls_index(%rip), %eax
  	movq	%gs:88, %rcx
  	movq	(%rcx,%rax,8), %rax
  	movl	$42, x at SECREL32(%rax)
  	movl	$42, y(%rip)
  	retq
  .Ltmp2:
  	.seh_endproc
  
  	.section	.tls$,"dw"
  	.globl	x                       # @x
  	.align	4
  x:
  	.long	0                       # 0x0
  
  
  	.globl	y
  y = x
  $ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-unknown-linux
  	.text
  	.file	"tl1.ll"
  	.globl	f
  	.align	16, 0x90
  	.type	f, at function
  f:                                      # @f
  	.cfi_startproc
  # BB#0:
  	movl	$42, %fs:x at TPOFF
  	movl	$42, y(%rip)
  	retq
  .Lfunc_end0:
  	.size	f, .Lfunc_end0-f
  	.cfi_endproc
  
  	.type	x, at object               # @x
  	.section	.tbss,"awT", at nobits
  	.globl	x
  	.align	4
  x:
  	.long	0                       # 0x0
  	.size	x, 4
  
  
  	.globl	y
  y = x
  	.section	".note.GNU-stack","", at progbits
  $ ~/src/llvm-build-rel/bin/llc -o - tl1.ll -mtriple=x86_64-apple-darwin
  	.section	__TEXT,__text,regular,pure_instructions
  	.globl	_f
  	.align	4, 0x90
  _f:                                     ## @f
  	.cfi_startproc
  ## BB#0:
  	pushq	%rax
  Ltmp0:
  	.cfi_def_cfa_offset 16
  	movq	_x at TLVP(%rip), %rdi
  	callq	*(%rdi)
  	movl	$42, (%rax)
  	movl	$42, _y(%rip)
  	popq	%rax
  	retq
  	.cfi_endproc
  
  .tbss _x$tlv$init, 4, 2                 ## @x
  
  	.section	__DATA,__thread_vars,thread_local_variables
  	.globl	_x
  _x:
  	.quad	__tlv_bootstrap
  	.quad	0
  	.quad	_x$tlv$init
  
  
  	.globl	_y
  _y = _x
  .subsections_via_symbols


================
Comment at: lib/Transforms/Utils/SplitModule.cpp:37
@@ +36,3 @@
+  if (!GV->hasName())
+    GV->setName("__llvmlto_unnamed");
+}
----------------
joker.eph wrote:
> Minor comment: the file is //lib/Transforms/Utils/SplitModule.cpp// but the name includes "lto" 
> Some context is leaking ;-)
Fixed

================
Comment at: tools/llvm-split/llvm-split.cpp:51
@@ +50,3 @@
+  std::string OutName = OutputFilename;
+  SplitModule(M.get(), NumOutputs, [=](unsigned I, StringRef BC) {
+    std::error_code EC;
----------------
joker.eph wrote:
> Not sure why the lambda needs to capture by copy?
This protected against the possibility of `std::string` (OutName) not being thread safe, but with the API change this is no longer needed.


http://reviews.llvm.org/D12132