[PATCH] D144596: Add extra parameter to thinlto-prefix-replace

Teresa Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Mar 11 11:17:46 PST 2023


tejohnson added a comment.

In D144596#4182584 <https://reviews.llvm.org/D144596#4182584>, @MaskRay wrote:

> (I'll be be out for most of the next 4 weeks and may not have much time to review patches.)

Ok, will work with @itf on this particular patch, and as mentioned below, let's discuss a follow on proposal to change the minimized bitcode file handling separately.

> Let me write down my understanding and the proposal.
>
> When compiling a source file with `-flto=thin`, we additionally specify `-fthin-link-bitcode=` to create a minimized bitcode file.
>
> Here is an example (simplified from how Bazel implements distributed ThinLTO):
>
>   for i in a b c; do clang -c -O2 -flto=thin -fthin-link-bitcode=$i.indexing.o $i.c; done
>   clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/',--thinlto-object-suffix-replace='.indexing.o;.o' elf0.o a.indexing.o -Wl,--start-lib [bc].indexing.o -Wl,--end-lib elf1.o
>   clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
>   clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
>   clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o lto/c.o
>   clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains lto/a.o and lto/b.o
>
> In the thin link, we specify `--thinlto-object-suffix-replace=.indexing.o;.o` to change IR module names.
>
> In the emitted index files (`*.thinlink.bc`) and import files (`*.imports`), we will see `[abc].o` instead of `[abc].indexing.o`.
>
>   % llvm-bcanalyzer --dump lto/a.o.thinlto.bc | grep '<ENTRY'
>       <ENTRY abbrevid=6 op0=0 op1=97 op2=46 op3=111/> record string = 'a.o'
>       <ENTRY abbrevid=6 op0=1 op1=98 op2=46 op3=111/> record string = 'b.o'
>
> Note, thin links only require minimized bitcode files, while backend compiles only require full bitcode files.

The above is all correct afaict.

> To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories.
> If these bitcode files are within the same directory, Bazel will have to ship both full and minimized bitcode files to a backend compile action, causing waste.
> To ship just full bitcode files in a backend compile action, full and minimized bitcode files need to be in different directories (e.g. `a.bc` and `indexing/a.bc`).

I'm very confused as this is not mentioned anywhere in this patch by @itf. Normally, at least without tree artifacts, we only ship the specified input files, not the entire directory. The stated motivation for this particular patch, which was unrelated to the input files of the indexing action, was:

---

> Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.
>
> This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".

---

@itf can you clarify whether the input object handling is different than for non-tree artifacts as well?

Regardless, I would prefer to keep this patch about the original issue of the write-once output folders. Additional refinements to the input files can be discussed in follow on patches.

> While we're here, let's make two more changes. First, let's use `.bc` instead of `.o` for the bitcode file extension name.

I'm not in favor of hardcoding the extension in the compiler, it should be left to the build system, but I don't believe that what you are proposing would require a specific extension anyway. Currently, the object output file extension of the compile action is not a hardcoded requirement and I don't think it should be. Additionally, when we were initially implementing ThinLTO I recall that the Bazel team specifically wanted to keep the .o extension (otherwise the handling gets a bit trickier for them as a compile action will have different extensions depending on the optimization options). But like I said it doesn't appear to me that what you are proposing would require this.

> Second, let's place the initially compiled bitcode files in `thin/` instead of the current working directory for clarity.

Again, this should be decided in the build system, not by the compiler. But it also isn't clear to me that your proposal requires a specific name, just that your proposal is that we require the full and minimized bitcode to be in separate directories and supply the mapping between those instead of the mapping between the extensions. As mentioned above, I would like to understand from @itf if that is needed for tree artifacts, and discuss this in a separate patch, as it seems more like a refinement and less like a requirement, and is orthogonal to this particular change. It may also require additional changes to the build system.

In D144596#4182628 <https://reviews.llvm.org/D144596#4182628>, @MaskRay wrote:

> In D144596#4178243 <https://reviews.llvm.org/D144596#4178243>, @tejohnson wrote:
>
>> In D144596#4176555 <https://reviews.llvm.org/D144596#4176555>, @MaskRay wrote:
>>
>>> I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).
>>
>> I'll take a stab at suggesting a clearer summary in a little bit.
>
> Thanks!

Got busy and forgot to do so. I suggest something like the following:

---

Title: [lld] Support separate native object file path in --thinlto-prefix-replace

Summary:
Currently, the `--thinlto-prefix-replace="oldpath;newpath"` option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

  1) the output files of the thin link itself (the .thinlto.bc index files and the
  optional .imports files)
  2) the specified object file paths written to the response file given in the
  --thinlto-index-only=${response} option, which is used by the final native
  link and must match the paths of the native object files that will be
  produced by ThinLTO backend compiles.

This patch expands the `--thinlto-prefix-replace` option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: `--thinlto-prefix-replace="origpath;outpath[;objpath]"`

This replaces the `origpath` directory tree of the thin link input files with
`outpath` when writing the thin link index and imports outputs (number 1
above). If `objpath` is specified it replaces `origpath` of the input files with
`objpath` when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using `outpath` for this as well.

---

>>> Feel free to take some of the command examples below to make things clearer
>>>
>>> Say, our binary target is named `$package:b` and has a source file `$package/b.cc` along with other dependencies.
>>>
>>> The new semantics can be used this way by newer Bazel: `-Wl,--thinlto-index-only=bazel-bin/$package/b-lto-final.params,--thinlto-emit-imports-files,--thinlto-prefix-replace=bazel-bin/;bazel-bin/$package/b.lto/;bazel-bin/$package/b.lto.backend/,--lto-obj-path=bazel-bin/$package/b.lto.merged.o`
>>>
>>> - `bazel-bin/$package/_objs/b/b.pic.o`: bitcode file
>>> - `bazel-bin/$package/_objs.indexing/b/b.pic.o`: minimized bitcode file
>>
>> To clarify, the above file is currently emitted in the same directory as the full bitcode, and just has a different extension. I suppose you are proposing that we change to the above naming scheme, but that is not so simple, see below.
>
> The "To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories." part in my previous long reply captures my understanding.
>
> Yes, `b.pic.o` and `b.pic.indexing.o` are in the same directory, and we use `--thinlto-object-suffix-replace='.indexing.o;.o'` to derive the full bitcode file from the minimized bitcode file.
> With off-topic discussion with @itf , I think I kinda understand why `b.pic.o` and `b.pic.indexing.o` need to be in different directories now (with tree artifacts we have to ship the whole directory, not individual files), but more context from the Bazel side will be nice to have.

As mentioned earlier, I would like clarification from @itf. But regardless, this seems like a separate refinement and I would prefer to separate the 2 issues and discuss and design any changes to this handling in a different patch.

>>> - `bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.thinlto.bc`: index file
>>> - `bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.imports`: import file
>>> - `bazel-bin/$package/b.lto.backend/$package/_objs/b/b.pic.o`: ELF relocatable object file after a backend compile
>>> - `bazel-bin/$package/b-lto-final.params`: response file for the final native link (it references ELF relocatable object files after backend compiles)
>>>
>>> There are two advantages:
>>>
>>> - (This doesn't require lld changes.) the full bitcode file and the minimized bitcode file are in different directories. This allows a ThinLTO indexing action to ship fewer input files.
>>
>> Moving these into separate directories won't affect what the action ships (at least in Bazel). The inputs are specified as full paths, not directories. In fact, the main reason on our end for implementing and using minimized bitcode files was to reduce the aggregate amount of input files sent to the indexing action.
>>
>>> This may be a Bazel limitation, but a fairly reasonable limitation that other build systems may have as well: they track directories, but not individual files.
>>>
>>> - The ELF relocatable object file after a backend compile is now in a different directory. This ensures the directory `b.lto` is only used by one action. This requires this lld patch.
>>>
>>>   AIUI `--thinlto-object-suffix-replace=.indexing.o;.o` can effectively be deprecated. So after a transition period when we finally remove `--thinlto-object-suffix-replace`, the feature as proposed in this patch will not increase the overall complexity.
>>
>> Moving the minimized bitcode to a different directory will not obviate the need for an option. The thin link indexing step needs to know where the corresponding full bitcode is so that it can set up the paths for the later ThinLTO backend actions, which need to know the full IR paths where they are importing from. These full IR bitcode paths are recorded in the .thinlto.bc index files as well as the .imports files. If we moved the minimized bitcode to a different directory but used the same suffix, we would need to replace this with a different option that specified the prefix mapping from the path of the minimized IR (input to the indexing action) to the path of the full IR that will be used during importing by the later ThinLTO backend actions.
>
> Agreed. `--thinlto-prefix-replace='thin/;index/;obj/;indexing/'` in my previous long message should work.
>
>>> Therefore, I am happy with the direction.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144596/new/

https://reviews.llvm.org/D144596



More information about the llvm-commits mailing list