[PATCH] D68065: Propeller: LLD Support for Basic Block Sections

Tue Mar 24 01:35:44 PDT 2020

tmsriram added subscribers: snehasish, tejohnson.
tmsriram added a comment.

In D68065#1938447 <https://reviews.llvm.org/D68065#1938447>, @MaskRay wrote:

> In D68065#1937331 <https://reviews.llvm.org/D68065#1937331>, @tmsriram wrote:
>
> > In D68065#1934866 <https://reviews.llvm.org/D68065#1934866>, @MaskRay wrote:
> >
> > > I am very glad to see that we have made progress by landing D68063 <https://reviews.llvm.org/D68063> (llvm/CodeGen/CommandFlags.inc) and D73674 <https://reviews.llvm.org/D73674> (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections are agreed to be useful even outside Propeller.
> > >
> > > There are several optimization goals:
> > >
> > > - Alignment insertion
> > > - Automatic cache prefetching
> > > - Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing.
> > > - ...
> >
>

@echristo @snehasish @dxf @ruiu @tejohnson @rnk

You heard all of the ideas you mention here from my team, in face-to-face meetings, and the last one in internal discussions. I specifically recall telling you that some of it is in very early stages and to keep it to yourself. You could have at least checked with us before mentioning it, as a courtesy if nothing else.

>>> There is a CPU erratum that we want to mitigate.
>>> 
>>> - Intel's Jump Condition Code Erratum
>>> 
>>>   By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.
>>> 
>>>   Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how much we will have to reinvent, are my biggest concerns.
>>> 
>>>   On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to the conclusion "this is redoing full LTO; it can't scale".

>>> Considering the above points, I reiterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (the erratum).
>> 
>> @echristo @ruiu
>> 
>> If the JCC erratum is the only concern, then we are now able to show with experiments that Propeller can produce JCC-erratum-free binaries with almost no performance impact, using only the existing assembler mitigations: http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html
>> 
>> Let's use that thread to continue investigating how the linker could potentially do a better job of handling this or other errata in general. Could we please unblock this?
> 
> Intel JCC erratum is not the only concern. My bigger concern is whether we can achieve our post-link optimization goals other than layout shuffling with the current scheme:
> 
> - Alignment insertion
> - Automatic cache prefetching
> - Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing. ...
> - ...

> These points were already listed in my previous comments. I believe internally you probably have more brainstorming thoughts. As I said on https://lists.llvm.org/pipermail/llvm-dev/2020-March/139639.html , I am not yet convinced that, under the no-disassembly assumption, reordering opaque sections can achieve the above goals. Post-link optimization is not a new idea and there have been several engineering efforts before Propeller. However, Propeller is the first to integrate the idea into LLVM. As I said, I look forward to its bright future. I just hope that we can create a generic framework. Our focus is currently section reordering. When we start to think about future optimization opportunities, we should not need to create one, two, three, four more frameworks.

I don't mean any disrespect here, but your tone suggests that you are quite experienced in this area :)  If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers, and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible, as it would take several weeks if not months, and IMHO it is not reasonable, particularly at this stage of the review.

We have put a lot of effort into this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours but was suggested to you, which I believe you should at least acknowledge for transparency.

Multiple people with significant LTO experience have told you that ideas resembling Full LTO have scalability issues, and ThinLTO has seen a lot of adoption for exactly this reason. If you want to prove them wrong, good luck to you, but please don't ask us to do the heavy lifting!

As for disassembly, we have not presented a single patch that does any serious disassembly, and we fully understand the pitfalls. We understand that jump relaxation does mild disassembly, and we are looking at relocations to overcome that, as you already know.

We are looking at efficient ways to accomplish our other optimization objectives, and we will present clear designs with experiments on llvm-dev when we get it done. The idea here is to do thin links, as @mehdi_amini alluded to in that thread of yours (https://llvm.org/devmtg/2020-02-23/#kl), which will do most of the transformations in the compiler and use thin links to generate whole-program summaries.

> I saw Rahman posted https://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html yesterday. Sorry that I did not have time to read it today. If the idea is that more layout work will be done by the compiler, then it starts to look good to me.

I urge you to read it, as we spent significant time conclusively proving that the JCC erratum is a non-issue. I can summarize the plan for you:

- We have been constantly working to reduce the bloat from extra sections as much as possible. Some of the work we did here was to create basic block sections selectively.
- During this, we realized we can do even better if we compute basic block orders early, immediately after profiling. This requires using a dynamic CFG, but we have just prototyped it and it has the same performance benefits.
- This could also move the bulk of the Propeller work from the linker to a third-party tool, create_llvm_prof. This patch is still necessary, as it relates to bb sections.
- This allows us to form larger sections in the compiler rather than waiting for the linker to do the reordering. We would still have to create multiple sections, but far fewer, significantly reducing the bloat.
- This also means we can reuse the existing assembler mitigations without having to reinvent them in the linker, which gives us an immediate solution.
- Performance is neutral (0.2% slowdown) after applying the mitigations. In fact, without Propeller, performance is down by 0.6% from the mitigations.
- To be clear, we get all of the Propeller wins even with the mitigations, measured on the clang benchmark.
- We feel the linker can do a much better job here, since the mitigations use only NOPs and, contrary to what you told us in the meeting, prefixes don't seem to help. You have noted this yourself, so I am still wondering why you told us that prefixes help: https://reviews.llvm.org/D72225#1818149 , where you say "NOP padding alone seems good".
- The linker does not have to use the large hammer of aligning every function to 32-byte boundaries, but this is something best discussed in a design thread.
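For context on the last two points: the JCC erratum affects jump instructions (including macro-fused compare-and-jump pairs) that cross or end on a 32-byte boundary, and NOP-based mitigations insert padding in front of such a jump. Below is a minimal Python sketch of that padding decision, assuming offsets are measured from a 32-byte-aligned section start and ignoring prefix-based padding. `is_affected` and `nop_padding` are hypothetical names for illustration, not LLVM or assembler APIs.

```python
# Minimal sketch (not LLVM's or Intel's code): decide whether a branch
# hits the JCC erratum condition and how many NOP bytes would move it clear.
# Offsets are assumed to be relative to a 32-byte-aligned section start.

BOUNDARY = 32  # erratum boundary size in bytes


def is_affected(offset: int, length: int, boundary: int = BOUNDARY) -> bool:
    """True if the branch occupying [offset, offset + length) crosses a
    boundary or ends exactly on one (its last byte is just before it)."""
    return offset // boundary != (offset + length) // boundary


def nop_padding(offset: int, length: int, boundary: int = BOUNDARY) -> int:
    """Bytes of NOP padding to insert before the branch so that it starts
    at the next boundary; 0 if the branch is already unaffected."""
    if not 0 < length < boundary:
        raise ValueError("branch length must be between 1 and boundary - 1")
    if not is_affected(offset, length, boundary):
        return 0
    return boundary - offset % boundary


if __name__ == "__main__":
    # A 6-byte jcc at offset 26 ends exactly at byte 32: affected, pad 6.
    print(nop_padding(26, 6))
    # The same jcc at offset 25 ends at byte 31: not affected, pad 0.
    print(nop_padding(25, 6))
```

Two caveats the sketch leaves out: each padding insertion shifts all later code, so a tool must re-check downstream branches until the layout stops changing; and the offsets are only meaningful if the enclosing function or section starts on a 32-byte boundary, which is presumably why such schemes raise function alignment to 32 bytes.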

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 <https://reviews.llvm.org/D68063> ..." and yet you are blocking this. This is ridiculous. If you have fundamental disagreements, you should also have been blocking the other patches. The other patches are not very useful without this one, so what's up with the selective blocking?

To conclude, it is perfectly fine if you are opposed to this and don't wish to unblock. I am trying to act in good faith, and I am just going to have to either push this around you if I get the approvals, or kill basic block sections. We have to agree to disagree.


https://reviews.llvm.org/D68065