[PATCH] D85408: Let -basic-block-sections=labels emit basicblock metadata in a new .bb_info section, instead of emitting special unary-encoded symbols.

Rahman Lavaee via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 26 12:37:57 PDT 2020


rahmanl created this revision.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.
rahmanl updated this revision to Diff 285826.
rahmanl added a comment.
rahmanl updated this revision to Diff 285828.
rahmanl updated this revision to Diff 285830.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.
rahmanl edited the summary of this revision.
rahmanl added reviewers: tmsriram, shenhan, efriedma, MaskRay.
rahmanl added a subscriber: amharc.
rahmanl updated this revision to Diff 285926.
rahmanl edited the summary of this revision.
rahmanl updated this revision to Diff 286044.
rahmanl marked 3 inline comments as done.
rahmanl updated this revision to Diff 286046.
rahmanl marked 6 inline comments as done.
rahmanl updated this revision to Diff 288058.
rahmanl marked an inline comment as done.
Herald added subscribers: wenlei, kerbowa, dexonsmith, steven_wu, nhaehnle, jvesely, arsenm.
Herald added a reviewer: bollu.
rahmanl updated this revision to Diff 288064.
rahmanl marked 7 inline comments as done.
rahmanl edited the summary of this revision.
rahmanl marked 3 inline comments as done.
rahmanl published this revision for review.

Modify clang test.


rahmanl added a comment.

Add clang test.


rahmanl added a comment.

Modify the clang test.


rahmanl added a comment.

Address Snehasish's comments.


rahmanl added a comment.

- Address Snehasish's comments. clang-format.


rahmanl added a comment.

- Address Snehasish's comments.


rahmanl added a comment.

Thanks for the great feedback @snehasishk.


tmsriram added a comment.

- The summary of this patch must be enhanced a little bit more to capture the highlights.
- Even though you point to the RFC, I think just add a little more.
- I would add that this patch basically un-pollutes the symtab, that is the greatest contribution to me which helps profilers, debuggers plug and play

Independently, what did we decide about the option?  Are we keeping the option?

**You should also include changes to the clang user manual to reflect this, documentation around -fbasic-block-section=labels can go in the same patch**


rahmanl added a comment.

Address Sri's comments.


rahmanl added a comment.

Address Sri's comments.



================
Comment at: clang/test/CodeGen/basic-block-sections.c:31-33
+// BB_LABELS: .Lfunc_begin0:
+// BB_LABELS: .LBB_END0_0:
+// BB_LABELS: .LBB0_1:
----------------
We need a test where we disasm the native object file and show how the contents look like.  This is good for readability of these sections. 


================
Comment at: clang/test/CodeGen/basic-block-sections.c:31-33
+// BB_LABELS: .Lfunc_begin0:
+// BB_LABELS: .LBB_END0_0:
+// BB_LABELS: .LBB0_1:
----------------
tmsriram wrote:
> We need a test where we disasm the native object file and show how the contents look like.  This is good for readability of these sections. 
The test you're envisioning won't look great.
If we want to dis-assemble the object file, we end up just having binary contents for every .bb_addr_map section.


================
Comment at: llvm/include/llvm/CodeGen/MachineBasicBlock.h:486
+  /// 2nd bit: set if this is a block ending with a tail call.
+  /// 3rd bit: seet if this is an exception handling (EH) pad.
+  /// The remaining bits are zero.
----------------
Typo.


================
Comment at: llvm/include/llvm/CodeGen/MachineBasicBlock.h:481
 
+  /// Returns the BB info metadata to be emitted in the bb_info section.
+  unsigned getBBInfoMetadata() const;
----------------
Document the metadata format? AFAICT the only to find out now is reading the code which generates the metadata as unsigned.


================
Comment at: llvm/include/llvm/CodeGen/MachineBasicBlock.h:481
 
+  /// Returns the BB info metadata to be emitted in the bb_info section.
+  unsigned getBBInfoMetadata() const;
----------------
snehasish wrote:
> Document the metadata format? AFAICT the only to find out now is reading the code which generates the metadata as unsigned.
+1 Let's document this heavily where possible:

* Under what options is this section emitted?
* What is captured here?
* Maybe a quick note on why this is used, just one sentence should be fine.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1049
+    emitLabelDifferenceAsULEB128(MBBSymbol, FunctionSymbol);
+    // Emit the basic block size.
+    emitLabelDifferenceAsULEB128(MBB.getEndSymbol(), MBBSymbol);
----------------
Can you add a comment here that basic block size cannot always be derived from label offsets due to alignment?  Isnt that the only reason, otherwise size is redundant right?


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1225
+              OutContext);
+          OutStreamer->emitELFSize(CurrentSectionBeginSym, SizeExp);
+        }
----------------
I am surprised we are not already doing this :).  This is needed for your patch right?  Otherwise, this needs to go as a separate patch.  


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1225
+              OutContext);
+          OutStreamer->emitELFSize(CurrentSectionBeginSym, SizeExp);
+        }
----------------
tmsriram wrote:
> I am surprised we are not already doing this :).  This is needed for your patch right?  Otherwise, this needs to go as a separate patch.  
What's happening here is that the lambda function emitELFSizeDirective is removed since we no longer have two use-cases (only sections).


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1027
+void AsmPrinter::emitBBInfoSection(const MachineFunction &MF) {
+  if (!MF.hasBBLabels())
+    return;
----------------
I have a slight preference to move this check to the callsite so that the reader does not have to follow it only to realize that their function of interest does not emit a bb info section. Though I see the convention in the code is to call it and then return early. Maybe documenting at the callsite is enough?


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1027
+void AsmPrinter::emitBBInfoSection(const MachineFunction &MF) {
+  if (!MF.hasBBLabels())
+    return;
----------------
snehasish wrote:
> I have a slight preference to move this check to the callsite so that the reader does not have to follow it only to realize that their function of interest does not emit a bb info section. Though I see the convention in the code is to call it and then return early. Maybe documenting at the callsite is enough?
Ya, I like moving the check to the call site. You can assert here instead and be done.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1317
 
+  emitBBInfoSection(*MF);
+
----------------
Add a comment here.


================
Comment at: llvm/lib/CodeGen/MachineBasicBlock.cpp:73
+    // label.
+    if (MF->hasBBSections() && isBeginSection()) {
       SmallString<5> Suffix;
----------------
I am happy this code is blown away! :)


================
Comment at: llvm/lib/CodeGen/MachineBasicBlock.cpp:61
+  return ((unsigned)isReturnBlock()) |
+         ((!empty() && TII->isTailCall(back())) << 1) | (isEHPad() << 2);
+}
----------------
If this block is empty should we even be calling this function?


================
Comment at: llvm/lib/CodeGen/MachineBasicBlock.cpp:61
+  return ((unsigned)isReturnBlock()) |
+         ((!empty() && TII->isTailCall(back())) << 1) | (isEHPad() << 2);
+}
----------------
snehasish wrote:
> If this block is empty should we even be calling this function?
Yes. An empty basic block is possible:

https://llvm.org/doxygen/AsmPrinter_8cpp_source.html#l03140


================
Comment at: llvm/lib/CodeGen/MachineFunction.cpp:343
 }
 
 /// This method iterates over the basic blocks and assigns their IsBeginSection
----------------
Ditto! Happy to see this gone!


================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:957
+
+MCSection *MCObjectFileInfo::getBBAddrMapSection(const MCSection &TextSec) const {
+  if (Env != IsELF)
----------------
Run clang-format on the patches.


================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:957
+
+MCSection *MCObjectFileInfo::getBBAddrMapSection(const MCSection &TextSec) const {
+  if (Env != IsELF)
----------------
tmsriram wrote:
> Run clang-format on the patches.
Apparently, this is the formatting chosen by clang-format.


================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:960
+MCSection *MCObjectFileInfo::getBBInfoSection(const MCSection &TextSec) const {
+  if (Env != IsELF)
+    return BBInfoSection;
----------------
Is this just returning the BBInfoSection pointer which is null? If so then prefer returning nullptr here to make it clear to the reader.


================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:971
+
+  return Ctx->getELFSection(".bb_info", ELF::SHT_PROGBITS, Flags, 0, GroupName,
+                            MCSection::NonUniqueID,
----------------
Can we call the section something other than bb_info? For example when examining the contents of a binary a name like "stack_sizes" provides some obvious meaning without further investigation. Similarly, can we call this something like "bb_addr_offset_map"?


================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:971
+
+  return Ctx->getELFSection(".bb_info", ELF::SHT_PROGBITS, Flags, 0, GroupName,
+                            MCSection::NonUniqueID,
----------------
snehasish wrote:
> Can we call the section something other than bb_info? For example when examining the contents of a binary a name like "stack_sizes" provides some obvious meaning without further investigation. Similarly, can we call this something like "bb_addr_offset_map"?
Great suggestion. I renamed the section to bb_addr_map and also changed the function names accordingly.


================
Comment at: llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll:1
+; RUN: llc < %s -mtriple=x86_64-linux -function-sections -basic-block-sections=labels | FileCheck %s
+
----------------
Can we make the triples that we use consistent across the various tests?


================
Comment at: llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll:5
+; CHECK: .section        .text._Z3barv,"ax", at progbits
+; CHECK-LABEL: _Z3barv:
+; CHECK-NEXT: [[BAR_BEGIN:.+]]:
----------------
Lets align the contents of the check to make it easier to inspect.


================
Comment at: llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll:15
+
+;; Check we add .bb_info section to a COMDAT group with the corresponding .text section if such a COMDAT exists.
+; CHECK: .section        .text._Z4fooTIiET_v,"axG", at progbits,_Z4fooTIiET_v,comdat
----------------
Consider moving the CHECK lines into the functions themselves to make it clear which CHECKS correspond to which functions.


This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section.
The format of the emitted data is represented as follows. It includes a header for every function:

| Address of the function                      | -> 8 bytes (pointer size) |
| Number of basic blocks in this function (>0) | -> ULEB128                |
|

The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows:

| Offset of the basic block relative to function begin | -> ULEB128                                                                       |
| Binary size of the basic block                       | -> ULEB128                                                                       |
| BB metadata                                          | -> ULEB128  [ MBB.isReturn() OR MBB.hasTailCall() << 1  OR  MBB.isEHPad() << 2 ] |
|

The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels. 
The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers.
Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%.

For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D85408

Files:
  clang/docs/UsersManual.rst
  clang/test/CodeGen/basic-block-sections.c
  llvm/include/llvm/CodeGen/AsmPrinter.h
  llvm/include/llvm/CodeGen/MachineBasicBlock.h
  llvm/include/llvm/CodeGen/MachineFunction.h
  llvm/include/llvm/MC/MCObjectFileInfo.h
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/BasicBlockSections.cpp
  llvm/lib/CodeGen/MIRParser/MIRParser.cpp
  llvm/lib/CodeGen/MachineBasicBlock.cpp
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/MC/MCObjectFileInfo.cpp
  llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll
  llvm/test/CodeGen/X86/basic-block-sections-labels.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D85408.288064.patch
Type: text/x-patch
Size: 23755 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20200826/0eb47974/attachment-0001.bin>


More information about the cfe-commits mailing list