[PATCH] D153376: Introducing llvm-cm: A Cost Model Tool

Thu Jul 27 01:39:25 PDT 2023

jhenderson added a comment.

My apologies for being slow in getting back to you - I had some time off and then have been busy catching up on all sorts of other reviews. By the way, feel free to ping the thread if it goes stale for a week.

I've not reviewed the main body of the code today - I ran out of time, but there should be plenty to get on with. If I missed any questions, please reask them.

================
Comment at: llvm/test/tools/llvm-cm/X86/bb-addr-map.test:1
+## This test checks that llvm-cm outputs an error when 
+## failing to read a valid basic block address mapping.
----------------
jhenderson wrote:
> Avoid trailing whitespace, and also this was wrapped rather prematurely.
Nothing in this test is X86 specific, so move the test out of the X86 folder, so that it can be run on all targets.

================
Comment at: llvm/test/tools/llvm-cm/X86/bb-addr-map.test:1-2
+## This test checks that llvm-cm outputs an error when 
+## failing to read a valid basic block address mapping.
+# RUN: yaml2obj %s -o %t.o
----------------
Avoid trailing whitespace, and also this was wrapped rather prematurely.

================
Comment at: llvm/test/tools/llvm-cm/X86/bb-addr-map.test:6
+
+# CHECK: error: failed to read basic block address mapping
+
----------------
I expect there's some additional context that this error could have. Why did the reading fail? At the moment, it's basically impossible for a user to be able to know how to resolve it.

================
Comment at: llvm/test/tools/llvm-cm/X86/empty.s:1
+## Check that llvm-cm does not produce any output on an empty input file.
+# RUN: llvm-mc -o %t.o --filetype=obj -triple=x86_64-unknown-linux-gnu %s
----------------
I guess this is more grammatically correct, sorry :)

================
Comment at: llvm/test/tools/llvm-cm/X86/inst_count.s:1
+## LLVM-CM instruction counting functionality test.
+# RUN: llvm-mc -o %t.o --filetype=obj -triple=x86_64-unknown-linux-gnu %s
----------------
This is more of a title than a descriptive comment. Also the tool is `llvm-cm` not `LLVM-CM` (normally!).

I'd suggest the following: "This test shows that llvm-cm can count instructions correctly." or something to that effect.

================
Comment at: llvm/test/tools/llvm-cm/X86/inst_count.s:5-6
+
+# CHECK: <BB0>
+# CHECK: total # of instructions: 3
+# CHECK-NEXT: multiply:
----------------
Nit: typically, I encourage adding some spaces between the CHECK: and the text that is being checked for, to make it line up with the CHECK-NEXT lines.

On the other hand, why is the second of these not a CHECK-NEXT line?

================
Comment at: llvm/test/tools/llvm-cm/X86/inst_count.s:19
+
+	.text
+	.file	"inst_count.ll"
----------------
There's a lot of junk in the asm that somewhat obscures what is actually interesting about the input, and therefore what you really are trying to test. At a guess, without looking at the code logic, what you're really interested in are 1) the symbols, 2) the instructions within a symbol, and 3) the BB structures. If that is indeed the case, I wonder whether using YAML would allow you to exercise greater control, without needing to spell out every part of the BB structure (it might not)? Could you use sequences of `nop` instructions for your purposes, or is there a need for them to be all different?

================
Comment at: llvm/test/tools/llvm-cm/X86/multi-func.s:1
+## Check that llvm-cm can handle input containing many basic blocks across functions.
+# RUN: llvm-mc -o %t.o --filetype=obj -triple=x86_64-unknown-linux-gnu %s
----------------
In what way is this test significantly different to inst_count.s? If both are actually needed, could the same simplifications to the asm be made?

> Also, prefer - in test names over _ due to it being easier to type.

This comment applied to inst_count.s and bad_triple.s too (plus any other tests you write).

================
Comment at: llvm/test/tools/llvm-cm/X86/multi-func.s:5
+
+# CHECK: main:
+# CHECK-NEXT: <BB0>: 0000000000000000
----------------

================
Comment at: llvm/test/tools/llvm-cm/X86/multi-func.s:29-30
+# CHECK-NEXT: <BB7>: 000000000000007a
+# CHECK-NEXT: Number of instructions in BB: 7
+# CHECK-NEXT: total # of instructions: 36
+# CHECK-NEXT: isPrime:
----------------
This applies more generally, I just placed my comment here somewhat arbitrarily :)

There's a weird inconsistency here between the capitalized "Number" and all-lower-case "total". Furthermore, it's not expecially easy to spot where the end of one function is and the start of the next. Can I suggest a) being consistent with your capitalization, and b) adding blank lines in the output between the total line and the next function? You might also want to include the function name in the total line too, since it could be a long way from the start of it, but I'm not too fussed by that necessarily.

================
Comment at: llvm/test/tools/llvm-cm/X86/sections-no-symbol-name.test:1-3
+## This test checks that llvm-cm outouts an error message
+## when attempting to disassemble a symbol with no name, even
+## if there is a valid section name.
----------------
Unnecessary wrapping - you could save a line by not wrapping so early.

Also, typo in "outouts".

That being said, I don't think this comment and test name exactly line up with what you test. It is normal for section symbols to not have names. Tools like llvm-objdump synthesise a name from the section name for a section symbol, so the error here is when a section symbol doesn't have a valid section index. You don't even need a section for the test to produce the same behaviour, if I'm not mistaken.

================
Comment at: llvm/test/tools/llvm-cm/X86/sections-no-symbol-name.test:7
+
+# CHECK: error: failed to get symbol name
+
----------------
More context in this message please. Which symbol couldn't you get the name for (report the index).

================
Comment at: llvm/test/tools/llvm-cm/X86/sections-no-symbol-name.test:22
+    Type:         STT_SECTION
+    
----------------
Nit: trailing whitespace.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:1-2
+//===- llvm-cm.cpp - LLVM cost modeling tool
+//----------------------------------===//
+//
----------------
Please fix your comment header.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:68
+static cl::opt<std::string> TripleName("triple",
+                                       cl::desc("Target triple name. "),
+                                       cl::init(LLVM_DEFAULT_TARGET_TRIPLE),
----------------
Why the trailing whitespace in the description?

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:74
+
+static void exitIf(bool Cond, Twine Message) {
+  if (Cond) {
----------------
Let's just spell it `Condition`. There's no need for the brevity.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:80
+}
+struct FilterResult {
+  // True if the section should not be skipped.
----------------
Nit: new line required between functions/structs etc.

Also this struct should be in an anonymous namespace.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:92
+SectionFilter
+gettoolSectionFilter(object::ObjectFile const &O, uint64_t *Idx,
+                     const std::vector<std::string> &FilterSections) {
----------------
Not sure what relevance "tool" is in this name, so get rid of it.

Also, prefer west const, in keeping with the wider LLVM style.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:93
+gettoolSectionFilter(object::ObjectFile const &O, uint64_t *Idx,
+                     const std::vector<std::string> &FilterSections) {
+  StringSet<> FoundSectionSet;
----------------
`ArrayRef`?

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:97
+    *Idx = std::numeric_limits<uint64_t>::max();
+  return llvm::object::SectionFilter(
+      /*Pred=*/
----------------
You've got `using namespace llvm::object` at the top of this file, so there's no need for the qualifiers here.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:98
+  return llvm::object::SectionFilter(
+      /*Pred=*/
+      [Idx, FoundSectionSet, FilterSections](object::SectionRef S) {
----------------
In general, these sort of label comments are only added for literals, so I'd get rid of them from here.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:101
+        FilterResult Result = {true, true};
+        if (Idx != nullptr && Result.IncrementIndex)
+          *Idx += 1;
----------------
`Result.IncerementIndex` is always going to be `true` here...

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:108
+
+[[noreturn]] static void error(Error Err) {
+  logAllUnhandledErrors(std::move(Err), WithColor::error(outs()),
----------------
Keep all your error reporting functions together in one place.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:110
+  logAllUnhandledErrors(std::move(Err), WithColor::error(outs()),
+                        "reading file: ");
+  outs().flush();
----------------
This "reading file: " context isn't particularly useful, as it prevents this function from being used for non-file errors, e.g. command-line processing errors. Take a look at `createFileError` as a way of adding the file name to `Error`.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:112
+  outs().flush();
+  exit(1);
+}
----------------
As requested before, please use `std::exit`

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:150
+
+void printFunctionNames(ArrayRef<SymbolInfoTy> &Aliases) {
+  for (size_t I = 0; I < Aliases.size(); ++I) {
----------------
`static`?

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:184
+  // Count the number of instructions in each basic block.
+  bool EnteredBb = false;
+  while (Index < End) {
----------------

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:117
+    WithColor::warning(errs(), "llvm-cm")
+        << "Failed to get section name: " << toString(SecNameOrErr.takeError())
+        << "\n";
----------------
JestrTulip wrote:
> jhenderson wrote:
> > JestrTulip wrote:
> > > jhenderson wrote:
> > > > https://llvm.org/docs/CodingStandards.html#error-and-warning-messages
> > > > 
> > > > Also, test case?
> > > Regarding the test cases for aspects such as Section Name, a good portion of the error handling for disassembly was modeled off the error handling in tools such as objdump and mca. For this specific case, and many of those that are unaddressed within the program, I find it hard to come up with a way to properly test these occurrences. 
> > > For example, this error requires a Section to exist within the file, but its name to not be found. 
> > > If I could receive any ideas on how to properly test situations like this, i'd be very grateful. 
> > > 
> > > 
> > Take a look at the test cases for llvm-readobj - there are many examples where it uses yaml2obj to customise the input in some way, e.g. creating an input without a section header string table (see https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/sections-no-section-header-string-table.test). You should be able to use yaml2obj to achieve a broken input in some manner using these examples (more can be found in the ELF yaml2obj tests too, if you want to explore the various options you have). NB: you don't need to test every possible failure mode that the underlying library can hit (these should already have testing elsewhere). Instead, you should make sure you have testing that covers the case where a particularly library function returns an error, to show that you handled the error correctly.
> I do have a question regarding some of the tests. I've used yaml2obj to for some of them, but I was wondering how to create the necessary conditions for others, namely the error messages that occur despite a proper target (e.g. AsmInfo, FeatureVals, SubInfo, etc..). I was also wondering if there was a way to generate an object file with invalid sections, for this message. 
> 
> ```
> if (!SecNameOrErr) {
>     WithColor::warning(errs(), "llvm-cm")
>         << "Failed to get section name: " << toString(SecNameOrErr.takeError())
>         << "\n";
> }
> ```
Re. AsmInfo etc, I don't know if there's a good way of testing those, so it may not be possible. Try searching through the other tools to see if any of them test them and if so, how they do so.

When you say "invalid sections" what do you mean? A section can be invalid in many different ways, and different mechanisms will be needed to test the different cases.

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:31
+#include "llvm/MC/MCTargetOptionsCommandFlags.h"
+#include "llvm/MC/SubtargetFeature.h"
+#include "llvm/MC/TargetRegistry.h"
----------------
MaskRay wrote:
> and clang-format this file.
This inline edit doesn't seem to have been addressed?

================
Comment at: llvm/tools/llvm-cm/llvm-cm.cpp:164
+
+// TODO: Share this with llvm-objdump.cpp. 
+static uint8_t getElfSymbolType(const llvm::object::ObjectFile &Obj, const llvm::object::SymbolRef &Sym) {
----------------
JestrTulip wrote:
> jhenderson wrote:
> > JestrTulip wrote:
> > > jhenderson wrote:
> > > > Any reason you can't do this upfront, like you did with the section filter stuff?
> > > I was planning on including these changes in a later patch, since the overall scope of this one is so minimal, I wanted to avoid changing many other files. 
> > This is brand new code. You should really avoid duplicating code in new code, if at all possible. In other words, you should be refactoring the code you want to use in earlier patches, to make it shareable, and then base this patch on top of them.
> Regarding the code duplication, where should these shared functions be moved to. I was thinking about ObjectFile.cpp, but I am open to any suggestions. 
`getElfSymbolType` sounds like it belongs in ELF.h or ELFObjectFile.h. `collectBBtoAddressLabels` probably doesn't belong in `ObjectFile.h` simply because it doesn't involve any use of `ObjectFile`, but I can't see a better location, so there's probably fine, or maybe even consider a new header.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153376/new/

https://reviews.llvm.org/D153376