[PATCH] D89707: [AutoFDO][llvm-profgen] Parse mmap events from perf script

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 22 10:14:44 PDT 2020


wlei added inline comments.


================
Comment at: llvm/docs/CommandGuide/llvm-profgen.rst:14
+
+The :program:`llvm-profgen` utility generates a SPGO profile data file from 
+given perf script data files.
----------------
wmi wrote:
> Please explain what SPGO represents here.
@wmi Thanks for the helpful list of feedbacks. More description is added here



================
Comment at: llvm/docs/CommandGuide/llvm-profgen.rst:35
+
+  Path of the input profiled binary files. If no file path is specified, the
+  path of the actual profiled binaries will be used instead. 
----------------
hoy wrote:
> shenhan wrote:
> > Sometimes, the profiled binary path as recorded in mmaps is different from the local binary's path. I understand then this will only use the "name" part to match.
> > But sometimes the name part is also different, then the match cannot proceed.
> > 
> > One possible improvement is when the binary provided has embed "build-id", this can query perfdata file to get the absolute path of the file with matching build id, and use that absolute path to match mmap events and this would be the most accurate match, what do you think?
> Thanks for the suggestion. A build-id list can be made as an additional input to the tool for an accurate lookup of the mmap events. 
@shenhan Thanks for your suggestion. Do you mean I need to do parsing the binary to get the "build-id"(if it has), then take it as a key to query the perfdata in which it bonds the absolute patch with the build-id? If my understanding is right, I will try to do it later change. 


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:27
+static cl::list<std::string> PerfTraceFilenames(
+    "perfscript", cl::value_desc("perfscript"), cl::OneOrMore,
+    llvm::cl::MiscFlags::CommaSeparated,
----------------
mtrofin wrote:
> (flyby comment) is it a "perfscript" or a "perftrace" - or maybe "perfdata" rather.
> 
> Also, probably "perf tool" not "perf script" in the description.
@mtrofin Thanks for your helpful list of suggestions!
>(flyby comment) is it a "perfscript" or a "perftrace" - or maybe "perfdata" rather.
here we intentionally use "perfscript" but not "perfdata" because it's not the raw perfdata created by the `perf record` but the output by the `perf script -i perf.data`, this can avoid to call the cmd inside the code. 
> Also, probably "perf tool" not "perf script" in the description.
Here change the description of it



================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:34
+                    llvm::cl::MiscFlags::CommaSeparated,
+                    cl::desc("Input profiled binary files"));
+
----------------
mtrofin wrote:
> Path to profiled binary?
you mean to change the desc here? "Path" is added


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:49
+  std::string Path;
+  uint64_t BaseAddress;
+  bool IsLoaded;
----------------
mtrofin wrote:
> Nit: to help maintainability down the road, perhaps ensure primitive fields are initialized here - I realize they are init-ed in the ctor, but it's even easier to maintain things when def and init on the same line. Less typing for the ctor, too.
Fixed!


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:71
+using BinaryMap = StringMap<ProfiledBinary>;
+using BinaryAddressMap = std::map<uint64_t, ProfiledBinary *>;
+
----------------
wmi wrote:
> It is a map from address to ProfiledBinary, how about using AddressBinaryMap?
Good suggestion


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:74
+struct MMapEvent {
+  pid_t PID;
+  uint64_t BaseAddress;
----------------
mtrofin wrote:
> best to initialize struct fields to avoid undefined values.
fixed!


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:94
+    auto Buffer = std::move(BufferOrErr.get());
+    if (uint64_t(Buffer->getBufferSize()) >
+        std::numeric_limits<uint32_t>::max())
----------------
mtrofin wrote:
> should this rather be Buffer->getBufferSize() > static_cast<size_t>(std::numeric_limits<uint32_t>::max())?
> 
> 
Good catch


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:105
+      return true;
+    if (*P == '\r' && *(P + 1) == '\n')
+      return true;
----------------
mtrofin wrote:
> could P+1 be out of bounds?
Yeah, it seems it's not used here, delete this!


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:116
+    // The binary table is currently indexed by the binary name not the full
+    // binary path. This is because the user-given path may match the one that
+    // was actually executed.
----------------
wmi wrote:
> Here it is may or may not?
Good catch, fixed


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:128
+    }
+    ProfiledBinary &Binary = BinaryTable[BinaryName];
+    Binary.load();
----------------
wmi wrote:
> Maybe use BinaryTable.insert instead of BinaryTable.find above and use the return value to check whether the binary has been loaded before, then you don't have to search the table another time here.
Good to know this, fixed!


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:137
+    StringRef BinaryName = llvm::sys::path::filename(BinaryPath);
+    if (!BinaryTable.count(BinaryName))
+      return;
----------------
mtrofin wrote:
> I guess this can happen because the Event could reference other binaries than the interesting ones, correct? Could you add a comment here about this - i.e. that the event is intentionally dropped here.
> 
> 
Yes, comments added


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:143-144
+    // Update the binary address map.
+    BinaryAddrMap.erase(Binary.getBaseAddress());
+    BinaryAddrMap[Event.BaseAddress] = &Binary;
+
----------------
mtrofin wrote:
> wmi wrote:
> > Here BinaryAddrMap needs to erase an entry before inserting a new one. I guess it is to support the case that a load image is unloaded and then reloaded at a different place. If that is correct, please add some comment to make it clear. 
> Naive question: that assumes no interesting events were collected when the image was loaded at the old address?
yes , comments added


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:143-144
+    // Update the binary address map.
+    BinaryAddrMap.erase(Binary.getBaseAddress());
+    BinaryAddrMap[Event.BaseAddress] = &Binary;
+
----------------
wlei wrote:
> mtrofin wrote:
> > wmi wrote:
> > > Here BinaryAddrMap needs to erase an entry before inserting a new one. I guess it is to support the case that a load image is unloaded and then reloaded at a different place. If that is correct, please add some comment to make it clear. 
> > Naive question: that assumes no interesting events were collected when the image was loaded at the old address?
> yes , comments added
Not sure whether it will remain the same address, but here I think it will cover this case. or maybe you want me to check if the base address doesn't change then we can just drop this event?


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:174
+      MMapEvent Event;
+      Fields[1].getAsInteger(10, Event.PID);
+      Fields[2].getAsInteger(0, Event.BaseAddress);
----------------
mtrofin wrote:
> consider using an enum like:
> 
> enum EventIndex {
>   WholeLine = 0,
>   PID = 1,
>   BaseAddress = 2... (etc)
> }
> 
> so then dereferencing Fields is more readable.
fixed, thanks for the suggestion


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:206
+
+  void run() {
+    // Load the binaries.
----------------
mtrofin wrote:
> perhaps instead of 'run' something more specific, like "loadBinariesAndEvents".
Agree it's better to have a specific name. My concern is we do not only load binaries and mmp events here, we must also do parsing the sample line simultaneously to extract info for the LBR unwinder(see its children changes), also do check the perfscript type, so I just use a general name as an entry of those misc. Any thoughts about this?


================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:222
+
+int main(int argc, const char *argv[]) {
+  InitLLVM X(argc, argv);
----------------
mtrofin wrote:
> probably can be done later just fine, but should the main be in its own file, so then this becomes a reusable library?
Yeah, we separate PerfReader to other files in the following commit, please see its children changes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89707/new/

https://reviews.llvm.org/D89707



More information about the llvm-commits mailing list