[PATCH] D111750: [llvm-profgen] Allow unsymbolized profile as perf input
Lei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 22 11:00:20 PDT 2021
wlei added a comment.
In D111750#3070920 <https://reviews.llvm.org/D111750#3070920>, @hoy wrote:
> Thanks for bringing up the support of raw profile as input. Can you please describe the format of raw profile in the summary and somewhere in the code (`RawProfileReader`) ?
Sounds good. added the format to the head of class.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:310
std::string PerfReaderBase::convertPerfDataToTrace(ProfiledBinary *Binary,
StringRef PerfData) {
----------------
wenlei wrote:
> How about we use `PerfInputFile PerfReaderBase::convertPerfDataToTrace(ProfiledBinary *Binary, PerfInputFile &PerfInput`), and return a new PerfInput from the function?
Fixed.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:732-739
+inline void exitWithErrorForTraceLine(TraceStream &TraceIt) {
+ std::string Msg = TraceIt.isAtEoF()
+ ? "Invalid raw profile!"
+ : "Invalid raw profile at line " +
+ Twine(TraceIt.getLineNumber()).str() + ": " +
+ TraceIt.getCurrentLine().str();
+ exitWithError(Msg);
----------------
wenlei wrote:
> Can we move this a lambda instead of global function since it's only used in readSampleCounters?
>
> If lambda grows too big, we can use member functions too.
Moved to a lambda.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:741
+
+void RawProfileReader::readSampleCounters(TraceStream &TraceIt,
+ SampleCounter &SCounters) {
----------------
wenlei wrote:
> Some comment showing the expected input for parsing would be helpful.
comments added
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:771
+
+ if(!UseOffset) {
+ Source = Binary->virtualAddrToOffset(Source);
----------------
wenlei wrote:
> If we use this flag to control profile reading too, we need to update flag description.
description updated
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:797
+ }
+ Key->genHashCode();
+ auto Ret =
----------------
wenlei wrote:
> Would it better if hash code is lazily generated instead of requiring an explicit call?
>
> ```
> getHashCode() {
> if (HashCode == 0)
> genHashCode()
> ...
> }
> ```
Here to explicitly call `genHashCode` is intentional, the reason is we want to avoid making `genHashCode` a virtual function, i, e, avoid call `genHashCode `after casting to base class. so we separate like:
```
derived class: HashCode = genHashCode();
base class : getHashCode{return HashCode;}
```
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:807-808
+
+ if (SkipSymbolization)
+ writeRawProfile(OutputFilename);
+}
----------------
wenlei wrote:
> So if user specify skip-symbolization and the input is raw profile, we are just reading it in and then writing it out without doing anything?
Yes, otherwise user give `skip-symbolization` with unsymbolized profile, it won't have output. And I actually used it during debugging or maybe in the future we can do something after read then output it.
Added a warning to this
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:63
+enum PerfInputType {
+ PERFSCRIPT_UNKNOWN = 0,
+ PERFSCRIPT_INVALID = 1,
----------------
wenlei wrote:
> wenlei wrote:
> > nit: all caps is usually only used for macro. For enum, it's better to follow normal variable naming, which is also the convention used in llvm
> These are not mutually exclusive types. e.g. PERFSCRIPT_UNKNOWN can be either PERFSCRIPT_LBR or PERFSCRIPT_LBR_STACK.
>
> Logically there're two things being represented, the format and the content. To avoid exponential combination and to have a clear separation, perhaps use format, content masks instead, or even separate them out?
>
> We can have format = RawProfile|PerfScript|PerfData, and content = LBR|LBRStack|Aggregated.
Fixed, thanks!
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:63-68
+ PERFSCRIPT_UNKNOWN = 0,
+ PERFSCRIPT_INVALID = 1,
+ PERFSCRIPT_LBR = 2, // Only LBR sample
+ PERFSCRIPT_LBR_STACK = 3, // Hybrid sample including call stack and LBR stack.
+ PEFFDATA = 4, // Raw linux perf.data.
+ RAWPROFILE = 5, // Unsymbolized profile generated by llvm-profgen.
----------------
wlei wrote:
> wenlei wrote:
> > wenlei wrote:
> > > nit: all caps is usually only used for macro. For enum, it's better to follow normal variable naming, which is also the convention used in llvm
> > These are not mutually exclusive types. e.g. PERFSCRIPT_UNKNOWN can be either PERFSCRIPT_LBR or PERFSCRIPT_LBR_STACK.
> >
> > Logically there're two things being represented, the format and the content. To avoid exponential combination and to have a clear separation, perhaps use format, content masks instead, or even separate them out?
> >
> > We can have format = RawProfile|PerfScript|PerfData, and content = LBR|LBRStack|Aggregated.
> Fixed, thanks!
Makes sense.Changed to separate them out.
================
Comment at: llvm/tools/llvm-profgen/llvm-profgen.cpp:39
+cl::opt<std::string>
+ RawProfileFilename("raw-profile", cl::value_desc("raw-profile"),
+ cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated,
----------------
wenlei wrote:
> I'm not sure if raw profile is a good name. the real raw profile is the perf.data, after some processing we get perf script, and with more processing we get this "raw profile". Perhaps we need a alternative name, unsymbolized profile?
yeah, it's clearer to use `--unsymbolized-profile`, changed. The name is a little long then, so add an alias to it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111750/new/
https://reviews.llvm.org/D111750
More information about the llvm-commits
mailing list