[PATCH] D123869: [llvm-profgen] Add process filter for perf reader

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 16 21:00:22 PDT 2022


wenlei added inline comments.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:535
+  static std::unique_ptr<PerfReaderBase>
+  create(ProfiledBinary *Binary, PerfInputFile &PerfInput, uint32_t PIDFilter);
 
----------------
wenlei wrote:
> hoy wrote:
> > hoy wrote:
> > > wenlei wrote:
> > > > hoy wrote:
> > > > > Can we just check against the switch `ProcessId` in `convertPerfDataToTrace`? Feel like PID is perf data specific and changing the general interface here may not be necessary. We have other filters that are for a particular reader, as an example `ignore-stack-samples`.
> > > > > Can we just check against the switch ProcessId in convertPerfDataToTrace?
> > > > 
> > > > We also need to check PID in updateBinaryAddress. I think PID is in the same category as Binary, both are filters for perf events. 
> > > > 
> > > > > Feel like PID is perf data specific and changing the general interface here may not be necessary. 
> > > > 
> > > > Isn't any perf reader type under PerfReaderBase supposed to be perf data specific? The only thing that is not perf data related is UnsymbolizedProfileReader, but then I think the real problem is really why we have UnsymbolizedProfileReader inheriting from perf reader while it's clearly not reading perf data/script.
> > > > We also need to check PID in updateBinaryAddress.
> > > 
> > > The PID filter isn't that meaningful in perf script file. Though we can still use pid to filter out the mmap events there, but the main LBR/stack samples mixed from multiple processes are not separable. So PID filter should be applied when converting perf.data to perf script, which makes me think it is perf data specific.
> > > 
> > > > Isn't any perf reader type under PerfReaderBase supposed to be perf data specific? 
> > > 
> > > Not necessarily. Perf script and our internal lbr profile do not see 
> > > everything that perf.data has. The two formats assume all data in them are for one process. Do you see with pid filter, `convertPerfDataToTrace` still gives mixed mmap evens from all processes?
> > > 
> > >   
> > I see, looks like mmap events are special from other sample events in that they cannot be filtered by pid. I was hoping that perf script output could be made dedicate to a process since that's what `perf script --pid` is shooting for? Once filtered, the processing of perf script output should no longer need to deal with pid again. This sounds like a bug of perf script.
> > 
> > The `PerfReaderBase` hierarchy seems improper in naming. Both perf.data and perf script output are processed by `PerfScriptReader` and the base class `PerfReaderBase` is there only to provide an interface to create a specific reader instance, including `UnsymbolizedProfileReader`. I think the intention of including `UnsymbolizedProfileReader` in the hierarchy was to avoid a special path in the driver, but then `PerfReaderBase` should be named `ReaderBase` or something. 
> > 
> > It then makes sense to pass pid to the perf reader hierarchy and separating `UnsymbolizedProfileReader` out. The separation can be done in a different patch. 
> > 
> > Use `optional<uint32_t>` for `PIDFilter` in case zero is a valid kernel process?
> > > We also need to check PID in updateBinaryAddress.
> > 
> > The PID filter isn't that meaningful in perf script file. Though we can still use pid to filter out the mmap events there, but the main LBR/stack samples mixed from multiple processes are not separable. So PID filter should be applied when converting perf.data to perf script, which makes me think it is perf data specific.
> 
> We need to use pid to filter mmap event, otherwise we don't know which mmap to use to set base address and the result would be wrong, so pid is meaningful for perf script. 
> 
> > 
> > > Isn't any perf reader type under PerfReaderBase supposed to be perf data specific? 
> > 
> > Not necessarily. Perf script and our internal lbr profile do not see 
> > everything that perf.data has. The two formats assume all data in them are for one process. 
> >   
> 
> Maybe I didn't understand what you're trying to say. I don't see a problem because: 1) there's no fundamental difference between perf data and perf script; 2) By the name `PerfReaderBase`, the entire hierarchy is perf specific; 3) then adding pid to general interface isn't a problem. 
> 
> Also I don't see pid is being very different from binary name, it's another filter, similar to binary name. 
> 
> > Do you see with pid filter, `convertPerfDataToTrace` still gives mixed mmap evens from all processes?
> 
> Yes, if you are asking about output from `perf script ..` with pid filter. mmap from processes are still there. 
> 
> I was hoping that perf script output could be made dedicate to a process since that's what perf script --pid is shooting for? 

No, as I said that's what you get with perf script. I added the change in updateBinaryAddress after I realized that. 

> This sounds like a bug of perf script.

I'm not sure, unless there's a clear spec this is basically implementation dependent. As you mentioned, mmap can be considered special.

> I think the intention of including UnsymbolizedProfileReader in the hierarchy was to avoid a special path in the driver, but then PerfReaderBase should be named ReaderBase or something.

I don't think make non-perf reader a subclass of perf reader is a good idea, and avoiding special path in driver doesn't justify that. 

> The PerfReaderBase hierarchy seems improper in naming.

The hierarchy is improper, but I'm not sure if it's just a naming thing. One can also argue perhaps it's better to have special path in driver like `llvm-sample-profile`. It was a similar situation and we settled on having special path there. Regardless this is unrelated to this patch. 

> Use optional<uint32_t> for PIDFilter in case zero is a valid kernel process? 

I'd prefer simplicity unless proven necessary. I don't think zero process id is something we'd worry about. But I don't have strong opinion on this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123869/new/

https://reviews.llvm.org/D123869



More information about the llvm-commits mailing list