[PATCH] D158675: [NFC][AsmPrinter] Refactor DbgVariable as a std::variant

Felipe de Azevedo Piovezan via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 6 12:04:18 PDT 2023


fdeazeve added inline comments.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h:149
+struct MMILoc {
+  mutable SmallVector<FrameIndexExpr, 1> FrameIndexExprs;
+};
----------------
scott.linder wrote:
> fdeazeve wrote:
> > scott.linder wrote:
> > > fdeazeve wrote:
> > > > Not for this patch, but it really bothers me that this is mutable. The original rationale must have been to allow sorting when we call the "get" method, but still...
> > > It is almost spooky how you've pointed out the same things I toyed with while working on the patches originally!
> > > 
> > > I had a patch in the middle somewhere which just exposed the need to sort in the interface, but scrapped it early on. I think your `set` idea makes a lot of sense and I'll follow up with a patch for that as well.
> > One thing to consider here: sorting (either in the current vector approach or through the internal impl of std::set) require calling `Expr->getFragmentInfo()->OffsetInBits` as many time as comparison operations are performed, and this is possibly done multiple times for the same expression without any caching.
> > 
> > Something to keep in mind as a possible improvement as well (perhaps computing & storing the offset in advance)
> I tried to do a bit of (non-statistically-rigorous) benchmarking of a few approaches.
> 
> The versions are:
> 
> * main: unmodified main branch
> * patched+uncached: most straightforward switch to std::set, without any caching of the fragment info
> * patched+eager_cached: above, but assume the `OffsetInBits` will be needed, and cache it in the constructor of `FrameIndexExpr`
> * patched+lazy_cached: above, but re-introduce a `mutable` member so the comparison operator can cache the `OffsetInBits` on first use
> 
> The input is a precodegen dump of llvm-tblgen from a FullLTO build of main. Not sure if this is really representative, but it at least contains multi-fragment MMI entries.
> 
> If there is a better way to do the benchmark I am happy to redo it, this is just the best I could come up with that would take a reasonable amount of time.
> 
> ```
> llc -O3 build-lto/bin/llvm-tblgen.0.5.precodegen.bc -o /dev/null -filetype=obj -time-passes
> 
>  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
> 
> main:
> 22.3635 ( 28.8%)  14.2489 ( 71.7%)  36.6124 ( 37.5%)  36.6142 ( 37.5%)  X86 Assembly Printer
> 21.9819 ( 28.1%)  14.6388 ( 71.8%)  36.6207 ( 37.2%)  36.6238 ( 37.2%)  X86 Assembly Printer
> 
> patched+uncached:
> 21.3484 ( 28.0%)  14.4208 ( 72.4%)  35.7693 ( 37.2%)  35.7706 ( 37.2%)  X86 Assembly Printer
> 21.4563 ( 28.2%)  14.4606 ( 72.2%)  35.9169 ( 37.4%)  35.9191 ( 37.4%)  X86 Assembly Printer
> 
> patched+eager_cached:
> 21.2332 ( 32.0%)  14.3204 ( 76.2%)  35.5536 ( 41.8%)  35.5555 ( 41.8%)  X86 Assembly Printer
> 21.6691 ( 31.7%)  14.5448 ( 74.4%)  36.2139 ( 41.2%)  36.2149 ( 41.2%)  X86 Assembly Printer
> 
> patched+lazy_cached:
> 22.4542 ( 27.9%)  14.9983 ( 71.0%)  37.4525 ( 36.9%)  37.4537 ( 36.9%)  X86 Assembly Printer
> 22.9115 ( 28.3%)  14.7169 ( 69.3%)  37.6284 ( 36.8%)  37.6299 ( 36.8%)  X86 Assembly Printer
> ```
> 
> I only ran 2 trials so there is a lot of room for variance, but it seems like everything other than patched+eager_cached is more-or-less comparable when comparing User Time as a percent of compilation time. This makes some sense to me, as I suspect the typical case is to not have multiple fragments anyway, so we would never need to compare the offsets, turning the eager caching into pure overhead.
> 
> I think for the sake of simplicity in the code it is not worth having the `mutable` cache at all. If you agree I can post a patch! If this doesn't seem convincing I can do a more rigorous comparison.
Oh wow, thanks for doing all of this! Indeed, I agree with you that the complexity is not worth it based on these numbers. I'd say we can start with the simplest implementation and work from there if this becomes a bottleneck.

By the way, we don't need to do this as part of the existing patch series, it seems reasonable to merge the refactors first!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158675/new/

https://reviews.llvm.org/D158675



More information about the llvm-commits mailing list