[PATCH] D158675: [NFC][AsmPrinter] Refactor DbgVariable as a std::variant
Scott Linder via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 6 11:55:51 PDT 2023
scott.linder added inline comments.
================
Comment at: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h:149
+struct MMILoc {
+ mutable SmallVector<FrameIndexExpr, 1> FrameIndexExprs;
+};
----------------
fdeazeve wrote:
> scott.linder wrote:
> > fdeazeve wrote:
> > > Not for this patch, but it really bothers me that this is mutable. The original rationale must have been to allow sorting when we call the "get" method, but still...
> > It is almost spooky how you've pointed out the same things I toyed with while working on the patches originally!
> >
> > I had a patch in the middle somewhere which just exposed the need to sort in the interface, but scrapped it early on. I think your `set` idea makes a lot of sense and I'll follow up with a patch for that as well.
> One thing to consider here: sorting (either in the current vector approach or through the internal impl of std::set) require calling `Expr->getFragmentInfo()->OffsetInBits` as many time as comparison operations are performed, and this is possibly done multiple times for the same expression without any caching.
>
> Something to keep in mind as a possible improvement as well (perhaps computing & storing the offset in advance)
I tried to do a bit of (non-statistically-rigorous) benchmarking of a few approaches.
The versions are:
* main: unmodified main branch
* patched+uncached: most straightforward switch to std::set, without any caching of the fragment info
* patched+eager_cached: above, but assume the `OffsetInBits` will be needed, and cache it in the constructor of `FrameIndexExpr`
* patched+lazy_cached: above, but re-introduce a `mutable` member so the comparison operator can cache the `OffsetInBits` on first use
The input is a precodegen dump of llvm-tblgen from a FullLTO build of main. Not sure if this is really representative, but it at least contains multi-fragment MMI entries.
If there is a better way to do the benchmark I am happy to redo it, this is just the best I could come up with that would take a reasonable amount of time.
```
llc -O3 build-lto/bin/llvm-tblgen.0.5.precodegen.bc -o /dev/null -filetype=obj -time-passes
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
main:
22.3635 ( 28.8%) 14.2489 ( 71.7%) 36.6124 ( 37.5%) 36.6142 ( 37.5%) X86 Assembly Printer
21.9819 ( 28.1%) 14.6388 ( 71.8%) 36.6207 ( 37.2%) 36.6238 ( 37.2%) X86 Assembly Printer
patched+uncached:
21.3484 ( 28.0%) 14.4208 ( 72.4%) 35.7693 ( 37.2%) 35.7706 ( 37.2%) X86 Assembly Printer
21.4563 ( 28.2%) 14.4606 ( 72.2%) 35.9169 ( 37.4%) 35.9191 ( 37.4%) X86 Assembly Printer
patched+eager_cached:
21.2332 ( 32.0%) 14.3204 ( 76.2%) 35.5536 ( 41.8%) 35.5555 ( 41.8%) X86 Assembly Printer
21.6691 ( 31.7%) 14.5448 ( 74.4%) 36.2139 ( 41.2%) 36.2149 ( 41.2%) X86 Assembly Printer
patched+lazy_cached:
22.4542 ( 27.9%) 14.9983 ( 71.0%) 37.4525 ( 36.9%) 37.4537 ( 36.9%) X86 Assembly Printer
22.9115 ( 28.3%) 14.7169 ( 69.3%) 37.6284 ( 36.8%) 37.6299 ( 36.8%) X86 Assembly Printer
```
I only ran 2 trials so there is a lot of room for variance, but it seems like everything other than patched+eager_cached is more-or-less comparable when comparing User Time as a percent of compilation time. This makes some sense to me, as I suspect the typical case is to not have multiple fragments anyway, so we would never need to compare the offsets, turning the eager caching into pure overhead.
I think for the sake of simplicity in the code it is not worth having the `mutable` cache at all. If you agree I can post a patch! If this doesn't seem convincing I can do a more rigorous comparison.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D158675/new/
https://reviews.llvm.org/D158675
More information about the llvm-commits
mailing list