[PATCH] D111414: [Demangle] Add minimal support for D programming language

Tue Oct 12 00:28:46 PDT 2021

jhenderson added inline comments.

================
Comment at: llvm/include/llvm/Demangle/Demangle.h:63

+// Demangles a D mangled symbol
+char *dlangDemangle(const char *MangledName);
----------------

================
Comment at: llvm/lib/Demangle/DLangDemangle.cpp:35
+///
+/// \note Beware these aren't required to be '\0' terminated
+struct OutputString {
----------------

================
Comment at: llvm/lib/Demangle/DLangDemangle.cpp:46
+public:
+  /// Constructs a new output string
+  OutputString();
----------------
More comments are missing full stops throughout this class (but we need to settle on the appropriate type before bothering too much).
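For reference while settling on the type, here is a minimal sketch of the kind of buffer under discussion: contiguous, not '\0'-terminated, and able to prepend as well as append. It is a hypothetical illustration, not the patch's `OutputString`. The class name `PrependableBuffer`, its methods, and the initial capacity of 32 are assumptions. It uses `reserve` naming and destructor-based cleanup, in line with the naming and API-design questions raised elsewhere in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string_view>

/// Hypothetical sketch, not the patch's OutputString: a growable character
/// buffer that is not '\0' terminated and supports both append and prepend.
/// Memory is released by the destructor rather than by an explicit free().
class PrependableBuffer {
public:
  PrependableBuffer() = default;
  PrependableBuffer(const PrependableBuffer &) = delete;
  PrependableBuffer &operator=(const PrependableBuffer &) = delete;
  ~PrependableBuffer() { std::free(Buffer); }

  /// Ensures there is room for at least \p N more characters.
  void reserve(std::size_t N) {
    if (Size + N <= Capacity)
      return;
    std::size_t NewCapacity = Capacity == 0 ? 32 : Capacity * 2;
    while (NewCapacity < Size + N)
      NewCapacity *= 2;
    char *NewBuffer = static_cast<char *>(std::realloc(Buffer, NewCapacity));
    if (!NewBuffer)
      std::abort(); // Treat allocation failure as fatal in this sketch.
    Buffer = NewBuffer;
    Capacity = NewCapacity;
    assert(Size + N <= Capacity && "growth must cover the request");
  }

  /// Appends \p S. \p S must not point into this buffer.
  void append(std::string_view S) {
    if (S.empty())
      return;
    reserve(S.size());
    std::memcpy(Buffer + Size, S.data(), S.size());
    Size += S.size();
  }

  /// Prepends \p S by shifting the existing contents right.
  /// \p S must not point into this buffer.
  void prepend(std::string_view S) {
    if (S.empty())
      return;
    reserve(S.size());
    std::memmove(Buffer + S.size(), Buffer, Size);
    std::memcpy(Buffer, S.data(), S.size());
    Size += S.size();
  }

  /// Returns a view of the current contents (not '\0' terminated).
  std::string_view str() const {
    return Buffer ? std::string_view(Buffer, Size) : std::string_view();
  }

  /// Returns the number of characters written so far.
  std::size_t size() const { return Size; }

private:
  char *Buffer = nullptr;
  std::size_t Size = 0;
  std::size_t Capacity = 0;
};
```

For example, appending `"main"` and then prepending `"D "` yields `"D main"`. Each `prepend` shifts the existing contents with `memmove`, so it is linear in the current size and repeated prepends are quadratic overall. That is the price of keeping the result contiguous and writable in place.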

================
Comment at: llvm/lib/Demangle/DLangDemangle.cpp:36
+/// \note Beware these aren't required to be '\0' terminated
+struct OutputString {
+
----------------
ljmf00 wrote:
> dblaikie wrote:
> > jhenderson wrote:
> > > dblaikie wrote:
> > > > ljmf00 wrote:
> > > > > dblaikie wrote:
> > > > > > Any chance of using an existing stream type (like `llvm::itanium_demangle::OutputStream`, which `RustDemangle` also uses)? Otherwise it might be worth a bunch of separate testing of this class, or incrementally adding functionality to it as that functionality is used in the patch series. Otherwise it's hard to tell that everything's tested if it's added in one go here and significant parts are currently unused.
> > > > > This type differs from a stream because the D demangler needs to be able to prepend to the output string, due to how D demangling is designed. Because of that, a stream is not a good fit here. Besides, adding methods like prepend would make it conceptually not a stream at all.
> > > > > 
> > > > > I will incrementally add parts of it when necessary and add tests for them. That said, if you know of any other data structure that might be suitable here, please let me know.
> > > > Ah, fair enough. Might want to check that the naming conventions (I'd expect "need" to be called "reserve" in C++ API parlance, though perhaps in LLVM's APIs that's called "grow"?) and the API design (do the other similar data structures have a "free()" function, or do they rely on the dtor to clean up?) line up with the existing OutputStream and other data structures in LLVM, for consistency.
> > > > 
> > > > & maybe worth pulling it out into a separate header - given how big this whole file is likely to get?
> > > I wonder if we could model this as some kind of `deque<std::string>` - basically buffer things by adding strings to the start or end of the queue (prepend and append), before finalizing it into a single string at a later point. It might help avoid a lot of manual memory management/copying etc.
> > Probably would be a `deque<char>`? Though that'd then involve copying the result into the output buffer - so I think the current approach is probably a/the good one, and consistent with the other demanglers that write directly into their output buffer. Just with the added constraint of needing to be able to prepend info.
> > 
> > (another alternative would be to generalize the existing itanium OutputStream to support prepending too)
> I think a deque is not a good fit for this, for several reasons. I think you were referring to some sort of `deque<char*>`; otherwise it would be very memory-inefficient, since the deque node structs are way heavier than a byte.
> 
> Even with a string deque, memory not being contiguous across nodes can hurt performance, e.g. when calculating the length of a given deque. Another problem is that memory is allocated in chunks, which are grown by doubling when needed.
> 
> Also, assuming that a string is appended/prepended to the deque on every OutputString::append or OutputString::prepend, allocating many small chunks individually is also incredibly slow. If that is not the case, I don't see how it is applicable at all.
> 
> As for OutputStream: as I mentioned above, prepending is not part of the stream concept (correct me if I'm wrong), since a stream is designed to grow only at the end. I'm not quite sure whether anything relies on specific characteristics of a stream, but, e.g., because a stream only grows at the end, adding a prepend would mess up any indices into it. If that is not a problem, I would consider that option, but would also rename it.
Yeah, char* or string is what I was thinking, since if you don't need to go back and modify any of the already added string, you can just naturally add substrings at either end. You could also try a simple `std::list`, since that would avoid the overhead of reserving space completely. I've not really studied the performance implications of the various std containers though, so I can't really advise what's the most appropriate overall. I'm more interested in the practicality of writing and maintaining the code (don't reinvent the wheel and all that, which I feel this class more-or-less is doing).

I'm not sure when you need to calculate the overall length, so I'm not sure that aspect is relevant?
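To make the deque suggestion concrete, here is a minimal sketch assuming a hypothetical `PieceBuffer` class (not from the patch). Pieces go to either end of a `std::deque<std::string>` and are concatenated once, in `finalize()`. It also keeps a running total, so the overall length is available in O(1) without walking the pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>

/// Hypothetical sketch of the deque-of-strings idea: pieces are added at
/// either end and only concatenated once, in finalize().
class PieceBuffer {
public:
  /// Adds \p S after all existing pieces.
  void append(std::string_view S) {
    if (S.empty())
      return;
    Pieces.emplace_back(S);
    Length += S.size();
  }

  /// Adds \p S before all existing pieces; existing pieces are not moved.
  void prepend(std::string_view S) {
    if (S.empty())
      return;
    Pieces.emplace_front(S);
    Length += S.size();
  }

  /// Returns the total length, tracked incrementally as pieces are added.
  std::size_t size() const { return Length; }

  /// Concatenates all pieces, in order, into a single string.
  std::string finalize() const {
    std::string Result;
    Result.reserve(Length);
    for (const std::string &P : Pieces)
      Result += P;
    assert(Result.size() == Length);
    return Result;
  }

private:
  std::deque<std::string> Pieces;
  std::size_t Length = 0;
};
```

Each non-empty piece becomes its own `std::string`. Longer pieces therefore cost a separate heap allocation, although short ones usually fit in the small-string buffer of common standard library implementations. `finalize()` also makes one extra copy into the result. Those are the costs weighed against the simpler code in this thread.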

================
Comment at: llvm/unittests/Demangle/DLangDemangleTest.cpp:25
+
+  ExpectedVal ExpectedArray[] = {
+      {"_Dmain", "D main"}
----------------
ljmf00 wrote:
> ljmf00 wrote:
> > jhenderson wrote:
> > > gtest has specific support for parameterised tests (see the `TEST_P` macro), so it probably makes more sense to leverage that rather than hand-roll your own looping logic. This is particularly important, because it impacts how test failures are reported.
> > Done
> I'm not too familiar with gtest, so extra guidance is appreciated. I'm not sure if an empty fixture is the best fit here, though.
If we end up moving to llvm-cxxfilt testing (and I don't really disagree with @dblaikie's points, so we probably should), this point is moot, since parameterised testing is overkill for a single test case. However, for future reference, I'd put the dlangDemangle and free calls in the SetUp and TearDown overrides in the fixture class, so that your test case body becomes literally the EXPECT_STREQ line. Heck, if I'm not mistaken, you could probably get away with putting ALL the logic in the SetUp function, and the TEST_P body would then be completely empty!

P.S. Don't forget to run clang-format on your newly added code.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111414/new/

https://reviews.llvm.org/D111414