[PATCH] D150370: Introduce StructuredData

Tue Jun 20 23:29:07 PDT 2023

nhaehnle marked 2 inline comments as done.
nhaehnle added a comment.

In D150370#4435919 <https://reviews.llvm.org/D150370#4435919>, @nikic wrote:

> In D150370#4435746 <https://reviews.llvm.org/D150370#4435746>, @nhaehnle wrote:
>
>>> More generally, I'm not happy that this new concept is being introduced as part of the target type implementation. This doesn't really seem helpful for this specific use case (it makes the implementation substantially more complex rather than simpler). It may well make sense as part of some larger context, but I think this larger context deserves a wider discussion (probably on discourse) to clarify what the goals of this abstraction are and make sure we have a good design for it.
>>
>> Sure. It's a bit of a chicken-and-egg thing. I do think this approach is better than the piecemeal addition of things to bitcode reader/writer, which is rather subtle code that I don't think too many people understand, but at the same time, working on big changes is frustrating unless you have clear line-of-sight to actually getting something useful upstream. So I started with this small thing quite consciously on purpose.
>>
>> I have a WIP change locally that leverages the same infrastructure for "extended metadata", and at least an early minimal version that does **something** interesting is something that I think I can put up reasonably soon so that the discussion doesn't end up overly abstract. I would also like to do the same thing for "extended instructions", but that's still a bit further out.
>
> FWIW, I do think this is pretty reasonable when seen as a pure IR/bitcode abstraction. I wouldn't have concerns if this were (initially at least) represented as `std::pair<StringRef, sdata::Value>` and only used as part of reading/writing. That use case doesn't really need any of the Symbol machinery -- we get these as strings in IR/bitcode anyway, and I don't think there is a substantial performance difference between interning the strings and then comparing IDs compared to directly comparing to a small handful of strings, during parsing only.
>
> The design you have makes a lot more sense if the sdata is supposed to be used as part of the in-memory IR as well, in which case the Symbols are important for more efficient access -- but I'm not sure if they are ideal in that context either. That would still require that we have Symbols and Values (std::variant, ugh) in the representation. Ideally we'd just have a C++ struct with properly typed members, and sdata only being used for the purposes of serializing/deserializing those structs using a general mechanism, without requiring new asm/bitcode support each time.

Hmm, perhaps that has been a case of premature optimization.

Indeed my intention is that "known" extended structures (in the general sense) are represented as C++ structs with properly typed members in live IR. The only exception are "generic" extended structures which are used as a fallback in-memory representation to preserve as a black-box any extension structures that are unknown. That is, in our graphics compiler my current thinking is for us to have extended metadata objects along the lines of:

  !lgc.rasterizer.state { discardEnable: i1 true, perSampleShading: i1 false, ... }

In our compiler, these would be represented as an instance of an `lgc::RasterizerStateMetadata` class which is derived from an `llvm::ExtMetadata` base class and contains these fields as plain data. But when we write out intermediate IR with something like `-stop-after` and then run it through generic tools like `opt`, `llvm-dis`, etc., the same metadata object is represented by a `llvm::GenericExtMetadata` which just holds an array of `std::pair<sdata::Symbol, sdata::Value>`. But perhaps that could just be a `std::pair<std::string, sdata::Value>` instead because nobody really accesses these fields anyway.

I'll think about this some more but removing the Symbol stuff and just replacing by StringRef is starting to sound reasonable.

================
Comment at: llvm/include/llvm/IR/StructuredData.h:25
+class LLVMContext;
+class MDNode;
+class Type;
----------------
nikic wrote:
> Unused
(only MDNode is truly unused, removing that)

================
Comment at: llvm/include/llvm/IR/StructuredData.h:176
+/// This is used to describe structures for bitcode abbreviation.
+class SchemaField {
+public:
----------------
nikic wrote:
> Not used in this patch -- move to the last one?
Sure

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150370/new/

https://reviews.llvm.org/D150370