[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

Fri Apr 4 04:28:15 PDT 2025

uweigand wrote:

> I think that splitting the SD/ED/LD into 3 "section"s implies that a MCSectionGOFF has a fundamentally different semantic than the other MCSectionXXX. This is something I would like to avoid. On the other hand, the SD/ED pair is almost the same as an ELF section, so just putting those 2 into a MCSectionGOFF instance and handling the LD/PR symbol differently makes sense.

Thinking a bit more about this, it looks to me that we should treat SD/ED/PR on the one hand differently from LD (and ER) on the other.  The former identify a *range* of address space and may hold contents of those ranges in the form of text records; the latter identify a single address (and hold no content of their own).

From that perspective, the former correspond to the "section" concept, while the latter correspond to the "symbol" concept.   Now, among the section types SD/ED/PR, GOFF is a bit special in that those are nested - this is somewhat similar to the subsection concept, but it is explicit in the object file format (as opposed to, say, ELF subsections).

It seems to me that modelling that nested section concept explicitly by creating a separate MCSectionGOFF for each of SD, ED, and PR, and linking them as appropriate via a `Parent` pointer (which we actually already have today!), doesn't look too fundamentally different ...   As long as we ensure that text emission happens into the right section (ED or PR as appropriate), this should work fine with common code.

In fact, considering that at some point we want to be able to implement a general HLASM AsmParser, which would require handling any allowed combination of CSECT with multiple CATTR, we should *not* merge SD and ED into a single section.  (Also, by having them separately, we no longer need special treatment of the "root" SD in the writer.)

Finally, having separate MCSection structures for each ESD record may allow using the MCSection::Ordinal field as the ESD ID, which matches its purpose for other object file formats, and which would allow easy resolution of parent (and ADA) section pointers to ESD IDs in the writer.
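To illustrate the nesting and the ordinal-as-ESD-ID idea, here is a minimal standalone sketch. It does not use the real LLVM classes: `SectionModel`, `ESDRecordModel`, and `buildESDRecords` are hypothetical stand-ins, and it assumes that ESD IDs are handed out in layout order starting at 1, with 0 meaning "no parent".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only, not the real MC classes.
enum class ESDKind { SD, ED, PR };

struct SectionModel {
  std::string Name;
  ESDKind Kind;
  const SectionModel *Parent = nullptr; // SD: none, ED: its SD, PR: its ED
  uint32_t Ordinal = 0;                 // doubles as the ESD ID once assigned
};

struct ESDRecordModel {
  uint32_t Id;
  uint32_t ParentId; // 0 is assumed to mean "no parent"
  ESDKind Kind;
  std::string Name;
};

// Assign ordinals in layout order, then resolve each Parent pointer to the
// parent's ordinal. No special handling of a "root" SD is needed.
std::vector<ESDRecordModel>
buildESDRecords(const std::vector<SectionModel *> &Layout) {
  uint32_t Next = 1;
  for (SectionModel *S : Layout)
    S->Ordinal = Next++;

  std::vector<ESDRecordModel> Records;
  for (const SectionModel *S : Layout) {
    uint32_t ParentId = S->Parent ? S->Parent->Ordinal : 0;
    Records.push_back({S->Ordinal, ParentId, S->Kind, S->Name});
  }
  return Records;
}
```

In this sketch the writer only walks the section list; the parent links set up earlier are enough to fill in the parent ESD IDs.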

The LD record, on the other hand, clearly should *not* get a MCSectionGOFF.  Rather, it would make sense for this to be represented as a MCSymbolGOFF.   Specifically, this symbol really represents the implicit section start symbol (which ELF also has!); so it is probably best emitted from the symbol table rather than from the section table.  (MCSection already has a `Begin` symbol - it should be possible to use this for that purpose.)   That would also unify emission of that type of LD record with the other LD records for "normal" symbols.

Attributes associated with the LD record should likewise come from the MCSymbolGOFF.  This would include the ADA section, which means that association no longer needs to be hard-coded in the writer, but can instead be set up by codegen as appropriate when defining symbols.  (E.g. this would also allow handling arbitrary user-provided XATTR PSECT attributes in an HLASM AsmParser.)
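As a rough sketch of what this could look like, the following standalone model emits LD records from a symbol list, taking the ADA association from the symbol itself. All names here (`SectModel`, `SymbolModel`, `LDRecordModel`, `emitLDRecords`) are hypothetical and not part of LLVM; the sketch assumes sections already carry their ESD ID as an ordinal, uses 0 for "no ADA", and leaves out ID assignment for the LD records themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only, not the real MC classes.
struct SectModel {
  std::string Name;
  uint32_t Ordinal; // assumed to be the already-assigned ESD ID
};

struct SymbolModel {
  std::string Name;
  const SectModel *Owner;         // the section the label is defined in
  const SectModel *ADA = nullptr; // optional ADA section, set by codegen
};

struct LDRecordModel {
  std::string Name;
  uint32_t ParentId;
  uint32_t ADAId; // 0 is assumed to mean "no ADA"
};

// One code path for both the implicit section-start symbol and ordinary
// symbols: everything the record needs is read from the symbol.
std::vector<LDRecordModel>
emitLDRecords(const std::vector<SymbolModel> &Symbols) {
  std::vector<LDRecordModel> Records;
  for (const SymbolModel &Sym : Symbols) {
    uint32_t ADAId = Sym.ADA ? Sym.ADA->Ordinal : 0;
    Records.push_back({Sym.Name, Sym.Owner->Ordinal, ADAId});
  }
  return Records;
}
```

Because the ADA pointer lives on the symbol, the writer in this sketch needs no hard-coded knowledge of which section serves as the ADA.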

https://github.com/llvm/llvm-project/pull/133799