[llvm-dev] Question about opt-report strings

Mon Jan 6 15:11:47 PST 2020

Thanks, Francis. I'll try to put something together to get this started.

-Andy

-----Original Message-----
From: Francis Visoiu Mistrih <francisvm at yahoo.com> 
Sent: Monday, January 06, 2020 2:14 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>; Adam Nemet <anemet at apple.com>
Subject: Re: Question about opt-report strings

Hi Andy,

> On Jan 6, 2020, at 12:51 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> 
> Hi all,
>  
> I tried to poke my head into opt-report a while ago and didn’t get very far. Now I’m looking at it again. I’m not sure I understand everything that’s in place so my question here may be misguided.
>  
> I’m trying to understand the way strings are handled. When a remark is emitted, it seems that the string is constructed on the fly based on streaming inputs. For example,
>  
>   ORE->emit([&]() {
>     return OptimizationRemark(DEBUG_TYPE, "LoadElim", LI)
>            << "load of type " << NV("Type", LI->getType()) << " eliminated"
>            << setExtraArgs() << " in favor of "
>            << NV("InfavorOfValue", AvailableValue);
>   });
>  
> There is some C++ magic going on behind the scenes here, and it makes for a nice interface, but I’m not clear about what ends up being stored where. I think within DiagnosticInfoOptimizationBase all the string parts of this get stored in a vector of name-value pairs with the unnamed strings just having an empty name.

That’s correct. There is a struct DiagnosticInfoOptimizationBase::Argument that has a key-value pair and a debug location (used for things like remarks in the inliner to point to the callee’s source location). Unnamed strings have the key “String”:

> --- !Passed
> Pass:            gvn
> Name:            LoadElim
> Function:        arg
> Args:
> - String:          'load of type '
> - Type:            i32
> - String:          ' eliminated'
> - String:          ' in favor of '
> - InfavorOfValue:  i
> ...

> At some point, I guess this gets assembled into a single string?

It does, in DiagnosticInfoOptimizationBase::getMsg, if needed (probably when using -Rpass?). When a remark is serialized to a file, it is written as multiple key-value “arguments” that the client can concatenate later or consume based on the meaning of each key.
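
To make the concatenation concrete, here is a minimal standalone sketch, not LLVM's actual code. The `Argument` struct and the `buildMessage` and `loadElimArgs` functions are hypothetical stand-ins: the real Argument also carries a debug location, and the sketch ignores the base/extra split introduced by setExtraArgs().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for DiagnosticInfoOptimizationBase::Argument.
// Only the key and the value are modeled; the debug location is omitted.
struct Argument {
  std::string Key;
  std::string Val;
};

// Concatenate the values in emission order. The keys are not part of the
// message; they only matter to clients that consume the arguments separately.
std::string buildMessage(const std::vector<Argument> &Args) {
  std::string Msg;
  for (const Argument &A : Args)
    Msg += A.Val;
  return Msg;
}

// The arguments of the LoadElim remark, as listed in the YAML output above.
std::vector<Argument> loadElimArgs() {
  return {{"String", "load of type "},
          {"Type", "i32"},
          {"String", " eliminated"},
          {"String", " in favor of "},
          {"InfavorOfValue", "i"}};
}
```

Calling `buildMessage(loadElimArgs())` yields "load of type i32 eliminated in favor of i", which is the single string a client would reconstruct from the serialized arguments.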

> I’ve also found references to string tables for the bitstream serializer and a YAML format that uses a string table, but I’m not clear how and when these are constructed.

The serialization part is handled by all the stuff in lib/Remarks. lib/IR/RemarkStreamer.cpp basically converts LLVM diagnostics (DiagnosticInfoOptimizationBase) to remarks::Remark objects that are used for both serializing and deserializing the remarks in all the various formats. The main reason is to allow any remark producer to be independent from LLVM diagnostics, which are tied to LLVM (M)IR.
When used, the string table is kept in memory until the AsmPrinter runs, which emits it in a section of the object file along with some other metadata. The YAML format with a string table is usable, but it was mainly put there so that work on the whole remark layer could start before the bitstream-based format was ready. More details on the various formats are here: https://llvm.org/docs/Remarks.html.
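
As an illustration of the string-table idea only, the sketch below interns each distinct string once and hands back a small integer ID that a serialized remark could store instead of the text. `StringTableSketch` and its members are hypothetical names for this example; this is not the implementation in lib/Remarks.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table: every distinct string is stored once and
// referred to by its insertion index.
class StringTableSketch {
  std::unordered_map<std::string, std::size_t> Index; // string -> ID
  std::vector<std::string> Strings;                   // ID -> string

public:
  // Return the ID of S, adding it to the table on first use.
  std::size_t add(const std::string &S) {
    auto It = Index.find(S);
    if (It != Index.end())
      return It->second;
    Strings.push_back(S);
    Index.emplace(S, Strings.size() - 1);
    return Strings.size() - 1;
  }

  // Look a string up by ID; throws std::out_of_range for an unknown ID.
  const std::string &get(std::size_t Id) const { return Strings.at(Id); }

  std::size_t size() const { return Strings.size(); }
};
```

Strings such as pass names and remark names repeat across many remarks, so storing each once and referencing it by ID keeps the emitted data small.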

>  
> What I’m wondering is whether it would make sense to introduce a sort of message catalog, similar to the way diagnostics are handled in clang (which I must admit I also have only a partial understanding of). It seems like the RemarkName for optimization remarks somewhat serves as a unique identifier (?) but I would think an integer value of some sort would be better, so maybe I’m misunderstanding what RemarkName is being used for. I’m imagining something that would end up looking like this:

I believe the RemarkName + the PassName should be unique, but there is nothing documenting this as such, nor any checks enforcing it.
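
A check of that kind could look roughly like the sketch below. It assumes remark definitions are registered in one place (for example, generated from a catalog); `RemarkRegistry` and `registerRemark` are hypothetical names, and nothing like this exists in LLVM today as far as this thread indicates.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical registry that rejects a second definition with the same
// (PassName, RemarkName) pair.
class RemarkRegistry {
  std::set<std::pair<std::string, std::string>> Seen;

public:
  // Returns true if the pair was newly registered and false if the same
  // pair had already been registered.
  bool registerRemark(const std::string &PassName,
                      const std::string &RemarkName) {
    return Seen.emplace(PassName, RemarkName).second;
  }
};
```

Registering ("gvn", "LoadElim") twice fails the second time, while ("licm", "LoadElim") is still accepted because only the pair has to be unique.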

>  
>   ORE->emit([&]() {
>     return OptimizationRemark(DEBUG_TYPE, diag::remark_gvn_load_elim, LI)
>            << NV("Type", LI->getType())
>            << setExtraArgs() << NV("InfavorOfValue", AvailableValue);
>   });
>  
> with a tablegen file somewhere containing this:
>  
> def remark_gvn_load_elim: OptRemark<
>   "LoadElim",                    // RemarkName (if this is needed for YAML output or whatever)
>   "load of type %0 eliminated",  // Base format string for the remark (%Type instead of %0 maybe?)
>   "in favor of %1">;             // Extra args format string for verbose output
>  
>  
> Has this been discussed before?

This would be great! I was planning on bringing up something like this but never really got the time to get into it.

I would also add the pass somewhere in the remark definition (although it may be annoying to keep it updated with every single DEBUG_TYPE).

This will be very useful for documenting all the remarks and to provide a nicer way of filtering them.
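
For discussion purposes, here is a standalone sketch of what a catalog entry and its expansion might look like once the tablegen definition has been turned into C++ data. Everything in it is an assumption made for illustration: the `RemarkDef` layout, the `expand` and `render` helpers, and the `%N` positional placeholder syntax taken from the proposal above. The pass name is included in the definition as suggested.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical catalog entry, as a tablegen backend might generate it.
struct RemarkDef {
  const char *Pass;        // pass that owns the remark
  const char *Name;        // RemarkName used in serialized output
  const char *Format;      // base format string
  const char *ExtraFormat; // appended only for verbose output
};

// Replace each %N with Args[N]. A placeholder whose index is out of range
// is copied through unchanged.
std::string expand(const std::string &Fmt,
                   const std::vector<std::string> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] == '%' && I + 1 < Fmt.size() &&
        std::isdigit(static_cast<unsigned char>(Fmt[I + 1]))) {
      std::size_t Idx = 0;
      std::size_t J = I + 1;
      while (J < Fmt.size() &&
             std::isdigit(static_cast<unsigned char>(Fmt[J]))) {
        Idx = Idx * 10 + static_cast<std::size_t>(Fmt[J] - '0');
        ++J;
      }
      if (Idx < Args.size()) {
        Out += Args[Idx];
        I = J - 1; // skip past the digits that were consumed
        continue;
      }
    }
    Out += Fmt[I];
  }
  return Out;
}

// Build the message; the extra format is used only when Verbose is set.
std::string render(const RemarkDef &Def, const std::vector<std::string> &Args,
                   bool Verbose) {
  std::string Msg = expand(Def.Format, Args);
  if (Verbose) {
    Msg += ' ';
    Msg += expand(Def.ExtraFormat, Args);
  }
  return Msg;
}

// The LoadElim definition from the proposal, with the pass added.
const RemarkDef RemarkGVNLoadElim = {"gvn", "LoadElim",
                                     "load of type %0 eliminated",
                                     "in favor of %1"};
```

With the arguments {"i32", "i"}, `render` produces "load of type i32 eliminated" by default and "load of type i32 eliminated in favor of i" in verbose mode. Keeping the text in one table is also what would make documenting and filtering the remarks straightforward.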

I’d be happy to review this!

Thanks,

— 
Francis

>  
> Thanks,
> Andy