[clang] [C++20] [Modules] Introduce reduced BMI (PR #75894)

Thu Jan 4 00:51:33 PST 2024

iains wrote:

> Like I said in the commit message, this patch itself doesn't involve anything relevant to user interfaces. I left it to the latter patches.

Are you in a position to post the next patch (at least as a draft)?  That would help me see the direction.

> > * I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?
> 
> I am not sure if I understand this. What does it mean "require a codegen back end to be instantiated to allow the reduced BMI "? Do you mean to not touch CodeGen part or to not touch the CodeGen action? My local patch touched the code gen action without touching any real CodeGen related things.

> > the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs)
> 
> For `--precompile/-fmodule-only` type jobs, I'll create another action to make it (Similar to existing `GenerateModuleInterfaceAction`). Then both of the actions will try to reuse the same consumer `ReducedBMIGenerator` to avoid repeated works.

OK, that answers my concern (which was that we might have to add the code-gen backend to a --precompile if that was the mechanism used to do the BMI reduction).

> > * It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side.  It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is a simple as possible.
> 
> Yeah, it should be less concerned. BTW, currently the simpler serializer/deserializer should be ASTRecordWriter/ASTRecordReader. And the current ASTReader/ASTWriter takes some semantical job.

... and, on the reader side, that already gives us some big problems (as I say, I am less concerned on the writer side, but who can see the whole future?).

> > so something like
> > ```
> > raw AST +======== > codegen (when required)
> >         |
> >         + =====> AST transforms ====> BMI output.
> > ```
> > As I understand the patch you are combining the transform with the output?
> 
> On the one hand, the **current** patch doesn't do that. The current patch is almost a NFC patch. It belongs to the following patch. On the other hand, the answer may be yes. Probably I did the `AST transforms` you said in the AST Writer. I don't feel it is so awful.

Maybe not for the short-term relatively simple tasks - but we should also take a view on the medium and longer term (for example, GMF decl elision is likely to be helpful to users in reducing both the size of the BMI and the number of decls that need merging on input).

We need the AST in this path to be mutable (including removal of decls); that way each transform can be maintained as a separate entity - I think that if we end up doing "many transforms" as part of the output, it will become very confusing.

_(although, to be clear, i**n the short-term we might agree to do the work in the output** - I really do think it would be bad to make that the long term mechanism)._

> Given all of us loves reduced BMI, I suggest we can fosus on current patch then discuss user interfaces related things in the next patch after this got landed.

We do all want to produce the reduced BMI, I agree; but we also always have limited resources to do the work, so that it would be good to try and pick an implementation that will be smooth for the future work too.

I understand the purpose of the current patch better now - and will try to take a more detailed look over the next few days - as noted above, it would help very much to have a preview of the next patch in the series.

https://github.com/llvm/llvm-project/pull/75894