[clang] [C++20] [Modules] Introduce reduced BMI (PR #75894)

Thu Jan 4 02:09:08 PST 2024

ChuanqiXu9 wrote:

> > Like I said in the commit message, this patch itself doesn't involve anything relevant to user interfaces. I left it to the latter patches.
> 
> Are you in a position to post the next patch (at least as a draft)? That would help me see the direction.

I post it here https://github.com/ChuanqiXu9/llvm-project/commit/efcc7e888a96e9935a354759de320dea55e93105 since I didn't get how to send stacked pr in github yet. But I guess it might be sufficient since it is really in an early phase.

> 
> > > * I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?
> > 
> > 
> > I am not sure if I understand this. What does it mean "require a codegen back end to be instantiated to allow the reduced BMI "? Do you mean to not touch CodeGen part or to not touch the CodeGen action? My local patch touched the code gen action without touching any real CodeGen related things.
> 
> > > the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs)
> > 
> > 
> > For `--precompile/-fmodule-only` type jobs, I'll create another action to make it (Similar to existing `GenerateModuleInterfaceAction`). Then both of the actions will try to reuse the same consumer `ReducedBMIGenerator` to avoid repeated works.
> 
> OK, that answers my concern (which was that we might have to add the code-gen backend to a --precompile if that was the mechanism used to do the BMI reduction).
> 
> > > * It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side.  It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is a simple as possible.
> > 
> > 
> > Yeah, it should be less concerned. BTW, currently the simpler serializer/deserializer should be ASTRecordWriter/ASTRecordReader. And the current ASTReader/ASTWriter takes some semantical job.
> 
> ... and, on the reader side, that already gives us some big problems (as I say, I am less concerned on the writer side, but who can see the whole future?).

I guess you're referring the process how we decide a declaration is visible?

> 
> > > so something like
> > > ```
> > > raw AST +======== > codegen (when required)
> > >         |
> > >         + =====> AST transforms ====> BMI output.
> > > ```
> > > 
> > > 
> > >     
> > >       
> > >     
> > > 
> > >       
> > >     
> > > 
> > >     
> > >   
> > > As I understand the patch you are combining the transform with the output?
> > 
> > 
> > On the one hand, the **current** patch doesn't do that. The current patch is almost a NFC patch. It belongs to the following patch. On the other hand, the answer may be yes. Probably I did the `AST transforms` you said in the AST Writer. I don't feel it is so awful.
> 
> Maybe not for the short-term relatively simple tasks - but we should also take a view on the medium and longer term (for example, GMF decl elision is likely to be helpful to users in reducing both the size of the BMI and the number of decls that need merging on input).

For GMF decl elision, I posted a patch to implement it in ASTWriter and I reposted it in https://github.com/llvm/llvm-project/pull/76930. The big problem is that this is not formal. (Just for sharing, I am not proposing this now)

> 
> We need the AST in this path to be mutable (including removal of decls); that way each transform can be maintained as a separate entity - I think that if we end up doing "many transforms" as part of the output, it will become very confusing.

While the model sounds good, I am pessimistic for making it correctly, completely, and efficiently.

> 
> _(although, to be clear, i**n the short-term we might agree to do the work in the output** - I really do think it would be bad to make that the long term mechanism)._

Not against, I just think it is not so bad. There already many optimizations in the current serializations.

> 
> > Given all of us loves reduced BMI, I suggest we can fosus on current patch then discuss user interfaces related things in the next patch after this got landed.
> 
> We do all want to produce the reduced BMI, I agree; but we also always have limited resources to do the work, so that it would be good to try and pick an implementation that will be smooth for the future work too.

Understood. I just think it won't be too bad. Or it is not easy for us to get a much better solution in the resources we have. I prefer the style that don't make perfect the enemy of better.

> 
> I understand the purpose of the current patch better now - and will try to take a more detailed look over the next few days

I don't intend to land this in clang18. So we don't need to be hurry.

> - as noted above, it would help very much to have a preview of the next patch in the series.

Sent. But I just feel it is not so helpful for reviewing this patch : )

https://github.com/llvm/llvm-project/pull/75894