[clang] [C++20][Modules] Fix crash when function and lambda inside loaded from different modules (PR #109167)

Chuanqi Xu via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 19 19:30:14 PDT 2024


ChuanqiXu9 wrote:

> > Would you like to explain in more detail why this failed previously?
> 
> Original code in `ASTReader::finishPendingActions` looked like this:
> 
> ```
>     for (auto ID : PendingLambdas)
>       GetDecl(ID);
>     PendingLambdas.clear();
> ```
> 
> The issue here is that the code uses implicit iterators over `PendingLambdas`, but `GetDecl` may insert more elements into `PendingLambdas`, which can invalidate the iterators. In the lucky case, when there is no vector reallocation, the new elements are merely skipped. However, in the reproducer, the insertion caused the vector to reallocate, resulting in a crash due to reading values from deallocated memory. To trigger the reallocation, I included more than 5 lambdas in the new test cases.
> 
> In the new code, before reading the lambdas, I copy the existing lambda IDs to a new array. Alternatively, I could iterate with an integer index, re-reading the size of the vector on each iteration. Both approaches work fine, but I decided it might be better to run the other pending actions before starting to deserialize new lambdas. I can change it if you think iteration with an integer index is better.

Thanks, it is clear.

> 
> > Also, I am wondering if we can make the process more efficient: (1) Can we avoid the visitor in the writing process? (2) Can we delay the loading of lambdas until the definitions of the functions are loaded?
> > My immediate expectation is that (1) is possible but (2) may not be safe.
> > For (1), my idea is to record/register the information during the writing of LambdaExpr in ASTWriterStmt, and then write this record after we write all the decls and types. Then we need to read this record eagerly. Maybe we can optimize it further by adding a slot in FunctionDecl to record the offset of such information, so we can avoid the eager linear reading process.
> 
> Yeah, it will complicate things a lot, since we visit statements only after serializing the FunctionDecl. I ran some profiling, and it seems that `collectLambdas` inclusively takes only about 0.25% of cycles; its own time is almost 0. I'll try to reduce it even further. I think we can visit scopes instead of statements - it should be more efficient; we do the same for enumerating anonymous definitions and it seems to be fast enough.

It won't be much more complicated. You only need to:
(1) Add a new map in ASTWriter.
(2) In ASTWriterDecl, when writing a CXXRecord, check whether it is a lambda and whether its enclosing decl context is a function decl. If all the conditions are true, insert the ID of the function decl and the record ID into that map. (I thought we could do this in ASTWriterStmt, but it may be more straightforward in ASTWriterDecl.)
(3) In `ASTWriter::WriteDeclAndTypes`, after we set `DoneWritingDeclsAndTypes = true`, add the new logic. There are plenty of examples there where we do similar things (writing information recorded during writing).
(4) Convert the map to RecordData (a vector), and then emit the record.
(5) On the reader side, convert the read RecordData back into a map.
(6) When we read the corresponding function (or under other possible conditions), load the lambdas from the map.

I think the steps here are clear enough. On the one hand, 0.3% is not too small; we see 0.5% as significant in the middle end. On the other hand, the steps are pretty common in the serialization code, so I think it will be helpful for you to understand the process.

Sorry in advance that I approved the previous solution but am now asking for something more complex. But let's try to do things better if possible.

> 
> > For (2), the question of whether or not it is safe is: can we load the lambdas without loading the definitions of the functions? If not, we can add code when getting the definition of a function, like:
> > ```
> > if (lambdas not loaded)
> >      getCanonicalDecl()->loadLambdas();
> > ```
> > Then we can delay the loading of the lambdas to make it more efficient.
> 
> It would be too late in my example. Without my change, lambdas get loaded during function body deserialization, but that was too late because loading a template specialization caused choosing the canonical record for the lambda from another module. The function itself is selected from the last loaded module by lookup (i.e. it happens in reverse order of module loading), but template specializations get loaded in direct order. I tried to change the order of specialization loading to the opposite, but in the real world it does not always work, because some modules can be loaded, for example, via `-fmodule-file=<module-name>=<path/to/BMI>`.

So, in short, can I read your answer as: we can load a lambda, or choose a lambda as the canonical decl, without deserializing the **corresponding** body of the function decl? And if yes, can we try to explore in which situations that happens? It would be helpful for optimizing our strategy.


https://github.com/llvm/llvm-project/pull/109167


More information about the cfe-commits mailing list