[cfe-commits] r68732 - in /cfe/trunk: include/clang/AST/ include/clang/Frontend/ lib/AST/ lib/Frontend/ lib/Sema/ test/ test/PCH/ tools/clang-cc/

Douglas Gregor dgregor at apple.com
Fri Apr 10 10:22:57 PDT 2009


On Apr 9, 2009, at 10:52 PM, Chris Lattner wrote:

>> URL: http://llvm.org/viewvc/llvm-project?rev=68732&view=rev
>> Log:
>> Implementation of pre-compiled headers (PCH) based on lazy
>> de-serialization of abstract syntax trees.
>
> Awesome Doug! Thank you for splitting out the astcontext-everwhere  
> part of the patch, it definitely makes it easier to review.
>
>> +++ cfe/trunk/include/clang/AST/DeclContextInternals.h Thu Apr  9  
>> 17:27:44 2009
>>  /// VectorTy - When in vector form, this is what the Data pointer  
>> points to.
>> +  typedef llvm::SmallVector<uintptr_t, 4> VectorTy;
>> +
>> +  /// \brief The stored data, which will be either a declaration  
>> ID, a
>> +  /// pointer to a NamedDecl, or a pointer to a vector.
>> +  uintptr_t Data;
>
> Why the custom bitmangling?

It's an unsigned, a NamedDecl*, or a VectorTy*... IIRC, there's no  
variant type for that, and of course the VectorTy* can be interpreted  
in two different ways (vector-of-IDs or vector-of-NamedDecl*s).

>> +  void setFromDeclIDs(const llvm::SmallVectorImpl<unsigned> &Vec) {
>> +    if (Vec.size() > 1) {
>> +      VectorTy *Vector = getAsVector();
>> +      if (!Vector) {
>> +        Vector = new VectorTy;
>> +        Data = reinterpret_cast<uintptr_t>(Vector) | DK_Decl_Vector;
>> +      }
>> +
>> +      Vector->resize(Vec.size());
>> +      std::copy(Vec.begin(), Vec.end(), Vector->begin());
>> +      return;
>> +    }
>> +
>> +    if (VectorTy *Vector = getAsVector())
>> +      delete Vector;
>> +
>> +    if (Vec.size() == 0)
>
> Vec.empty() ?

Oh, yeah. Duh.

>> +++ cfe/trunk/include/clang/Frontend/PCHBitCodes.h Thu Apr  9  
>> 17:27:44 2009
>> @@ -0,0 +1,239 @@
>
>>
>> +#ifndef LLVM_CLANG_FRONTEND_PCHBITCODES_H
>> +#define LLVM_CLANG_FRONTEND_PCHBITCODES_H
>> +
>> +#include "llvm/Bitcode/BitCodes.h"
>> +#include "llvm/Support/DataTypes.h"
>> +
>> +namespace clang {
>> +  namespace pch {
>> +    const int IDBits = 32;
>
> IDBits is dead?

Yup, thanks.


>> +++ cfe/trunk/include/clang/Frontend/PCHWriter.h Thu Apr  9  
>> 17:27:44 2009
>>
>> +  /// The ID numbers of declarations are consecutive (in order of
>> +  /// discovery) and start at 2. 1 is reserved for the translation
>> +  /// unit, while 0 is reserved for NULL.
>
> Please repeat or move this comment to the definition of the pch::ID  
> typedef.
>
>> +  /// \brief Map that provides the ID numbers of each type within  
>> the
>> +  /// output stream.
>> +  ///
>> +  /// The ID numbers of types are consecutive (in order of  
>> discovery)
>> +  /// and start at 1. 0 is reserved for NULL. When types are  
>> actually
>> +  /// stored in the stream, the ID number is shifted by 3 bits to
>> +  /// allow for the const/volatile/restrict qualifiers.
>
> Okay, maybe not.  Would it make sense to have two different typedefs  
> for TypeID vs DeclID?  If that happened, you could have an  
> appropriate comment for each.

This is better, thanks!

>> +++ cfe/trunk/lib/Frontend/PCHReader.cpp Thu Apr  9 17:27:44 2009
>
>> +void PCHDeclReader::VisitVarDecl(VarDecl *VD) {
>> +  VisitValueDecl(VD);
>> +  VD->setStorageClass((VarDecl::StorageClass)Record[Idx++]);
>> +  VD->setThreadSpecified(Record[Idx++]);
>> +  VD->setCXXDirectInitializer(Record[Idx++]);
>> +  VD->setDeclaredInCondition(Record[Idx++]);
>
> This is a very cute idiom.  One potential problem to think about is  
> that a malformed pch file could crash the reader by having a record  
> with too-few fields: this would cause invalid indexes of Record to  
> be used.

Good point. As part of turning the lame Error()s into real  
diagnostics, I'll pre-test the record size and give a FATAL_ERROR if  
there is a problem.

>>
>> +/// \brief Read the type-offsets block.
>> +bool PCHReader::ReadTypeOffsets() {
>
> Would it make sense to just make TYPE_OFFSETS be a record in the top- 
> level block instead of a block that only contains one record type?   
> Likewise with DECL_OFFSETS.  That would get rid of the boilerplate  
> to read a block.

Yep, that makes sense.

>> +/// \brief Read and return the type at the given offset.
>> +///
>> +/// This routine actually reads the record corresponding to the type
>> +/// at the given offset in the bitstream. It is a helper routine for
>> +/// GetType, which deals with reading type IDs.
>> +QualType PCHReader::ReadTypeRecord(uint64_t Offset) {
>> +  Stream.JumpToBit(Offset);
>> +  RecordData Record;
>> +  unsigned Code = Stream.ReadCode();
>> +  switch ((pch::TypeCode)Stream.ReadRecord(Code, Record)) {
>> +  case pch::TYPE_FIXED_WIDTH_INT: {
>> +    assert(Record.size() == 2 && "Incorrect encoding of fixed- 
>> width int type");
>
> Instead of asserts, should these emit an error somehow and return  
> something bogus?  Having an invalid PCH file crash the compiler is  
> acceptable, but suboptimal.

Yeah, these should be FATAL_ERRORS some some recovery strategy.

>> +    // FIXME: Several other kinds of types to deserialize here!
>> +  default:
>> +    assert("Unable to deserialize this type");
>
> 0 && "unable...

Oops, thanks.

>> +void PCHDeclWriter::VisitDecl(Decl *D) {
>> +  Writer.AddDeclRef(cast_or_null<Decl>(D->getDeclContext()),  
>> Record);
>> +  Writer.AddDeclRef(cast_or_null<Decl>(D->getLexicalDeclContext 
>> ()), Record);
>> +  Writer.AddSourceLocation(D->getLocation(), Record);
>> +  Record.push_back(D->isInvalidDecl());
>> +  // FIXME: hasAttrs
>
> Interesting question: would it be better to unique attribute lists  
> and output an attr ID # with each decl, or would it be better to  
> emit a "has attr" bit and then emit a separate declid->attrlist  
> datastructure independent of this?

I think I'd like to output a "has attr" bit, and then have a (not-yet- 
implemented) DECL_ATTRIBUTE record directly after the decl record. So,  
when the reader sees that the decl has an attribute, it builds the  
decl and then reads the next record to get its attributes.

I don't know if it's worth uniquing attributes or not.

>> +/// \brief Write a block containing all of the types.
>> +void PCHWriter::WriteTypesBlock(ASTContext &Context) {
>> +  // Enter the types block
>> +  S.EnterSubblock(pch::TYPES_BLOCK_ID, 2);
>> +
>> +  // Emit all of the types in the ASTContext
>> +  for (std::vector<Type*>::const_iterator T = Context.getTypes 
>> ().begin(),
>> +                                       TEnd = Context.getTypes 
>> ().end();
>> +       T != TEnd; ++T) {
>> +    // Builtin types are never serialized.
>> +    if (isa<BuiltinType>(*T))
>> +      continue;
>> +
>> +    WriteType(*T);
>> +  }
>
> Why not just emit all types referred to by decls?  Is there a  
> difference?

We need to start somewhere, and once we've emitted the block for decls  
or types, we can't go back and add to that block when the processing  
of the other block uncovers a new decl/type. There are circular  
dependencies here: types store decls and decls store types.

By iterating over the list of types, and writing the types block  
first, we know that we've handled *all* of the types. I guess we  
should assert that the number of types hasn't changed once we're done  
writing the PCH file... (lazily de-serializing one PCH file to write  
another? hrm)

Now, we could just iterate until we're done, emitting multiple decl  
and type blocks as needed. It would be slightly unfortunate to have to  
do so, because it could make it a little tougher to cope with  
abbreviations when reading the PCH file. However, we might end up  
writing fewer types (especially in C++ with template instantiation,  
where lots of types will be built and then thrown away).

>> +/// \brief Write the block containing all of the declaration IDs
>> +/// lexically declared within the given DeclContext.
>> +///
>> +/// \returns the offset of the DECL_CONTEXT_LEXICAL block within the
>> +/// bistream, or 0 if no block was written.
>> +uint64_t PCHWriter::WriteDeclContextLexicalBlock(ASTContext  
>> &Context,
>> +                                                 DeclContext *DC) {
>> +  if (DC->decls_begin(Context) == DC->decls_end(Context))
>> +    return 0;
>
> How about a new DC->decls_empty(Context) method?

Sure.

>> +++ cfe/trunk/lib/Sema/SemaDecl.cpp Thu Apr  9 17:27:44 2009
>> @@ -524,6 +524,12 @@
>>      return false;
>>  }
>>
>> +  // __builtin_va_list gets redeclared in the built-in definitions
>> +  // buffer when using PCH. Don't complain about such redefinitions.
>> +  if (Context.getExternalSource() &&
>> +      strcmp(SourceMgr.getBufferName(New->getLocation()), "<built- 
>> in>") == 0)
>> +    return false;
>
> Eww.  We should really just fix sema to not do this.  Can you add a  
> verbose fixme explaining the issue and the fix we discussed?

Done.

> Overall, this is great Doug!

Thanks for the review!

	- Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20090410/c0ae2d72/attachment.html>


More information about the cfe-commits mailing list