[cfe-dev] Bug in clang parsing pointer to an array in function parameter list

Steve Naroff snaroff at apple.com
Sun Sep 9 17:13:21 PDT 2007


On Sep 9, 2007, at 4:32 PM, Justin Handville wrote:

> Okay.  I found the section in Sema::ParseParamDeclarator where the bug
> is occurring.  Apparently, this is where the fix needs to go.  I'll
> work on analyzing why this code is failing, put together some test
> cases, and submit the patch at the appropriate place.  I apologize,
> I'm still learning the code.
>

No need to apologize. Here is some more data that might help...

 From my perspective, it's not clear the fix can be isolated to  
Sema::ParseParamDeclarator().

First, let's be clear on the bug. The bug is the functions type is  
not in sync with it's ParmVarDecl AST node.

The function type is computed *before* calling  
Sema::ParseParamDeclarator(), which does the ParmVarDecl conversion.  
As a result, we could either...

(a) Move the argument conversion "up" to Sema::GetTypeForDeclarator 
(). If we did this, the conversion code could be removed from  
Sema::ParseParamDeclarator().
(b) Have Sema::ParseParamDeclarator() some how change the functions  
type when it does the conversion for ParmVarDecl (I don't like this,  
put want to mention it for completeness).
(c) See if it is possible for the code generator to rely on the  
ParmVarDecls, *not* the function type. Chris and I thought long and  
hard about this particular ParmVarDecl conversion (hence the large  
comment in the code). That said, this may well be a code gen bug. If  
not, then we will need to consider a front-end fix.

snaroff

Breakpoint 1, clang::Sema::GetTypeForDeclarator (this=0x2605900,  
D=@0xbffff2a0, S=0x2605a50) at SemaType.cpp:248
248           if (!FTI.hasPrototype) {
(gdb) c
Continuing.
int (foo)(int, char *[])

Breakpoint 2, clang::Sema::ParseParamDeclarator (this=0x2605900,  
FTI=@0xbffff2c0, ArgNo=0, FnScope=0x2605b20) at SemaDecl.cpp:688
688       QualType parmDeclType = QualType::getFromOpaquePtr 
(PI.TypeInfo);

> On 9/9/07, Justin Handville <uniheliodem at gmail.com> wrote:
>> Also, note that the issue is with the initial store of the argv
>> argument generated by the code generator, which isn't shown in the
>> parse-ast-dump call.
>>
>> I'm still not familiar with the code base, so I'll need a little
>> guidance with where this change belongs, but the bug definitely
>> exists.
>>
>> On 9/9/07, Justin Handville <uniheliodem at gmail.com> wrote:
>>> Ah.  Then I'll check in the front end.  I thought that the char* 
>>> [] in
>>> the parameter list looked a little funny, but I thought maybe that
>>> might be intended.
>>>
>>> Without the patch, the code asserts, since it is trying to store a
>>> char*[] into a char** location.  It wouldn't be difficult to add the
>>> check to the appropriate place in the front end to change the  
>>> argument
>>> declaration to a char**, as llvm-gcc does.
>>>
>>> On 9/9/07, Steve Naroff <snaroff at apple.com> wrote:
>>>>
>>>> On Sep 9, 2007, at 3:32 PM, Justin Handville wrote:
>>>>
>>>>> With the following patch, I can now compile and execute this  
>>>>> source
>>>>> file:
>>>>>
>>>>> int puts(char*);
>>>>>
>>>>> int main(int argc, char* argv[])
>>>>> {
>>>>>   char* foo = argv[0];
>>>>>
>>>>>   puts(foo);
>>>>>
>>>>>   return 0;
>>>>> }
>>>>>
>>>>
>>>> Justin,
>>>>
>>>> I'm a bit confused. The front-end (Sema::ParseParamDeclarator)
>>>> already converts argv from "char *[]" to "char **".
>>>>
>>>> See the "DeclRefExpr" node below (on line 6).
>>>>
>>>> I don't know why the code generator would need to fiddle with this.
>>>>
>>>> Chris will need to advise...
>>>>
>>>> snaroff
>>>>
>>>> [dylan:~/llvm/tools/clang] admin% ../../Debug/bin/clang justin.c -
>>>> parse-ast-dump
>>>>
>>>> int puts(char *);
>>>>
>>>> int main(int argc, char *argv[])
>>>> (CompoundStmt 0x3105ed0 <justin.c:5:1, line:11:1>
>>>>    (DeclStmt 0x3105bf0 <:0:0>
>>>>      0x3105dd0 "char *foo =
>>>>        (ArraySubscriptExpr 0x3105db0 <justin.c:6:15, col:21>  
>>>> 'char *'
>>>>          (DeclRefExpr 0x3105d70 <col:15> 'char **' Decl='argv'
>>>> 0x3105d40)
>>>>          (IntegerLiteral 0x3105d90 <col:20> 'int' 0))")
>>>>    (CallExpr 0x3105e70 <line:8:3, col:11> 'int'
>>>>      (ImplicitCastExpr 0x3105e60 <col:3> 'int (*)(char *)'
>>>>        (DeclRefExpr 0x3105e00 <col:3> 'int (char *)' Decl='puts'
>>>> 0x3105c20))
>>>>      (DeclRefExpr 0x3105e20 <col:8> 'char *' Decl='foo' 0x3105dd0))
>>>>    (ReturnStmt 0x3105ec0 <line:10:3, col:10>
>>>>      (IntegerLiteral 0x3105ea0 <col:10> 'int' 0)))
>>>>
>>>>
>>>>> Here's the patch.  I'm curious to see if this will work in all  
>>>>> cases.
>>>>> I am still adding test cases to the build, so it is a bit
>>>>> experimental:
>>>>>
>>>>> ndex: CodeGen/CodeGenFunction.h
>>>>> ================================================================== 
>>>>> =
>>>>> --- CodeGen/CodeGenFunction.h   (revision 41790)
>>>>> +++ CodeGen/CodeGenFunction.h   (working copy)
>>>>> @@ -237,6 +237,11 @@
>>>>>    };
>>>>>    llvm::SmallVector<BreakContinue, 8> BreakContinueStack;
>>>>>
>>>>> +  static bool
>>>>> +  isCompatibleDeclPointerType(
>>>>> +    const llvm::Type* Arg,
>>>>> +    const llvm::Type* DeclPtr);
>>>>> +
>>>>>  public:
>>>>>    CodeGenFunction(CodeGenModule &cgm);
>>>>>
>>>>> Index: CodeGen/CGDecl.cpp
>>>>> ================================================================== 
>>>>> =
>>>>> --- CodeGen/CGDecl.cpp  (revision 41790)
>>>>> +++ CodeGen/CGDecl.cpp  (working copy)
>>>>> @@ -96,11 +96,69 @@
>>>>>    }
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +CodeGenFunction::isCompatibleDeclPointerType(
>>>>> +  const llvm::Type* ArgType,
>>>>> +  const llvm::Type* DeclType)
>>>>> +{
>>>>> +  if(
>>>>> +      ArgType->getTypeID() != llvm::Type::PointerTyID
>>>>> +   || DeclType->getTypeID() != llvm::Type::PointerTyID)
>>>>> +  {
>>>>> +    return false;
>>>>> +  }
>>>>> +
>>>>> +  while(
>>>>> +      ArgType->getNumContainedTypes() == 1
>>>>> +   && DeclType->getNumContainedTypes() == 1)
>>>>> +  {
>>>>> +    const llvm::Type* ArgContainedType = ArgType- 
>>>>> >getContainedType
>>>>> (0);
>>>>> +    const llvm::Type* DeclContainedType = DeclType-
>>>>>> getContainedType(0);
>>>>> +
>>>>> +    //if the two contained types point to the same
>>>>> +    //location, then these pointer types are compatible
>>>>> +    if(
>>>>> +      ArgContainedType == DeclContainedType)
>>>>> +    {
>>>>> +      return true;
>>>>> +    }
>>>>> +    //is the pointer indirection level different?
>>>>> +    else if(
>>>>> +      ArgContainedType->getNumContainedTypes() !=
>>>>> +        DeclContainedType->getNumContainedTypes())
>>>>> +    {
>>>>> +      break;
>>>>> +    }
>>>>> +    //are these compatible pointer / array types?
>>>>> +    else if(
>>>>> +        llvm::SequentialType::classof(ArgContainedType)
>>>>> +     && llvm::SequentialType::classof(DeclContainedType))
>>>>> +    {
>>>>> +      if( ArgContainedType->getNumContainedTypes() != 1 )
>>>>> +      {
>>>>> +        //FIXME will this ever happen with arrays / pointers?
>>>>> +        break;
>>>>> +      }
>>>>> +
>>>>> +      ArgType = ArgContainedType;
>>>>> +      DeclType = DeclContainedType;
>>>>> +    }
>>>>> +    else
>>>>> +    {
>>>>> +      //one or more of these is something other than an array /
>>>>> pointer
>>>>> +      break;
>>>>> +    }
>>>>> +  }
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>>  /// Emit an alloca for the specified parameter and set up
>>>>> LocalDeclMap.
>>>>>  void CodeGenFunction::EmitParmDecl(const ParmVarDecl &D,
>>>>> llvm::Value *Arg) {
>>>>>    QualType Ty = D.getCanonicalType();
>>>>>
>>>>>    llvm::Value *DeclPtr;
>>>>> +
>>>>>    if (!Ty->isConstantSizeType(getContext())) {
>>>>>      // Variable sized values always are passed by-reference.
>>>>>      DeclPtr = Arg;
>>>>> @@ -111,7 +169,19 @@
>>>>>        // TODO: Alignment
>>>>>        DeclPtr = new llvm::AllocaInst(LTy, 0, std::string 
>>>>> (D.getName
>>>>> ())+".addr",
>>>>>                                       AllocaInsertPt);
>>>>> -
>>>>> +
>>>>> +      //do we need to perform an implicit conversion?
>>>>> +      if( isCompatibleDeclPointerType(Arg->getType(), DeclPtr-
>>>>>> getType()) )
>>>>> +      {
>>>>> +        llvm::Value* CastPtr =
>>>>> +          Builder.CreateBitCast(
>>>>> +            Arg,
>>>>> +            LTy,
>>>>> +            (std::string(D.getName()) + ".implconv").c_str());
>>>>> +
>>>>> +        Arg = CastPtr;
>>>>> +      }
>>>>> +
>>>>>        // Store the initial value into the alloca.
>>>>>        Builder.CreateStore(Arg, DeclPtr);
>>>>>      } else {
>>>>>
>>>>> On 9/9/07, Justin Handville <uniheliodem at gmail.com> wrote:
>>>>>> Perhaps the better question to ask is where should this transform
>>>>>> occur?
>>>>>>
>>>>>> On 9/9/07, Justin Handville <uniheliodem at gmail.com> wrote:
>>>>>>> The following code snippet generates an assert when compiled  
>>>>>>> with
>>>>>>> clang -emit-llvm
>>>>>>>
>>>>>>> int main(int argc, char* argv[])
>>>>>>> {
>>>>>>>   return 0;
>>>>>>> }
>>>>>>>
>>>>>>> Assert:
>>>>>>>
>>>>>>> Assertion failed: (getOperand(0)->getType() ==
>>>>>>> cast<PointerType>(getOperand(1)->getType())->getElementType() &&
>>>>>>> "Ptr
>>>>>>> must be a pointer to Val type!"), function AssertOK, file
>>>>>>> Instructions.cpp, line 796.
>>>>>>>
>>>>>>> The problem here is at line 116 of CGDecl.cpp:
>>>>>>>
>>>>>>>     if (LTy->isFirstClassType()) {
>>>>>>>       // TODO: Alignment
>>>>>>>       DeclPtr = new llvm::AllocaInst(LTy, 0, std::string 
>>>>>>> (D.getName
>>>>>>> ())+".addr",
>>>>>>>                                      AllocaInsertPt);
>>>>>>>
>>>>>>>       // Store the initial value into the alloca.
>>>>>>>       Builder.CreateStore(Arg, DeclPtr);  //attempting to store
>>>>>>> char*[] as char**
>>>>>>>
>>>>>>> The problem is that llvm treats char*[] and char** as two  
>>>>>>> distinct
>>>>>>> types as far as I can tell.  EmitParmDecl should convert the  
>>>>>>> type
>>>>>>> to a
>>>>>>> char** before calling CreateStore, since this is how the  
>>>>>>> argument
>>>>>>> would be passed in C ABI.
>>>>>>>
>>>>>>> I assume that a transform step should be added before the
>>>>>>> CreateStore
>>>>>>> that could normalize the argument to match the declptr.  I  
>>>>>>> assume a
>>>>>>> good approach would be to descend through both Arg and DeclPtr,
>>>>>>> transforming Arg to match DeclPtr where compatible, or  
>>>>>>> raising an
>>>>>>> appropriate diagnostic error if such a transformation is not
>>>>>>> possible.
>>>>>>>
>>>>>>> If I can get some approval that this route would work in this  
>>>>>>> case,
>>>>>>> then I will go ahead and submit the patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Justin
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>
>>




More information about the cfe-dev mailing list