[PATCH] [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction

Duncan P. N. Exon Smith dexonsmith at apple.com
Tue Feb 24 19:43:43 PST 2015


> On 2015-Feb-24, at 15:59, David Blaikie <dblaikie at gmail.com> wrote:
> 
> ping
> 
> On Fri, Feb 13, 2015 at 5:32 PM, David Blaikie <dblaikie at gmail.com> wrote:
> Hi rafael, resistor, grosbach, chandlerc, grosser,
> 
> One of several parallel first steps to remove the target type of pointers,
> replacing them with a single opaque pointer type.
> 
> This adds an explicit type parameter to the gep instruction so that when the
> first parameter becomes an opaque pointer type, the type to gep through is
> still available to the instructions.
> 
> * This doesn't modify gep operators, only instructions (operators will be
>   handled separately)
> 
> * Textual IR changes only. Bitcode (including upgrade) and changing the
>   in-memory representation will be in separate changes.
> 
> * geps of vectors are transformed as:
>     getelementptr <4 x float*> %x, ...
>   ->getelementptr float, <4 x float*> %x, ...
>   Then, once the opaque pointer type is introduced, this will ultimately look
>   like:
>     getelementptr float, <4 x ptr> %x
>   with the unambiguous interpretation that it is a vector of pointers to float.
> 
> * address spaces remain on the pointer, not the type:
>     getelementptr float addrspace(1)* %x
>   ->getelementptr float, float addrspace(1)* %x
>   Then, eventually:
>     getelementptr float, ptr addrspace(1) %x
> 
> Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files.
> 
> update.py:
> import fileinput
> import sys
> import re
> 
> ib = re.compile("getelementptr inbounds +[^( ]")
> norm = re.compile("getelementptr +[^( ]")
> 
> ibrep = re.compile(r"(^.*?getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
> normrep = re.compile(       r"(^.*?getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
> 
> def conv(match, line):
>   if not match:
>     return line
>   line = match.groups()[0]
>   if len(match.groups()[5]) == 0:
>     line += match.groups()[2]
>   line += match.groups()[3]
>   line += ", "
>   line += match.groups()[1]
>   line += "\n"
>   return line
> 
> for line in sys.stdin:
>   if line.find("getelementptr ") == line.find("getelementptr inbounds"):
>     if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
>       line = conv(re.match(ibrep, line), line)
>   elif line.find("getelementptr ") != line.find("getelementptr ("):
>     line = conv(re.match(normrep, line), line)
>   sys.stdout.write(line)
> 
> apply.sh:
> #!/bin/bash
> for name in "$@"
> do
>   python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
>   rm -f "$name.tmp"
> done
> 
> The actual commands:
> From llvm/src:
> find test/ -name *.ll | xargs ./apply.sh
> From llvm/src/tools/clang:
> find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
> 
> After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt
> all checked out - I haven't looked at polly yet). I haven't included the actual
> test changes in this review to make it easier to look at - but they are /only/
> the changes created by this script (except the unit test change, which I have
> included) - nothing manual.
> 
> The extra 'rm' in the apply.sh script is due to a few files in clang's test
> suite using interesting unicode stuff that my python script was throwing
> exceptions on. None of those files needed to be migrated, so it seemed
> sufficient to ignore those cases.
> 
> http://reviews.llvm.org/D7636
> 
> Files:
>   include/llvm/IR/Instructions.h
>   lib/AsmParser/LLParser.cpp
>   lib/IR/AsmWriter.cpp
>   unittests/IR/ConstantsTest.cpp
> 
> Index: include/llvm/IR/Instructions.h
> ===================================================================
> --- include/llvm/IR/Instructions.h
> +++ include/llvm/IR/Instructions.h
> @@ -845,6 +845,13 @@
>      return cast<SequentialType>(Instruction::getType());
>    }
> 
> +  Type *getSourceElementType() const {
> +    SequentialType *Ty = cast<SequentialType>(getPointerOperandType());
> +    if (VectorType *VTy = dyn_cast<VectorType>(Ty))
> +      Ty = cast<SequentialType>(VTy->getElementType());
> +    return Ty->getElementType();
> +  }
> +
>    /// \brief Returns the address space of this instruction's pointer type.
>    unsigned getAddressSpace() const {
>      // Note that this is always the same as the pointer operand's address space
> Index: lib/AsmParser/LLParser.cpp
> ===================================================================
> --- lib/AsmParser/LLParser.cpp
> +++ lib/AsmParser/LLParser.cpp
> @@ -5364,8 +5364,18 @@
> 
>    bool InBounds = EatIfPresent(lltok::kw_inbounds);
> 
> +  Type *Ty = nullptr;
> +  if (ParseType(Ty)) return true;
> +  ParseToken(lltok::comma, "expected comma after getelementptr's type");
> +

You neglected to check the return of `ParseToken()` here.  Like that
patch, I think you should have a testcase for this, since this is how
we're rejecting the old syntax.

>    if (ParseTypeAndValue(Ptr, Loc, PFS)) return true;
> 
> +
> +  Type *PtrTy = Ptr->getType();
> +  if (VectorType *VT = dyn_cast<VectorType>(PtrTy))
> +    PtrTy = VT->getElementType();
> +  assert(Ty == cast<SequentialType>(PtrTy)->getElementType());
> +

This should be an `Error()`, and I think there should be a testcase
for the error (a type mismatch seems easy enough to hit, so the
diagnostic seems important to me).

>    Type *BaseType = Ptr->getType();
>    PointerType *BasePointerType = dyn_cast<PointerType>(BaseType->getScalarType());
>    if (!BasePointerType)
> Index: lib/IR/AsmWriter.cpp
> ===================================================================
> --- lib/IR/AsmWriter.cpp
> +++ lib/IR/AsmWriter.cpp
> @@ -2880,6 +2880,11 @@
>      Out << ", ";
>      TypePrinter.print(I.getType(), Out);
>    } else if (Operand) {   // Print the normal way.
> +    if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&I)) {
> +      Out << ' ';
> +      TypePrinter.print(GEP->getSourceElementType(), Out);
> +      Out << ',';
> +    }
> 
>      // PrintAllTypes - Instructions who have operands of all the same type
>      // omit the type from all but the first operand.  If the instruction has
> Index: unittests/IR/ConstantsTest.cpp
> ===================================================================
> --- unittests/IR/ConstantsTest.cpp
> +++ unittests/IR/ConstantsTest.cpp
> @@ -246,9 +246,9 @@
>    // FIXME: getGetElementPtr() actually creates an inbounds ConstantGEP,
>    //        not a normal one!
>    //CHECK(ConstantExpr::getGetElementPtr(Global, V, false),
> -  //      "getelementptr i32** @dummy, i32 1");
> +  //      "getelementptr i32*, i32** @dummy, i32 1");
>    CHECK(ConstantExpr::getInBoundsGetElementPtr(Global, V),
> -        "getelementptr inbounds i32** @dummy, i32 1");
> +        "getelementptr inbounds i32*, i32** @dummy, i32 1");
> 
>    CHECK(ConstantExpr::getExtractElement(P6, One), "extractelement <2 x i16> "
>          P6STR ", i32 1");
> 
> EMAIL PREFERENCES
>   http://reviews.llvm.org/settings/panel/emailpreferences/
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list