[cfe-dev] Patch to add __builtin_shufflevector

Thu Feb 28 20:30:34 PST 2008

On Feb 28, 2008, at 6:52 PM, Eli Friedman wrote:

> Per subject, patch to add __builtin_shufflevector.  This is needed to
> implement mmintrin.h and friends efficiently on top of LLVM, because
> LLVM isn't very smart about vectors, and vectors are hard(tm).  The
> diagnostics aren't quite right, but other than that, does this patch
> look okay?

This is very nice!  I have some minor stuff below, mostly little  
details.

+  ShuffleVectorExpr(Expr **args, unsigned nexpr,
+                    QualType Type, SourceLocation BLoc,
+                    SourceLocation RP) :
+    Expr(ShuffleVectorExprClass, Type), BuiltinLoc(BLoc),
+    RParenLoc(RP), SubExprs(args), NumExprs(nexpr) {}

This needs a comment that says that "args" must be new[]'d, and that  
ownership is transfered to the copy ctor.  Better yet, it would be  
good if the ctor actually did the new[] + memcpy itself, instead of  
having this behavior.

+  ~ShuffleVectorExpr() {
+    delete [] SubExprs;
+  }

This should delete the children ASTs as well as the array.


+  /// getExpr - Return the Expr at the specified index.
+  Expr *getExpr(unsigned Index) {
+    assert((Index < NumExprs) && "Arg access out of range!");
+    return SubExprs[Index];
+  }

This should probably also come in const form.  It would be nice to  
have an "int getShuffleMaskIdx(unsigned N)" method that converts the  
shuffle mask AST to a constant int and returns it, as a helper function.

--- Sema/SemaChecking.cpp	(revision 47702)
+++ Sema/SemaChecking.cpp	(working copy)
@@ -40,23 +40,36 @@
+    if (!CheckBuiltinCFStringArgument(TheCall->getArg(0))) {
+      delete TheCall;
+      return true;
+    }

SemaChecking should never delete the call on an error because  
ActOnCallExpr has an OwningPtr that will delete it if sema is not  
successful.  I think just removing the delete is ok from the error  
paths.  Actually, with this code in SemaExpr:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr- 
 >getDecl()))
-        if (CheckFunctionCall(FDecl, TheCall.get()))
-          return true;
+        return CheckFunctionCall(FDecl, TheCall.take());

the call will always be deleted.  I think this code should be:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr- 
 >getDecl())) {
+        ExprResult R = CheckFunctionCall(FDecl, TheCall.take());
          if (R.error) return;
          TheCall = R.Val;
        }

+/// SemaBuiltinUnorderedCompare - Handle functions like  
__builtin_isgreater and
+/// friends.  This is declared to take (...), so we have to check  
everything.
+Action::ExprResult Sema::SemaBuiltinShuffleVector(CallExpr *TheCall) {

plz update the comment.

+  if (!Context.typesAreCompatible(
+          TheCall->getArg(0)->getType().getUnqualifiedType(),
+          TheCall->getArg(1)->getType().getUnqualifiedType())) {
+    return Diag(TheCall->getLocEnd(),  
diag::err_typecheck_call_too_few_args);
+  }

I think the (canonical) types should be required to be exactly  
identical, not just compatible, what do you think?

+    if (!TheCall->getArg(i)->isIntegerConstantExpr(Result, Context))
+      return Diag(TheCall->getLocEnd(),  
diag::err_typecheck_call_too_few_args);
+    if (Result >= llvm::APSInt(llvm::APInt(32, numElements*2), false))
+      return Diag(TheCall->getLocEnd(),  
diag::err_typecheck_call_too_few_args);

These diags are good, but should be more descriptive.  Do you want to  
support -1 as undef?

I think that "Result >= llvm::APSInt(llvm::APInt(32, numElements*2)"  
will segfault if the index is something like 1LL.  Try something like:

   Result >= llvm::APSInt(llvm::APInt(Result.getBitWidth(),  
numElements*2))

oh except you also have to make the APSInt have the same sign as  
Result (try 1U as an index), and if it's signed, you want to reject  
things like -2.


+  Expr** exprs = new Expr*[TheCall->getNumArgs()];
+
+  for (unsigned i = 0; i < TheCall->getNumArgs(); i++) {
+    exprs[i] = TheCall->getArg(i);
+    TheCall->setArg(i, 0);
+  }
+
+  ShuffleVectorExpr* E = new ShuffleVectorExpr(exprs, numElements+2,  
FAType,
+                                               TheCall->getCallee()- 
 >getLocStart(),
+                                               TheCall- 
 >getRParenLoc());

Instead of doing the new in Sema here, it would be nice if sema used a  
SmallvEctor<32>, and then had the ctor do the new/memcpy just to keep  
the ownership simple and clear.  I think the getLocStart() line is too  
long.

+  for (unsigned i = 2; i < E->getNumSubExprs(); i++) {
+    indices.push_back(cast<llvm::Constant>(CGF.EmitScalarExpr(E- 
 >getExpr(i))));
+  }

It would be nice to use a helper on ShuffleVectorExpr to get the index  
as an unsigned (evaluating as an i-c-e), instead of relying on  
EmitScalarExpr to do it.  Also, it would be nice to handle the undef  
case.


As a random thought, it would be relatively easy to make  
__builtin_shufflevector also support: __builtin_shufflevector(Vec,  
idx1, idx2, idx3, idx4) - a shuffle with a single vector (implicitly  
making the second vector be undef).  I think this would be handy, but  
is jsut syntactic sugar and may not be worth it.  What do you think?

Thanks for working on this!

-Chris