<div dir="ltr">I have committed r262091 to fix this issue. Thanks again for reporting this!<div><br></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature">thanks,<br>Cong</div></div>

<br><div class="gmail_quote">On Thu, Feb 25, 2016 at 7:17 AM, Kristof Beyls <span dir="ltr"><<a href="mailto:kristof.beyls@arm.com" target="_blank">kristof.beyls@arm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Cong,<br>

<br>

It seems this commit introduced a regression found by our AArch64 and AArch32 Neon intrinsics testers, as it triggers the following assert:<br>

lib/IR/Instructions.cpp:1791: static int llvm::ShuffleVectorInst::getMaskValue(llvm::Constant *, unsigned int): Assertion `i < Mask->getType()->getVectorNumElements() && "Index out of range"' failed.<br>
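
<br>

The assertion fires when getMaskValue is asked for an index at or beyond the mask's own element count. Below is a minimal sketch of a half-shuffle check that only indexes within the mask's length. The helper name isUpperHalfShuffle and the std::vector<int> mask (with -1 standing for undef) are hypothetical stand-ins, not LLVM APIs, and this only illustrates the bounds concern; it is not claimed to be the change made in r262091.<br>

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a shuffle mask: one entry per result lane,
// with -1 meaning undef.
using Mask = std::vector<int>;

// Returns true if M moves the upper half of the ElemNumToReduce live lanes
// into the lower half and leaves every remaining result lane undef.
// Both loops are bounded by M.size(), the mask's own length, rather than
// by the element count of the vector being reduced.
bool isUpperHalfShuffle(const Mask &M, unsigned ElemNumToReduce) {
  assert(ElemNumToReduce >= 2 && "nothing left to reduce");
  unsigned Half = ElemNumToReduce / 2;
  if (M.size() < Half)
    return false;
  for (unsigned I = 0; I < Half; ++I)
    if (M[I] != int(I + Half))
      return false;
  for (unsigned I = Half; I < M.size(); ++I)
    if (M[I] != -1)
      return false;
  return true;
}
```

With this sketch, a two-lane mask such as {1, -1} checked with ElemNumToReduce = 2 is accepted without reading past lane 1, even if the shuffled source vector has four lanes.<br>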

<br>

I've attached the smallest reproducer I managed to create.<br>

You should be able to reproduce this assertion failure with the following command: clang -target aarch64 -c -O2 261804_regression_formatted.c<br>

<br>

Could you have a look at this?<br>

<br>

Thanks!<span class="HOEnZb"><font color="#888888"><br>

<br>

Kristof</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

<br>

On 25/02/2016 00:40, Cong Hou via llvm-commits wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Author: conghou<br>

Date: Wed Feb 24 17:40:36 2016<br>

New Revision: 261804<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=261804&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=261804&view=rev</a><br>

Log:<br>

Detect vector reduction operations just before instruction selection.<br>

<br>

(This is the second attempt to commit this patch, after fixing pr26652 & pr26653).<br>

<br>

This patch detects vector reductions before instruction selection. Vector<br>

reductions are vectorized reduction operations, and for such operations we have<br>

freedom to reorganize the elements of the result as long as their reduction<br>

stays unchanged. This will enable some reduction pattern recognition during<br>

instruction combine such as SAD/dot-product on X86. A flag is added to<br>

SDNodeFlags to mark those vector reduction nodes to be checked during instruction<br>

combine.<br>

<br>

To detect those vector reductions, we search def-use chains starting from the<br>

given instruction, and check if all uses fall into two categories:<br>

<br>

1. Reduction with another vector.<br>

2. Reduction on all elements.<br>

<br>

in which 2 is detected by recognizing the pattern that the loop vectorizer<br>

generates to reduce all elements in the vector outside of the loop, which<br>

includes several ShuffleVector and one ExtractElement instructions.<br>
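
<br>

As a rough illustration of category 2 (not part of the patch), the shuffle-and-reduce tail can be modeled on plain integers. This is a minimal sketch under stated assumptions: the name reduceByHalving and the std::vector stand-in for a vector register are hypothetical, and integer addition stands in for any of the supported reduction opcodes.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the reduction tail emitted after a
// vectorized loop. Each step adds the upper half of the live lanes onto
// the lower half, so after log2(N) steps lane 0 holds the reduction of
// all N lanes.
int reduceByHalving(std::vector<int> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "lane count must be a power of two");
  for (std::size_t Live = V.size(); Live > 1; Live /= 2)
    for (std::size_t I = 0; I < Live / 2; ++I)
      V[I] += V[I + Live / 2]; // models: add V, shufflevector(V, undef, mask)
  return V[0];                 // models: extractelement V, 0
}
```

For example, reduceByHalving({1, 2, 3, 4}) first folds lanes 2 and 3 onto lanes 0 and 1, giving 4 and 6, then folds lane 1 onto lane 0, and returns 10. Only lane 0 is read at the end, which is why the other lanes may hold arbitrary values.<br>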

<br>

<br>

Differential revision: <a href="http://reviews.llvm.org/D15250" rel="noreferrer" target="_blank">http://reviews.llvm.org/D15250</a><br>

<br>

<br>

<br>

Added:<br>

     llvm/trunk/test/CodeGen/Generic/pr26652.ll<br>

     llvm/trunk/test/CodeGen/Generic/vector-redux.ll<br>

Modified:<br>

     llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h<br>

     llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp<br>

<br>

Modified: llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h?rev=261804&r1=261803&r2=261804&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h?rev=261804&r1=261803&r2=261804&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h (original)<br>

+++ llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h Wed Feb 24 17:40:36 2016<br>

@@ -328,6 +328,7 @@ private:<br>

    bool NoInfs : 1;<br>

    bool NoSignedZeros : 1;<br>

    bool AllowReciprocal : 1;<br>

+  bool VectorReduction : 1;<br>

    public:<br>

    /// Default constructor turns off all optimization flags.<br>

@@ -340,6 +341,7 @@ public:<br>

      NoInfs = false;<br>

      NoSignedZeros = false;<br>

      AllowReciprocal = false;<br>

+    VectorReduction = false;<br>

    }<br>

      // These are mutators for each flag.<br>

@@ -351,6 +353,7 @@ public:<br>

    void setNoInfs(bool b) { NoInfs = b; }<br>

    void setNoSignedZeros(bool b) { NoSignedZeros = b; }<br>

    void setAllowReciprocal(bool b) { AllowReciprocal = b; }<br>

+  void setVectorReduction(bool b) { VectorReduction = b; }<br>

      // These are accessors for each flag.<br>

    bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }<br>

@@ -361,6 +364,7 @@ public:<br>

    bool hasNoInfs() const { return NoInfs; }<br>

    bool hasNoSignedZeros() const { return NoSignedZeros; }<br>

    bool hasAllowReciprocal() const { return AllowReciprocal; }<br>

+  bool hasVectorReduction() const { return VectorReduction; }<br>

      /// Return a raw encoding of the flags.<br>

    /// This function should only be used to add data to the NodeID value.<br>

<br>

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp?rev=261804&r1=261803&r2=261804&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp?rev=261804&r1=261803&r2=261804&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (original)<br>

+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Wed Feb 24 17:40:36 2016<br>

@@ -2317,6 +2317,129 @@ void SelectionDAGBuilder::visitFSub(cons<br>

    visitBinary(I, ISD::FSUB);<br>

  }<br>

<br>

+/// Checks if the given instruction performs a vector reduction, in which case<br>

+/// we have the freedom to alter the elements in the result as long as the<br>

+/// reduction of them stays unchanged.<br>

+static bool isVectorReductionOp(const User *I) {<br>

+  const Instruction *Inst = dyn_cast<Instruction>(I);<br>

+  if (!Inst || !Inst->getType()->isVectorTy())<br>

+    return false;<br>

+<br>

+  auto OpCode = Inst->getOpcode();<br>

+  switch (OpCode) {<br>

+  case Instruction::Add:<br>

+  case Instruction::Mul:<br>

+  case Instruction::And:<br>

+  case Instruction::Or:<br>

+  case Instruction::Xor:<br>

+    break;<br>

+  case Instruction::FAdd:<br>

+  case Instruction::FMul:<br>

+    if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))<br>

+      if (FPOp->getFastMathFlags().unsafeAlgebra())<br>

+        break;<br>

+    // Fall through.<br>

+  default:<br>

+    return false;<br>

+  }<br>

+<br>

+  unsigned ElemNum = Inst->getType()->getVectorNumElements();<br>

+  unsigned ElemNumToReduce = ElemNum;<br>

+<br>

+  // Do DFS search on the def-use chain from the given instruction. We only<br>

+  // allow four kinds of operations during the search until we reach the<br>

+  // instruction that extracts the first element from the vector:<br>

+  //<br>

+  //   1. The reduction operation of the same opcode as the given instruction.<br>

+  //<br>

+  //   2. PHI node.<br>

+  //<br>

+  //   3. ShuffleVector instruction together with a reduction operation that<br>

+  //      does a partial reduction.<br>

+  //<br>

+  //   4. ExtractElement that extracts the first element from the vector, and we<br>

+  //      stop searching the def-use chain here.<br>

+  //<br>

+  // 3 & 4 above perform a reduction on all elements of the vector. We push defs<br>

+  // from 1-3 to the stack to continue the DFS. The given instruction is not<br>

+  // a reduction operation if we meet any instruction other than those<br>

+  // listed above.<br>

+<br>

+  SmallVector<const User *, 16> UsersToVisit{Inst};<br>

+  SmallPtrSet<const User *, 16> Visited;<br>

+  bool ReduxExtracted = false;<br>

+<br>

+  while (!UsersToVisit.empty()) {<br>

+    auto User = UsersToVisit.back();<br>

+    UsersToVisit.pop_back();<br>

+    if (!Visited.insert(User).second)<br>

+      continue;<br>

+<br>

+    for (const auto &U : User->users()) {<br>

+      auto Inst = dyn_cast<Instruction>(U);<br>

+      if (!Inst)<br>

+        return false;<br>

+<br>

+      if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {<br>

+        if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))<br>

+          if (!isa<PHINode>(FPOp) && !FPOp->getFastMathFlags().unsafeAlgebra())<br>

+            return false;<br>

+        UsersToVisit.push_back(U);<br>

+      } else if (const ShuffleVectorInst *ShufInst =<br>

+                     dyn_cast<ShuffleVectorInst>(U)) {<br>

+        // Detect the following pattern: A ShuffleVector instruction together<br>

+        // with a reduction that does a partial reduction on the first and second<br>

+        // ElemNumToReduce / 2 elements, and store the result in<br>

+        // ElemNumToReduce / 2 elements in another vector.<br>

+<br>

+        unsigned ResultElements = ShufInst->getType()->getVectorNumElements();<br>

+        ElemNumToReduce = ResultElements <= ElemNumToReduce ? ResultElements<br>

+                                                            : ElemNumToReduce;<br>

+        if (ElemNumToReduce == 1)<br>

+          return false;<br>

+        if (!isa<UndefValue>(U->getOperand(1)))<br>

+          return false;<br>

+        for (unsigned i = 0; i < ElemNumToReduce / 2; ++i)<br>

+          if (ShufInst->getMaskValue(i) != int(i + ElemNumToReduce / 2))<br>

+            return false;<br>

+        for (unsigned i = ElemNumToReduce / 2; i < ElemNum; ++i)<br>

+          if (ShufInst->getMaskValue(i) != -1)<br>

+            return false;<br>

+<br>

+        // There is only one user of this ShuffleVector instruction, which<br>

+        // must<br>

+        // be a reduction operation.<br>

+        if (!U->hasOneUse())<br>

+          return false;<br>

+<br>

+        auto U2 = dyn_cast<Instruction>(*U->user_begin());<br>

+        if (!U2 || U2->getOpcode() != OpCode)<br>

+          return false;<br>

+<br>

+        // Check operands of the reduction operation.<br>

+        if ((U2->getOperand(0) == U->getOperand(0) && U2->getOperand(1) == U) ||<br>

+            (U2->getOperand(1) == U->getOperand(0) && U2->getOperand(0) == U)) {<br>

+          UsersToVisit.push_back(U2);<br>

+          ElemNumToReduce /= 2;<br>

+        } else<br>

+          return false;<br>

+      } else if (isa<ExtractElementInst>(U)) {<br>

+        // At this moment we should have reduced all elements in the vector.<br>

+        if (ElemNumToReduce != 1)<br>

+          return false;<br>

+<br>

+        const ConstantInt *Val = dyn_cast<ConstantInt>(U->getOperand(1));<br>

+        if (!Val || Val->getZExtValue() != 0)<br>

+          return false;<br>

+<br>

+        ReduxExtracted = true;<br>

+      } else<br>

+        return false;<br>

+    }<br>

+  }<br>

+  return ReduxExtracted;<br>

+}<br>

+<br>

  void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {<br>

    SDValue Op1 = getValue(I.getOperand(0));<br>

    SDValue Op2 = getValue(I.getOperand(1));<br>

@@ -2324,6 +2447,7 @@ void SelectionDAGBuilder::visitBinary(co<br>

    bool nuw = false;<br>

    bool nsw = false;<br>

    bool exact = false;<br>

+  bool vec_redux = false;<br>

    FastMathFlags FMF;<br>

      if (const OverflowingBinaryOperator *OFBinOp =<br>

@@ -2337,10 +2461,16 @@ void SelectionDAGBuilder::visitBinary(co<br>

    if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(&I))<br>

      FMF = FPOp->getFastMathFlags();<br>

<br>

+  if (isVectorReductionOp(&I)) {<br>

+    vec_redux = true;<br>

+    DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");<br>

+  }<br>

+<br>

    SDNodeFlags Flags;<br>

    Flags.setExact(exact);<br>

    Flags.setNoSignedWrap(nsw);<br>

    Flags.setNoUnsignedWrap(nuw);<br>

+  Flags.setVectorReduction(vec_redux);<br>

    if (EnableFMFInDAG) {<br>

      Flags.setAllowReciprocal(FMF.allowReciprocal());<br>

      Flags.setNoInfs(FMF.noInfs());<br>

<br>

Added: llvm/trunk/test/CodeGen/Generic/pr26652.ll<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Generic/pr26652.ll?rev=261804&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Generic/pr26652.ll?rev=261804&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/test/CodeGen/Generic/pr26652.ll (added)<br>

+++ llvm/trunk/test/CodeGen/Generic/pr26652.ll Wed Feb 24 17:40:36 2016<br>

@@ -0,0 +1,8 @@<br>

+; RUN: llc < %s<br>

+<br>

+define <2 x i32> @test(<4 x i32> %a, <4 x i32> %b) {<br>

+entry:<br>

+  %0 = or <4 x i32> %a, %b<br>

+  %1 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 2, i32 3><br>

+  ret <2 x i32> %1<br>

+}<br>

<br>

Added: llvm/trunk/test/CodeGen/Generic/vector-redux.ll<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Generic/vector-redux.ll?rev=261804&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Generic/vector-redux.ll?rev=261804&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/test/CodeGen/Generic/vector-redux.ll (added)<br>

+++ llvm/trunk/test/CodeGen/Generic/vector-redux.ll Wed Feb 24 17:40:36 2016<br>

@@ -0,0 +1,237 @@<br>

+; RUN: llc < %s -debug-only=isel -o /dev/null 2>&1 | FileCheck %s<br>

+; REQUIRES: asserts<br>

+<br>

+@a = global [1024 x i32] zeroinitializer, align 16<br>

+<br>

+define i32 @reduce_add() {<br>

+; CHECK-LABEL: reduce_add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+; CHECK:       Detected a reduction operation: {{.*}} add<br>

+<br>

+min.iters.checked:<br>

+  br label %vector.body<br>

+<br>

+vector.body:<br>

+  %index = phi i64 [ 0, %min.iters.checked ], [ %index.next.4, %vector.body ]<br>

+  %vec.phi = phi <4 x i32> [ zeroinitializer, %min.iters.checked ], [ %28, %vector.body ]<br>

+  %vec.phi4 = phi <4 x i32> [ zeroinitializer, %min.iters.checked ], [ %29, %vector.body ]<br>

+  %0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index<br>

+  %1 = bitcast i32* %0 to <4 x i32>*<br>

+  %wide.load = load <4 x i32>, <4 x i32>* %1, align 16<br>

+  %2 = getelementptr i32, i32* %0, i64 4<br>

+  %3 = bitcast i32* %2 to <4 x i32>*<br>

+  %wide.load5 = load <4 x i32>, <4 x i32>* %3, align 16<br>

+  %4 = add nsw <4 x i32> %wide.load, %vec.phi<br>

+  %5 = add nsw <4 x i32> %wide.load5, %vec.phi4<br>

+  %index.next = add nuw nsw i64 %index, 8<br>

+  %6 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next<br>

+  %7 = bitcast i32* %6 to <4 x i32>*<br>

+  %wide.load.1 = load <4 x i32>, <4 x i32>* %7, align 16<br>

+  %8 = getelementptr i32, i32* %6, i64 4<br>

+  %9 = bitcast i32* %8 to <4 x i32>*<br>

+  %wide.load5.1 = load <4 x i32>, <4 x i32>* %9, align 16<br>

+  %10 = add nsw <4 x i32> %wide.load.1, %4<br>

+  %11 = add nsw <4 x i32> %wide.load5.1, %5<br>

+  %index.next.1 = add nsw i64 %index, 16<br>

+  %12 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.1<br>

+  %13 = bitcast i32* %12 to <4 x i32>*<br>

+  %wide.load.2 = load <4 x i32>, <4 x i32>* %13, align 16<br>

+  %14 = getelementptr i32, i32* %12, i64 4<br>

+  %15 = bitcast i32* %14 to <4 x i32>*<br>

+  %wide.load5.2 = load <4 x i32>, <4 x i32>* %15, align 16<br>

+  %16 = add nsw <4 x i32> %wide.load.2, %10<br>

+  %17 = add nsw <4 x i32> %wide.load5.2, %11<br>

+  %index.next.2 = add nsw i64 %index, 24<br>

+  %18 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.2<br>

+  %19 = bitcast i32* %18 to <4 x i32>*<br>

+  %wide.load.3 = load <4 x i32>, <4 x i32>* %19, align 16<br>

+  %20 = getelementptr i32, i32* %18, i64 4<br>

+  %21 = bitcast i32* %20 to <4 x i32>*<br>

+  %wide.load5.3 = load <4 x i32>, <4 x i32>* %21, align 16<br>

+  %22 = add nsw <4 x i32> %wide.load.3, %16<br>

+  %23 = add nsw <4 x i32> %wide.load5.3, %17<br>

+  %index.next.3 = add nsw i64 %index, 32<br>

+  %24 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.3<br>

+  %25 = bitcast i32* %24 to <4 x i32>*<br>

+  %wide.load.4 = load <4 x i32>, <4 x i32>* %25, align 16<br>

+  %26 = getelementptr i32, i32* %24, i64 4<br>

+  %27 = bitcast i32* %26 to <4 x i32>*<br>

+  %wide.load5.4 = load <4 x i32>, <4 x i32>* %27, align 16<br>

+  %28 = add nsw <4 x i32> %wide.load.4, %22<br>

+  %29 = add nsw <4 x i32> %wide.load5.4, %23<br>

+  %index.next.4 = add nsw i64 %index, 40<br>

+  %30 = icmp eq i64 %index.next.4, 1000<br>

+  br i1 %30, label %middle.block, label %vector.body<br>

+<br>

+middle.block:<br>

+  %.lcssa10 = phi <4 x i32> [ %29, %vector.body ]<br>

+  %.lcssa = phi <4 x i32> [ %28, %vector.body ]<br>

+  %bin.rdx = add <4 x i32> %.lcssa10, %.lcssa<br>

+  %rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef><br>

+  %bin.rdx6 = add <4 x i32> %bin.rdx, %rdx.shuf<br>

+  %rdx.shuf7 = shufflevector <4 x i32> %bin.rdx6, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef><br>

+  %bin.rdx8 = add <4 x i32> %bin.rdx6, %rdx.shuf7<br>

+  %31 = extractelement <4 x i32> %bin.rdx8, i32 0<br>

+  ret i32 %31<br>

+}<br>

+<br>

+define i32 @reduce_and() {<br>

+; CHECK-LABEL: reduce_and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+; CHECK:       Detected a reduction operation: {{.*}} and<br>

+<br>

+entry:<br>

+  br label %vector.body<br>

+<br>

+vector.body:<br>

+  %lsr.iv = phi i64 [ %lsr.iv.next, %vector.body ], [ -4096, %entry ]<br>

+  %vec.phi = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, %entry ], [ %6, %vector.body ]<br>

+  %vec.phi9 = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, %entry ], [ %7, %vector.body ]<br>

+  %uglygep33 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep3334 = bitcast i8* %uglygep33 to <4 x i32>*<br>

+  %scevgep35 = getelementptr <4 x i32>, <4 x i32>* %uglygep3334, i64 256<br>

+  %wide.load = load <4 x i32>, <4 x i32>* %scevgep35, align 16<br>

+  %scevgep36 = getelementptr <4 x i32>, <4 x i32>* %uglygep3334, i64 257<br>

+  %wide.load10 = load <4 x i32>, <4 x i32>* %scevgep36, align 16<br>

+  %0 = and <4 x i32> %wide.load, %vec.phi<br>

+  %1 = and <4 x i32> %wide.load10, %vec.phi9<br>

+  %uglygep30 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep3031 = bitcast i8* %uglygep30 to <4 x i32>*<br>

+  %scevgep32 = getelementptr <4 x i32>, <4 x i32>* %uglygep3031, i64 258<br>

+  %wide.load.1 = load <4 x i32>, <4 x i32>* %scevgep32, align 16<br>

+  %uglygep27 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep2728 = bitcast i8* %uglygep27 to <4 x i32>*<br>

+  %scevgep29 = getelementptr <4 x i32>, <4 x i32>* %uglygep2728, i64 259<br>

+  %wide.load10.1 = load <4 x i32>, <4 x i32>* %scevgep29, align 16<br>

+  %2 = and <4 x i32> %wide.load.1, %0<br>

+  %3 = and <4 x i32> %wide.load10.1, %1<br>

+  %uglygep24 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep2425 = bitcast i8* %uglygep24 to <4 x i32>*<br>

+  %scevgep26 = getelementptr <4 x i32>, <4 x i32>* %uglygep2425, i64 260<br>

+  %wide.load.2 = load <4 x i32>, <4 x i32>* %scevgep26, align 16<br>

+  %uglygep21 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep2122 = bitcast i8* %uglygep21 to <4 x i32>*<br>

+  %scevgep23 = getelementptr <4 x i32>, <4 x i32>* %uglygep2122, i64 261<br>

+  %wide.load10.2 = load <4 x i32>, <4 x i32>* %scevgep23, align 16<br>

+  %4 = and <4 x i32> %wide.load.2, %2<br>

+  %5 = and <4 x i32> %wide.load10.2, %3<br>

+  %uglygep18 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep1819 = bitcast i8* %uglygep18 to <4 x i32>*<br>

+  %scevgep20 = getelementptr <4 x i32>, <4 x i32>* %uglygep1819, i64 262<br>

+  %wide.load.3 = load <4 x i32>, <4 x i32>* %scevgep20, align 16<br>

+  %uglygep = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv<br>

+  %uglygep17 = bitcast i8* %uglygep to <4 x i32>*<br>

+  %scevgep = getelementptr <4 x i32>, <4 x i32>* %uglygep17, i64 263<br>

+  %wide.load10.3 = load <4 x i32>, <4 x i32>* %scevgep, align 16<br>

+  %6 = and <4 x i32> %wide.load.3, %4<br>

+  %7 = and <4 x i32> %wide.load10.3, %5<br>

+  %lsr.iv.next = add nsw i64 %lsr.iv, 128<br>

+  %8 = icmp eq i64 %lsr.iv.next, 0<br>

+  br i1 %8, label %middle.block, label %vector.body<br>

+<br>

+middle.block:<br>

+  %bin.rdx = and <4 x i32> %7, %6<br>

+  %rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef><br>

+  %bin.rdx11 = and <4 x i32> %bin.rdx, %rdx.shuf<br>

+  %rdx.shuf12 = shufflevector <4 x i32> %bin.rdx11, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef><br>

+  %bin.rdx13 = and <4 x i32> %bin.rdx11, %rdx.shuf12<br>

+  %9 = extractelement <4 x i32> %bin.rdx13, i32 0<br>

+  ret i32 %9<br>

+}<br>

+<br>

+define float @reduce_add_float(float* nocapture readonly %a) {<br>

+; CHECK-LABEL: reduce_add_float<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+; CHECK:       Detected a reduction operation: {{.*}} fadd fast<br>

+;<br>

+entry:<br>

+  br label %vector.body<br>

+<br>

+vector.body:<br>

+  %index = phi i64 [ 0, %entry ], [ %index.next.4, %vector.body ]<br>

+  %vec.phi = phi <4 x float> [ zeroinitializer, %entry ], [ %28, %vector.body ]<br>

+  %vec.phi9 = phi <4 x float> [ zeroinitializer, %entry ], [ %29, %vector.body ]<br>

+  %0 = getelementptr inbounds float, float* %a, i64 %index<br>

+  %1 = bitcast float* %0 to <4 x float>*<br>

+  %wide.load = load <4 x float>, <4 x float>* %1, align 4<br>

+  %2 = getelementptr float, float* %0, i64 4<br>

+  %3 = bitcast float* %2 to <4 x float>*<br>

+  %wide.load10 = load <4 x float>, <4 x float>* %3, align 4<br>

+  %4 = fadd fast <4 x float> %wide.load, %vec.phi<br>

+  %5 = fadd fast <4 x float> %wide.load10, %vec.phi9<br>

+  %index.next = add nuw nsw i64 %index, 8<br>

+  %6 = getelementptr inbounds float, float* %a, i64 %index.next<br>

+  %7 = bitcast float* %6 to <4 x float>*<br>

+  %wide.load.1 = load <4 x float>, <4 x float>* %7, align 4<br>

+  %8 = getelementptr float, float* %6, i64 4<br>

+  %9 = bitcast float* %8 to <4 x float>*<br>

+  %wide.load10.1 = load <4 x float>, <4 x float>* %9, align 4<br>

+  %10 = fadd fast <4 x float> %wide.load.1, %4<br>

+  %11 = fadd fast <4 x float> %wide.load10.1, %5<br>

+  %index.next.1 = add nsw i64 %index, 16<br>

+  %12 = getelementptr inbounds float, float* %a, i64 %index.next.1<br>

+  %13 = bitcast float* %12 to <4 x float>*<br>

+  %wide.load.2 = load <4 x float>, <4 x float>* %13, align 4<br>

+  %14 = getelementptr float, float* %12, i64 4<br>

+  %15 = bitcast float* %14 to <4 x float>*<br>

+  %wide.load10.2 = load <4 x float>, <4 x float>* %15, align 4<br>

+  %16 = fadd fast <4 x float> %wide.load.2, %10<br>

+  %17 = fadd fast <4 x float> %wide.load10.2, %11<br>

+  %index.next.2 = add nsw i64 %index, 24<br>

+  %18 = getelementptr inbounds float, float* %a, i64 %index.next.2<br>

+  %19 = bitcast float* %18 to <4 x float>*<br>

+  %wide.load.3 = load <4 x float>, <4 x float>* %19, align 4<br>

+  %20 = getelementptr float, float* %18, i64 4<br>

+  %21 = bitcast float* %20 to <4 x float>*<br>

+  %wide.load10.3 = load <4 x float>, <4 x float>* %21, align 4<br>

+  %22 = fadd fast <4 x float> %wide.load.3, %16<br>

+  %23 = fadd fast <4 x float> %wide.load10.3, %17<br>

+  %index.next.3 = add nsw i64 %index, 32<br>

+  %24 = getelementptr inbounds float, float* %a, i64 %index.next.3<br>

+  %25 = bitcast float* %24 to <4 x float>*<br>

+  %wide.load.4 = load <4 x float>, <4 x float>* %25, align 4<br>

+  %26 = getelementptr float, float* %24, i64 4<br>

+  %27 = bitcast float* %26 to <4 x float>*<br>

+  %wide.load10.4 = load <4 x float>, <4 x float>* %27, align 4<br>

+  %28 = fadd fast <4 x float> %wide.load.4, %22<br>

+  %29 = fadd fast <4 x float> %wide.load10.4, %23<br>

+  %index.next.4 = add nsw i64 %index, 40<br>

+  %30 = icmp eq i64 %index.next.4, 1000<br>

+  br i1 %30, label %middle.block, label %vector.body<br>

+<br>

+middle.block:<br>

+  %.lcssa15 = phi <4 x float> [ %29, %vector.body ]<br>

+  %.lcssa = phi <4 x float> [ %28, %vector.body ]<br>

+  %bin.rdx = fadd fast <4 x float> %.lcssa15, %.lcssa<br>

+  %rdx.shuf = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef><br>

+  %bin.rdx11 = fadd fast <4 x float> %bin.rdx, %rdx.shuf<br>

+  %rdx.shuf12 = shufflevector <4 x float> %bin.rdx11, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef><br>

+  %bin.rdx13 = fadd fast <4 x float> %bin.rdx11, %rdx.shuf12<br>

+  %31 = extractelement <4 x float> %bin.rdx13, i32 0<br>

+  ret float %31<br>

+}<br>

<br>

<br>


</blockquote>

<br>

</div></div></blockquote></div><br></div></div>