[llvm-dev] [LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable

Thu Aug 6 11:34:59 PDT 2015

Inlined.

From:	Gerolf Hoflehner <ghoflehner at apple.com>
To:	Hyojin Sung/Watson/IBM at IBMUS
Cc:	Hal Finkel <hfinkel at anl.gov>, llvm-dev at lists.llvm.org
Date:	08/06/2015 12:11 AM
Subject:	Re: [LLVMdev] Improving loop vectorizer support for loops with
            a volatile iteration variable

      On Aug 5, 2015, at 9:08 PM, Gerolf Hoflehner <ghoflehner at apple.com>
      wrote:

      Ok, assuming a verifier/checker was in place the information could
      look like:

      ...
      LV: Not vectorizing: Cannot prove legality.
      LV: Legal to vectorize. — after loop rotation
      LV: Legal to vectorize.
      ...
      LV: Legal to vectorize.
      LV: Not vectorizing: Cannot prove legality. — after Jump Threading
      (later also — after CFG Simplify)

      Note that in your example the second round of loop rotation fails
      to make the loop vectorizable (even though, judging from the comment,
      that is its purpose!). It might be more appropriate to generalize
      that approach to any transformation that makes a loop vectorizable
      (not just rotation) rather than tweaking jump threading +
      simplification.

      To clarify, the loop is vectorizable before and after loop rotation.
      At least for the example, loop rotation does not affect the
      vectorizability of a loop. The LLVM loop vectorizer and vectorization
      legality checker rely on loops in a canonical form. So if there are
      no interfering passes, a check should be able to tell whether a loop
      is vectorizable after loop simplify.

      I’m convinced there are more cases like yours and the following
      framework would give a systematic way to handle them:

      a) a checker for is_vectorizable (from refactoring the legal class in
      the loop vectorizer - but properly, not hacked like my prototype …)
      b) invoke checker optionally. The checker can be used for testing,
      analysis and - independently - as a service for loop transformations
      (eg. if the loop is vectorizable already, perhaps a transformation is
      not needed or vice versa). I’m not sure about the architecture yet.
      Perhaps the pass manager could run a checker automatically after
      every pass when an option is set. Adding extra calls after every pass
      is too clumsy.

      c) fix violators if appropriate and/or add transformations that morph
      a loop into a vectorizable form when possible (especially when it has
      been in vectorizable shape at some point).

      A more general form of checker/verifier that is optionally executed
      after LLVM passes can help identify interfering transformations. We
      can use the checker only for debugging (reporting which
      transformation interferes with others) or for dynamically triggering
      or reverting transformations to maintain desired properties like
      vectorizability, as you said. If the checker dynamically affects
      actual pass execution, then I think it may require extensive changes
      to the incremental way LLVM applies transformations.
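
The debugging use of such a checker can be sketched on a toy model. The
following is only an illustration, assuming a loop is reduced to two block
ids and that "vectorizable" means the latch is also the exiting block.
ToyLoop, ToyPass, is_vectorizable, first_violator and the three toy passes
are hypothetical names for this sketch, not LLVM APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a loop; NOT an LLVM data structure. */
typedef struct {
    int latch;   /* id of the block that branches back to the header */
    int exiting; /* id of the block that can leave the loop */
} ToyLoop;

/* Hypothetical legality check: the latch must also be the exiting block. */
bool is_vectorizable(const ToyLoop *loop) {
    return loop->latch == loop->exiting;
}

/* A named transformation over the toy loop. */
typedef struct {
    const char *name;
    void (*run)(ToyLoop *);
} ToyPass;

/* Runs the passes in order and re-checks the property after each one.
   Returns the index of the first pass that turns a vectorizable loop
   into a non-vectorizable one, or -1 if no pass does. */
int first_violator(ToyLoop *loop, const ToyPass *passes, size_t count,
                   bool verbose) {
    assert(loop != NULL && passes != NULL);
    bool before = is_vectorizable(loop);
    int violator = -1;
    for (size_t i = 0; i < count; ++i) {
        passes[i].run(loop);
        bool after = is_vectorizable(loop);
        if (verbose)
            printf("%s (after %s)\n",
                   after ? "legal to vectorize" : "cannot prove legality",
                   passes[i].name);
        if (before && !after && violator < 0)
            violator = (int)i;
        before = after;
    }
    return violator;
}

/* Three toy passes: one establishes the property, one keeps it,
   one destroys it. */
void toy_rotate(ToyLoop *loop) { loop->latch = loop->exiting; }
void toy_keep(ToyLoop *loop) { (void)loop; }
void toy_merge_blocks(ToyLoop *loop) { loop->latch = loop->exiting + 1; }
```

Running first_violator over the pipeline {toy_rotate, toy_keep,
toy_merge_blocks} on a loop that starts with latch 2 and exiting block 1
reports index 2, i.e. the last pass is the one that destroyed the property.
Only reporting is modeled here; triggering or reverting transformations is
left out.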

      Regards,
      Hyojin

      -Gerolf

            On Aug 4, 2015, at 8:04 PM, Gerolf Hoflehner <
            ghoflehner at apple.com> wrote:

            The approach would be to run the legal check before + after
            transformations on demand. For loops that “should” be
            vectorized it would give the means to isolate the
            transformation that introduces a barrier. I’ll see that I can
            get a dirty prototype to illustrate this on your test case. I
            could give a warning/remark when the code was (loop-)
            vectorizable at some point but is no longer when the vectorizer
            actually kicks in, and speed up diagnosis when comparing
            good/bad cases (like in your instance w/ and w/o the volatile
            modifier).

            Gerolf

                  On Aug 4, 2015, at 8:40 AM, Hyojin Sung <hsung at us.ibm.com
                  > wrote:

                  Hi Gerolf,

                  Thanks for the comment. Yes, I agree that compiler
                  transformation as a whole should not render vectorizable
                  loops unvectorizable. However, some passes may
                  temporarily make a loop unvectorizable (e.g., by
                  aggressive GVN or DCE/DBE) then later passes may revert
                  the effect. Trying to keep the invariants after each pass
                  may prevent useful optimizations. Regarding your concern
                  about the example-based approach, the
                  LoopVectorizationLegality class gives a good idea of
                  what such invariants would be, and they mostly relate to
                  checks for canonical loop structures. I think that
                  there are only a handful of transformation passes that
                  may disrupt loop structures mainly through aggressively
                  eliminating BB's. Focusing on such transformations may be
                  sufficient to prevent prior transformations from
                  destroying vectorizable loops. I'd be glad to hear more
                  of your thoughts.

                  Best,
                  Hyojin

                  From: Gerolf Hoflehner <ghoflehner at apple.com>
                  To: Hyojin Sung/Watson/IBM at IBMUS
                  Cc: Hal Finkel <hfinkel at anl.gov>,
                  llvmdev-bounces at cs.uiuc.edu, llvmdev at cs.uiuc.edu
                  Date: 08/03/2015 06:14 PM
                  Subject: Re: [LLVMdev] Improving loop vectorizer support
                  for loops with a volatile iteration variable

                  I see a more fundamental problem and perhaps this example
                  can serve as a stepping stone towards a solution.

                  There is a desired property: in this case, that the loop
                  is vectorizable.
                  There are N compiler transformations. These
                  transformations must either establish the property or
                  keep it invariant,  but never destroy it.

                  If there is agreement to this then the first step is to
                  have the analysis of ‘is vectorizable’ runnable after
                  every transformation and report violations in detail,
                  likely under a flag. Then comparing a set of loops not
                  vectorizable with clang but with other compilers (gcc,
                  clang, …) should shape precise ideas on normal forms for
                  the vectorizer and general improvements to
                  transformations to keep/establish the
                  invariant/desired property.

                  I’m worried that the one-example-at-a-time approach
                  obfuscates matters within transformations over time,
                  resulting in less maintainable code.

                  Gerolf

                        On Aug 3, 2015, at 2:33 PM, Hyojin Sung <
                        hsung at us.ibm.com> wrote:

                        Hi,

                        I discussed the issue offline with Hal, and would
                        like to clarify exactly what is going on and what
                        the trade-offs are for different solutions, and to
                        ask for more feedback on my proposed solution
                        (http://reviews.llvm.org/D11728). I will use the
                        example from Hal's post:

                        void foo2(float * restrict x, float * restrict y,
                        float * restrict z) {
                         for (volatile int i = 0; i < 1000; ++i) {
                           for (int j = 0; j < 1600; ++j) {
                             x[j] = y[j] + z[j];
                           }
                         }
                        }
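
For reproducing the good/bad comparison discussed in this thread, a
self-contained pair of variants might look like the sketch below. The names
foo2_plain, foo2_volatile, arrays_equal, N_OUTER and N_INNER are made up for
this illustration; the only intended difference between the two functions is
the volatile qualifier on the outer induction variable.

```c
#include <assert.h>

enum { N_OUTER = 1000, N_INNER = 1600 };

/* Variant with a plain (non-volatile) outer induction variable. */
void foo2_plain(float *restrict x, const float *restrict y,
                const float *restrict z) {
    for (int i = 0; i < N_OUTER; ++i) {
        for (int j = 0; j < N_INNER; ++j) {
            x[j] = y[j] + z[j];
        }
    }
}

/* Variant with the volatile outer induction variable from the example. */
void foo2_volatile(float *restrict x, const float *restrict y,
                   const float *restrict z) {
    for (volatile int i = 0; i < N_OUTER; ++i) {
        for (int j = 0; j < N_INNER; ++j) {
            x[j] = y[j] + z[j];
        }
    }
}

/* Returns 1 if the first n elements of a and b are identical, else 0. */
int arrays_equal(const float *a, const float *b, int n) {
    assert(a != 0 && b != 0 && n >= 0);
    for (int j = 0; j < n; ++j) {
        if (a[j] != b[j]) {
            return 0;
        }
    }
    return 1;
}
```

Both variants compute the same result, so any difference shows up only in
the generated code. With clang, the -Rpass=loop-vectorize and
-Rpass-missed=loop-vectorize remarks are one way to see which inner loop was
vectorized.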

                        IR after the first loop simplify: A preheader is
                        created.

                        ; Function Attrs: nounwind
                        define void @foo2(float* noalias nocapture %x,
                        float* noalias nocapture readonly %y, float*
                        noalias nocapture readonly %z) #0 {
                        entry:
                         %i = alloca i32, align 4
                         tail call void @llvm.dbg.value(metadata float* %x,
                        i64 0, metadata !11, metadata !25), !dbg !26
                         tail call void @llvm.dbg.value(metadata float* %y,
                        i64 0, metadata !12, metadata !25), !dbg !27
                         tail call void @llvm.dbg.value(metadata float* %z,
                        i64 0, metadata !13, metadata !25), !dbg !28
                         %i.0.i.0..sroa_cast = bitcast i32* %i to i8*
                         call void @llvm.lifetime.start(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         tail call void @llvm.dbg.value(metadata i32 0, i64
                        0, metadata !14, metadata !25), !dbg !29
                         store volatile i32 0, i32* %i, align 4, !dbg !29
                         br label %for.cond, !dbg !30

                        for.cond:                                         ;
                        preds = %for.cond.cleanup.3, %entry
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0. = load volatile i32, i32* %i, align
                        4, !dbg !31
                         %cmp = icmp slt i32 %i.0.i.0., 1000, !dbg !34
                         br i1 %cmp, label %for.cond.1.preheader, label
                        %for.cond.cleanup, !dbg !35

                        for.cond.1.preheader:                             ;
                        preds = %for.cond
                         br label %for.cond.1, !dbg !36

                        for.cond.cleanup:                                 ;
                        preds = %for.cond
                         call void @llvm.lifetime.end(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         ret void, !dbg !38

                        for.cond.1:                                       ;
                        preds = %for.cond.1.preheader, %for.body.4
                         %j.0 = phi i32 [ %inc, %for.body.4 ], [ 0,
                        %for.cond.1.preheader ]
                         %cmp2 = icmp slt i32 %j.0, 1600, !dbg !36
                         br i1 %cmp2, label %for.body.4, label
                        %for.cond.cleanup.3, !dbg !39

                        for.cond.cleanup.3:                               ;
                        preds = %for.cond.1
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0.17 = load volatile i32, i32* %i, align
                        4, !dbg !40
                         %inc10 = add nsw i32 %i.0.i.0.17, 1, !dbg !40
                         tail call void @llvm.dbg.value(metadata i32
                        %inc10, i64 0, metadata !14,
                        metadata !25), !dbg !29
                         store volatile i32 %inc10, i32* %i, align
                        4, !dbg !40
                         br label %for.cond, !dbg !41

                        for.body.4:                                       ;
                        preds = %for.cond.1
                         %idxprom = sext i32 %j.0 to i64, !dbg !42
                         %arrayidx = getelementptr inbounds float, float*
                        %y, i64 %idxprom, !dbg !42
                         %0 = load float, float* %arrayidx, align
                        4, !dbg !42, !tbaa !44
                         %arrayidx6 = getelementptr inbounds float, float*
                        %z, i64 %idxprom, !dbg !48
                         %1 = load float, float* %arrayidx6, align
                        4, !dbg !48, !tbaa !44
                         %add = fadd float %0, %1, !dbg !49
                         %arrayidx8 = getelementptr inbounds float, float*
                        %x, i64 %idxprom, !dbg !50
                         store float %add, float* %arrayidx8, align
                        4, !dbg !51, !tbaa !44
                         %inc = add nsw i32 %j.0, 1, !dbg !52
                         tail call void @llvm.dbg.value(metadata i32 %inc,
                        i64 0, metadata !18, metadata !25), !dbg !53
                         br label %for.cond.1, !dbg !54
                        }

                        IR after loop rotation: After loop rotation, a
                        rotated preheader (for.cond.1.preheader.lr.ph) is
                        created. A test for (i < 1000) is added at the end
                        of "entry" block. If true, the control jumps
                        unconditionally to "for.body.4" through
                        "for.cond.1.preheader.lr.ph" and
                        "for.cond.1.preheader". You can see that these two
                        blocks ("for.cond.1.preheader.lr.ph" and
                        "for.cond.1.preheader") are practically empty, and
                        they will get eliminated later by Jump Threading
                        and/or Simplify-the-CFG. *IF* the outer loop has a
                        non-volatile induction variable, the loop will not
                        be rotated in the first place as
                        "for.cond.1.preheader" has a PHI node for "i", and
                        these blocks will not be eliminated.

                        ; Function Attrs: nounwind
                        define void @foo2(float* noalias nocapture %x,
                        float* noalias nocapture readonly %y, float*
                        noalias nocapture readonly %z) #0 {
                        entry:
                         %i = alloca i32, align 4
                         tail call void @llvm.dbg.value(metadata float* %x,
                        i64 0, metadata !11, metadata !25), !dbg !26
                         tail call void @llvm.dbg.value(metadata float* %y,
                        i64 0, metadata !12, metadata !25), !dbg !27
                         tail call void @llvm.dbg.value(metadata float* %z,
                        i64 0, metadata !13, metadata !25), !dbg !28
                         %i.0.i.0..sroa_cast = bitcast i32* %i to i8*
                         call void @llvm.lifetime.start(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         tail call void @llvm.dbg.value(metadata i32 0, i64
                        0, metadata !14, metadata !25), !dbg !29
                         store volatile i32 0, i32* %i, align 4, !dbg !29
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0..21 = load volatile i32, i32* %i, align
                        4, !dbg !30
                         %cmp.22 = icmp slt i32 %i.0.i.0..21,
                        1000, !dbg !33
                         br i1 %cmp.22, label %for.cond.1.preheader.lr.ph,
                        label %for.cond.cleanup, !dbg !34

                        for.cond.1.preheader.lr.ph:                       ;
                        preds = %entry
                         br label %for.cond.1.preheader, !dbg !34

                        for.cond.1.preheader:                             ;
                        preds = %for.cond.1.preheader.lr.ph,
                        %for.cond.cleanup.3
                         br label %for.body.4, !dbg !35

                        for.cond.for.cond.cleanup_crit_edge:              ;
                        preds = %for.cond.cleanup.3
                         br label %for.cond.cleanup, !dbg !34

                        for.cond.cleanup:                                 ;
                        preds = %for.cond.for.cond.cleanup_crit_edge,
                        %entry
                         call void @llvm.lifetime.end(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         ret void, !dbg !36

                        for.cond.cleanup.3:                               ;
                        preds = %for.body.4
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0.17 = load volatile i32, i32* %i, align
                        4, !dbg !37
                         %inc10 = add nsw i32 %i.0.i.0.17, 1, !dbg !37
                         tail call void @llvm.dbg.value(metadata i32
                        %inc10, i64 0, metadata !14,
                        metadata !25), !dbg !29
                         store volatile i32 %inc10, i32* %i, align
                        4, !dbg !37
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0. = load volatile i32, i32* %i, align
                        4, !dbg !30
                         %cmp = icmp slt i32 %i.0.i.0., 1000, !dbg !33
                         br i1 %cmp, label %for.cond.1.preheader, label
                        %for.cond.for.cond.cleanup_crit_edge, !dbg !34

                        for.body.4:                                       ;
                        preds = %for.cond.1.preheader, %for.body.4
                         %j.020 = phi i32 [ 0, %for.cond.1.preheader ],
                        [ %inc, %for.body.4 ]
                         %idxprom = sext i32 %j.020 to i64, !dbg !38
                         %arrayidx = getelementptr inbounds float, float*
                        %y, i64 %idxprom, !dbg !38
                         %0 = load float, float* %arrayidx, align
                        4, !dbg !38, !tbaa !41
                         %arrayidx6 = getelementptr inbounds float, float*
                        %z, i64 %idxprom, !dbg !45
                         %1 = load float, float* %arrayidx6, align
                        4, !dbg !45, !tbaa !41
                         %add = fadd float %0, %1, !dbg !46
                         %arrayidx8 = getelementptr inbounds float, float*
                        %x, i64 %idxprom, !dbg !47
                         store float %add, float* %arrayidx8, align
                        4, !dbg !48, !tbaa !41
                         %inc = add nsw i32 %j.020, 1, !dbg !49
                         tail call void @llvm.dbg.value(metadata i32 %inc,
                        i64 0, metadata !18, metadata !25), !dbg !50
                         %cmp2 = icmp slt i32 %inc, 1600, !dbg !51
                         br i1 %cmp2, label %for.body.4, label
                        %for.cond.cleanup.3, !dbg !35

                        After Jump Threading: "for.cond.1.preheader.lr.ph"
                        and "for.cond.1.preheader" are merged into
                        "for.body.4" by
                        TryToSimplifyUnconditionalBranchFromEmptyBlock() in
                        Transforms/Utils/Local.cpp. Now "for.body.4" has
                        three incoming edges (two backedges).

                        ; Function Attrs: nounwind
                        define void @foo2(float* noalias nocapture %x,
                        float* noalias nocapture readonly %y, float*
                        noalias nocapture readonly %z) #0 {
                        entry:
                         %i = alloca i32, align 4
                         tail call void @llvm.dbg.value(metadata float* %x,
                        i64 0, metadata !11, metadata !25), !dbg !26
                         tail call void @llvm.dbg.value(metadata float* %y,
                        i64 0, metadata !12, metadata !25), !dbg !27
                         tail call void @llvm.dbg.value(metadata float* %z,
                        i64 0, metadata !13, metadata !25), !dbg !28
                         %i.0.i.0..sroa_cast = bitcast i32* %i to i8*
                         call void @llvm.lifetime.start(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         tail call void @llvm.dbg.value(metadata i32 0, i64
                        0, metadata !14, metadata !25), !dbg !29
                         store volatile i32 0, i32* %i, align 4, !dbg !29
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0..21 = load volatile i32, i32* %i, align
                        4, !dbg !30
                         %cmp.22 = icmp slt i32 %i.0.i.0..21,
                        1000, !dbg !33
                         br i1 %cmp.22, label %for.body.4, label
                        %for.cond.cleanup, !dbg !34

                        for.cond.cleanup:                                 ;
                        preds = %for.cond.cleanup.3, %entry
                         call void @llvm.lifetime.end(i64 4, i8*
                        %i.0.i.0..sroa_cast)
                         ret void, !dbg !35

                        for.cond.cleanup.3:                               ;
                        preds = %for.body.4
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0.17 = load volatile i32, i32* %i, align
                        4, !dbg !36
                         %inc10 = add nsw i32 %i.0.i.0.17, 1, !dbg !36
                         tail call void @llvm.dbg.value(metadata i32
                        %inc10, i64 0, metadata !14,
                        metadata !25), !dbg !29
                         store volatile i32 %inc10, i32* %i, align
                        4, !dbg !36
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0. = load volatile i32, i32* %i, align
                        4, !dbg !30
                         %cmp = icmp slt i32 %i.0.i.0., 1000, !dbg !33
                         br i1 %cmp, label %for.body.4, label
                        %for.cond.cleanup, !dbg !34

                        for.body.4:                                       ;
                        preds = %for.cond.cleanup.3, %entry, %for.body.4
                         %indvars.iv = phi i64 [ %indvars.iv.next,
                        %for.body.4 ], [ 0, %entry ], [ 0,
                        %for.cond.cleanup.3 ]
                         %arrayidx = getelementptr inbounds float, float*
                        %y, i64 %indvars.iv, !dbg !37
                         %0 = load float, float* %arrayidx, align
                        4, !dbg !37, !tbaa !40
                         %arrayidx6 = getelementptr inbounds float, float*
                        %z, i64 %indvars.iv, !dbg !44
                         %1 = load float, float* %arrayidx6, align
                        4, !dbg !44, !tbaa !40
                         %add = fadd float %0, %1, !dbg !45
                         %arrayidx8 = getelementptr inbounds float, float*
                        %x, i64 %indvars.iv, !dbg !46
                         store float %add, float* %arrayidx8, align
                        4, !dbg !47, !tbaa !40
                         %indvars.iv.next = add nuw nsw i64 %indvars.iv,
                        1, !dbg !48
                         %exitcond = icmp eq i64 %indvars.iv.next,
                        1600, !dbg !48
                         br i1 %exitcond, label %for.cond.cleanup.3, label
                        %for.body.4, !dbg !48

                        After another loop simplify: Loop simplify tries to
                        separate out the nested loops but fails to do so
                        for this loop since it does not have a PHI node for
                        the outer loop variable. Instead, it creates a
                        backedge block.

                        for.cond.cleanup.3:                               ;
                        preds = %for.body.4
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0.17 = load volatile i32, i32* %i, align
                        4, !dbg !39
                         %inc10 = add nsw i32 %i.0.i.0.17, 1, !dbg !39
                         tail call void @llvm.dbg.value(metadata i32
                        %inc10, i64 0, metadata !14,
                        metadata !25), !dbg !29
                         store volatile i32 %inc10, i32* %i, align
                        4, !dbg !39
                         tail call void @llvm.dbg.value(metadata i32* %i,
                        i64 0, metadata !14, metadata !25), !dbg !29
                         %i.0.i.0. = load volatile i32, i32* %i, align
                        4, !dbg !30
                         %cmp = icmp slt i32 %i.0.i.0., 1000, !dbg !33
                         br i1 %cmp, label %for.body.4.backedge, label
                        %for.cond.cleanup.loopexit, !dbg !34

                        for.body.4:                                       ;
                        preds = %for.body.4.backedge, %for.body.4.preheader
                         %indvars.iv = phi i64 [ 0,
                        %for.body.4.preheader ], [ %indvars.iv.be,
                        %for.body.4.backedge ]
                         %arrayidx = getelementptr inbounds float, float*
                        %y, i64 %indvars.iv, !dbg !35
                         %0 = load float, float* %arrayidx, align
                        4, !dbg !35, !tbaa !40
                         %arrayidx6 = getelementptr inbounds float, float*
                        %z, i64 %indvars.iv, !dbg !44
                         %1 = load float, float* %arrayidx6, align
                        4, !dbg !44, !tbaa !40
                         %add = fadd float %0, %1, !dbg !45
                         %arrayidx8 = getelementptr inbounds float, float*
                        %x, i64 %indvars.iv, !dbg !46
                         store float %add, float* %arrayidx8, align
                        4, !dbg !47, !tbaa !40
                         %indvars.iv.next = add nuw nsw i64 %indvars.iv,
                        1, !dbg !48
                         %exitcond = icmp eq i64 %indvars.iv.next,
                        1600, !dbg !48
                         br i1 %exitcond, label %for.cond.cleanup.3, label
                        %for.body.4.backedge, !dbg !48

                        for.body.4.backedge:                              ;
                        preds = %for.body.4, %for.cond.cleanup.3
                         %indvars.iv.be = phi i64 [ %indvars.iv.next,
                        %for.body.4 ], [ 0, %for.cond.cleanup.3 ]
                         br label %for.body.4

                        The LLVM loop vectorizer refuses to vectorize any
                        loop whose loop latch (for.body.4.backedge) is
                        different from its loop exiting block
                        (for.cond.cleanup.3). With this test, the loop
                        vectorizer can assume that all instructions in the
                        loop are executed the same number of times.
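
The latch/exiting-block test can be illustrated on a toy CFG. This is a
minimal sketch, not the vectorizer's code: ToyCfg and all function names are
hypothetical, and the block numbering in the two builders is an assumption
that mirrors the block names in the IR dumps above.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_BLOCKS = 8, MAX_SUCCS = 2 };

/* Toy CFG: successor lists plus loop membership; NOT an LLVM type. */
typedef struct {
    int num_blocks;
    int succ[MAX_BLOCKS][MAX_SUCCS]; /* -1 marks an unused slot */
    bool in_loop[MAX_BLOCKS];
    int header;
} ToyCfg;

ToyCfg empty_cfg(int num_blocks, int header) {
    assert(num_blocks > 0 && num_blocks <= MAX_BLOCKS);
    assert(header >= 0 && header < num_blocks);
    ToyCfg g;
    g.num_blocks = num_blocks;
    g.header = header;
    for (int b = 0; b < MAX_BLOCKS; ++b) {
        g.in_loop[b] = false;
        for (int s = 0; s < MAX_SUCCS; ++s) g.succ[b][s] = -1;
    }
    return g;
}

/* Unique in-loop block that branches to the header, or -1. */
int unique_latch(const ToyCfg *g) {
    int latch = -1;
    for (int b = 0; b < g->num_blocks; ++b) {
        if (!g->in_loop[b]) continue;
        for (int s = 0; s < MAX_SUCCS; ++s) {
            if (g->succ[b][s] != g->header) continue;
            if (latch != -1 && latch != b) return -1;
            latch = b;
        }
    }
    return latch;
}

/* Unique in-loop block with a successor outside the loop, or -1. */
int unique_exiting(const ToyCfg *g) {
    int exiting = -1;
    for (int b = 0; b < g->num_blocks; ++b) {
        if (!g->in_loop[b]) continue;
        for (int s = 0; s < MAX_SUCCS; ++s) {
            int t = g->succ[b][s];
            if (t < 0 || g->in_loop[t]) continue;
            if (exiting != -1 && exiting != b) return -1;
            exiting = b;
        }
    }
    return exiting;
}

bool latch_is_exiting(const ToyCfg *g) {
    int latch = unique_latch(g);
    return latch != -1 && latch == unique_exiting(g);
}

/* Inner loop after rotation: 0 = for.cond.1.preheader,
   1 = for.body.4, 2 = for.cond.cleanup.3. */
ToyCfg rotated_shape(void) {
    ToyCfg g = empty_cfg(3, 1);
    g.in_loop[1] = true;
    g.succ[0][0] = 1;
    g.succ[1][0] = 1; g.succ[1][1] = 2;
    return g;
}

/* Collapsed loop after the second loop simplify: 0 = for.body.4.preheader,
   1 = for.body.4, 2 = for.cond.cleanup.3, 3 = for.body.4.backedge,
   4 = for.cond.cleanup.loopexit. */
ToyCfg collapsed_shape(void) {
    ToyCfg g = empty_cfg(5, 1);
    g.in_loop[1] = g.in_loop[2] = g.in_loop[3] = true;
    g.succ[0][0] = 1;
    g.succ[1][0] = 2; g.succ[1][1] = 3;
    g.succ[2][0] = 3; g.succ[2][1] = 4;
    g.succ[3][0] = 1;
    return g;
}
```

In the rotated shape the header block is at once latch and exiting block,
so the check passes. In the collapsed shape the latch is block 3
(for.body.4.backedge) while the exiting block is block 2
(for.cond.cleanup.3), so the check fails.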

                        I believe a fix is in order in one way or another
                        because the example is simple and common enough and
                        vectorized by other compilers. We may approach it
                        by either (1) preventing loops from being collapsed
                        in the first place or (2) teaching loop vectorizer
                        to handle collapsed loops. For (2), we may need to
                        allow loop vectorizer to forego the assumption and
                        handle the loop as it is. The assumption seems
                        fundamental to many of the vectorization
                        algorithms, so it will require extensive updates or
                        may end up reverting the loop back to a
                        properly nested form. The downside of (1) is that
                        it may slow down common optimization passes that
                        are repeatedly executed before vectorization.

                        My patch (http://reviews.llvm.org/D11728) is a
                        prototype fix for (1) that modifies Jump Threading
                        and Simplify-the-CFG to not eliminate an empty loop
                        header BB even when the loop does not have a PHI
                        node for its induction variable. The details can be
                        found at http://reviews.llvm.org/D11728. I would
                        welcome and appreciate any comments or feedback.

                        Best,
                        Hyojin

                        From: Hal Finkel <hfinkel at anl.gov>
                        To: Chandler Carruth <chandlerc at google.com>
                        Cc: llvmdev at cs.uiuc.edu
                        Date: 07/16/2015 03:19 AM
                        Subject: Re: [LLVMdev] Improving loop vectorizer
                        support for loops with a volatile iteration
                        variable
                        Sent by: llvmdev-bounces at cs.uiuc.edu

                                          From: "Chandler Carruth" <
                                          chandlerc at google.com>
                                          To: "Hal Finkel" <hfinkel at anl.gov
                                          >
                                          Cc: "Hyojin Sung" <
                                          hsung at us.ibm.com>,
                                          llvmdev at cs.uiuc.edu
                                          Sent: Thursday, July 16, 2015
                                          1:06:03 AM
                                          Subject: Re: [LLVMdev] Improving
                                          loop vectorizer support for loops
                                          with a volatile iteration
                                          variable

                                          On Wed, Jul 15, 2015 at 6:36 PM