<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 06/20/2017 07:21 PM, hameeza ahmed
      via llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CAFMPKeZbyciXPKA6SuU09r9CXTX4J0twdjSESCKTT06HUmPoxA@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">Hello,
        <div><br>
        </div>
        <div>I am using LLVM on my Core i7 laptop, which has no AVX
          support.</div>
        <div><br>
        </div>
        <div>My goal is to generate AVX-512 code (loop vectorization) for
          Knights Landing/Skylake.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>My .c code is:</div>
        <div><br>
        </div>
        <div>
          <div>int a[256], b[256], c[256];</div>
          <div>foo () {</div>
        </div>
      </div>
    </blockquote>
    <br>
    void foo() {<br>
    <br>
    <blockquote
cite="mid:CAFMPKeZbyciXPKA6SuU09r9CXTX4J0twdjSESCKTT06HUmPoxA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>int i;</div>
          <div>for (i=0; i&lt;256; i++) {</div>
          <div>a[i] = b[i] + c[i];</div>
          <div>}</div>
          <div>}</div>
        </div>
        <div><br>
        </div>
        <div>I first generated its .ll file with clang:</div>
        <div><br>
        </div>
        <div>clang -S  -emit-llvm test.c -o test.ll<br>
        </div>
      </div>
    </blockquote>
    <br>
    Your problem is that vectorization happens in opt, not in llc.
    Telling llc that you wish to enable AVX-512 is not sufficient,
    because by then opt has already vectorized the loop without knowing
    about the AVX-512 target. In fact, if you run:<br>
    <br>
    clang -S -o - test.c -march=knl -O3<br>
    <br>
    you'll see AVX-512 vectorized code. If you want to run opt
    separately to generate the vectorized code, you need to tell it that
    it is targeting the KNL. Clang can add the necessary function
    attributes (target-cpu and target-features) to the IR to do this.
    You'll also want to run clang with optimizations enabled so that it
    generates IR that is intended to be optimized, even if you then
    disable the actual optimizations to get the pre-opt IR:<br>
    <br>
    clang  -S -emit-llvm test.c -march=knl -O3 -mllvm
    -disable-llvm-optzns<br>
    <br>
    Then running opt as you have it below should produce the desired
    result.<br>
    <br>
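    As a rough illustration of what "AVX-512 vectorized" means for this
    loop: a zmm register is 512 bits wide, so it holds 16 32-bit ints,
    while an xmm register is 128 bits and holds only 4. The sketch below
    writes the loop in scalar C, grouped into blocks of 16 elements. The
    names N, VF and foo_blocked are hypothetical and chosen only for this
    illustration, and the vectorizer may well pick a different width or
    unroll factor than the one assumed here.<br>
    <br>
<pre>
```c
/* Illustration only: the shape of a 16-lane vectorized loop, written in
   scalar C. VF = 16 is an assumption: 512-bit register / 32-bit int. */
enum { N = 256, VF = 16 };

int a[N], b[N], c[N];

/* The loop from test.c, with an explicit return type. */
void foo(void) {
  int i;
  for (i = 0; i != N; i++)
    a[i] = b[i] + c[i];
}

/* Hypothetical helper: same result as foo, but grouped into blocks of VF
   elements. Each pass of the outer loop corresponds to one zmm-wide
   load/add/store sequence. */
void foo_blocked(void) {
  int i, lane;
  for (i = 0; i != N; i += VF)            /* N / VF = 16 vector iterations */
    for (lane = 0; lane != VF; lane++)    /* lanes covered by one vector op */
      a[i + lane] = b[i + lane] + c[i + lane];
}
```
</pre>
    With xmm registers the same 256 elements need 64 vector additions
    instead of 16, which is what the assembly quoted below is doing.<br>
    <br>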
    Finally, I recommend upgrading to Clang/LLVM 4.0. It produces better
    AVX-512 code than 3.9 did.<br>
    <br>
     -Hal<br>
    <br>
    <blockquote
cite="mid:CAFMPKeZbyciXPKA6SuU09r9CXTX4J0twdjSESCKTT06HUmPoxA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>Then I optimized it:</div>
        <div><br>
        </div>
        <div>opt -S -O3 test.ll -o test_o3.ll<br>
        </div>
        <div><br>
        </div>
        <div>Then I used llc for code generation:</div>
        <div><br>
        </div>
        <div>llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o
          test_o3.s<br>
        </div>
        <div><br>
        </div>
        <div>llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s<br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Here is my generated code:</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>
          <div><span style="white-space:pre">     </span>.text</div>
          <div><span style="white-space:pre">     </span>.file<span style="white-space:pre">        </span>"filer_o3.ll"</div>
          <div><span style="white-space:pre">     </span>.globl<span style="white-space:pre">       </span>foo</div>
          <div><span style="white-space:pre">     </span>.p2align<span style="white-space:pre">     </span>4,
            0x90</div>
          <div><span style="white-space:pre">     </span>.type<span style="white-space:pre">        </span>foo,@function</div>
          <div>foo:                                    # @foo</div>
          <div><span style="white-space:pre">     </span>.cfi_startproc</div>
          <div># BB#0:                                 #
            %min.iters.checked</div>
          <div><span style="white-space:pre">     </span>pushq<span style="white-space:pre">        </span>%rbp</div>
          <div>.Ltmp0:</div>
          <div><span style="white-space:pre">     </span>.cfi_def_cfa_offset
            16</div>
          <div>.Ltmp1:</div>
          <div><span style="white-space:pre">     </span>.cfi_offset %rbp,
            -16</div>
          <div><span style="white-space:pre">     </span>movq<span style="white-space:pre"> </span>%rsp,
            %rbp</div>
          <div>.Ltmp2:</div>
          <div><span style="white-space:pre">     </span>.cfi_def_cfa_register
            %rbp</div>
          <div><span style="white-space:pre">     </span>movq<span style="white-space:pre"> </span>$-1024,
            %rax            # imm = 0xFC00</div>
          <div><span style="white-space:pre">     </span>.p2align<span style="white-space:pre">     </span>4,
            0x90</div>
          <div>.<b><font color="#0000ff">LBB0_1:                        
                       # %vector.body</font></b></div>
          <div><b><font color="#0000ff">                               
                        # =>This Inner Loop Header: Depth=1</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>c+1024(%rax),
                %xmm0</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>c+1040(%rax),
                %xmm1</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vpaddd<span style="white-space:pre">       </span>b+1024(%rax),
                %xmm0, %xmm0</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vpaddd<span style="white-space:pre">       </span>b+1040(%rax),
                %xmm1, %xmm1</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>%xmm0,
                a+1024(%rax)</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>%xmm1,
                a+1040(%rax)</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>c+1056(%rax),
                %xmm0</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>c+1072(%rax),
                %xmm1</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vpaddd<span style="white-space:pre">       </span>b+1056(%rax),
                %xmm0, %xmm0</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vpaddd<span style="white-space:pre">       </span>b+1072(%rax),
                %xmm1, %xmm1</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>%xmm0,
                a+1056(%rax)</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>vmovdqa32<span style="white-space:pre">    </span>%xmm1,
                a+1072(%rax)</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>addq<span style="white-space:pre"> </span>$64,
                %rax</font></b></div>
          <div><b><font color="#0000ff"><span style="white-space:pre">      </span>jne<span style="white-space:pre">  </span>.LBB0_1</font></b></div>
          <div># BB#2:                                 # %middle.block</div>
          <div><span style="white-space:pre">     </span>popq<span style="white-space:pre"> </span>%rbp</div>
          <div><span style="white-space:pre">     </span>retq</div>
          <div>.Lfunc_end0:</div>
          <div><span style="white-space:pre">     </span>.size<span style="white-space:pre">        </span>foo,
            .Lfunc_end0-foo</div>
          <div><span style="white-space:pre">     </span>.cfi_endproc</div>
          <div><br>
          </div>
          <div><span style="white-space:pre">     </span>.type<span style="white-space:pre">        </span>b,@object
                          # @b</div>
          <div><span style="white-space:pre">     </span>.comm<span style="white-space:pre">        </span>b,1024,16</div>
          <div><span style="white-space:pre">     </span>.type<span style="white-space:pre">        </span>c,@object
                          # @c</div>
          <div><span style="white-space:pre">     </span>.comm<span style="white-space:pre">        </span>c,1024,16</div>
          <div><span style="white-space:pre">     </span>.type<span style="white-space:pre">        </span>a,@object
                          # @a</div>
          <div><span style="white-space:pre">     </span>.comm<span style="white-space:pre">        </span>a,1024,16</div>
          <div><br>
          </div>
          <div><span style="white-space:pre">     </span>.ident<span style="white-space:pre">       </span>"clang
            version 3.9.0 (tags/RELEASE_390/final)"</div>
          <div><span style="white-space:pre">     </span>.section<span style="white-space:pre">     </span>".note.GNU-stack","",@progbits</div>
        </div>
        <div><br>
        </div>
        <div>In the generated code there are vmov... instructions, but
          no zmm registers, only xmm registers.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Can you please point out where I am going wrong? I have
          tried several times with different parameters, but I always
          get xmm registers.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Thank You</div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>