[llvm-dev] unable to emit vectorized code in LLVM IR

Thu Aug 17 11:56:26 PDT 2017

By accessing only argv[1] and argv[2], you took just 2 numbers from the
command line as input and added them together over and over again. You need
to open a file and read numbers from it, or access more command line
parameters.
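
A minimal sketch of the second option, assuming each operand is passed as its own command-line argument. The helper names parse_ints and add_arrays are made up for illustration and are not part of the original program.

```c
#include <stdlib.h>

/* Hypothetical helper: convert `count` strings (for example argv + 1)
   to ints. atoi does no error checking and yields 0 for non-numeric text. */
void parse_ints(int count, char **strings, int *out) {
    for (int i = 0; i < count; i++) {
        out[i] = atoi(strings[i]);
    }
}

/* Hypothetical helper: element-wise add over inputs that differ per index,
   so the loop cannot be folded into a single add and multiply. */
void add_arrays(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

From main, one could call parse_ints(argc - 1, argv + 1, values), treat the first half of values as a and the second half as b, and print c after the loop so the results stay observable.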

~Craig

On Thu, Aug 17, 2017 at 11:51 AM, Francois Fayard via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Try that:
>
> void f(int* a, int* b, int* c, int n) {
>   for (int i = 0; i < n; ++i) {
>     c[i] = a[i] + b[i];
>   }
> }
>
> and compile with: clang++ -S -O3 -mavx2 a.cpp -o a.assembly
> and look at the a.assembly file. You’ll get something such as:
>
> LBB0_12:                                ## =>This Inner Loop Header: Depth=1
> vmovdqu -96(%rax), %ymm0
> vmovdqu -64(%rax), %ymm1
> vmovdqu -32(%rax), %ymm2
> vmovdqu (%rax), %ymm3
> vpaddd -96(%r11), %ymm0, %ymm0
> vpaddd -64(%r11), %ymm1, %ymm1
> vpaddd -32(%r11), %ymm2, %ymm2
> vpaddd (%r11), %ymm3, %ymm3
> vmovdqu %ymm0, -96(%rbx)
> vmovdqu %ymm1, -64(%rbx)
> vmovdqu %ymm2, -32(%rbx)
> vmovdqu %ymm3, (%rbx)
> subq $-128, %r11
> subq $-128, %rax
> subq $-128, %rbx
> addq $-32, %r9
> jne LBB0_12
>
> That’s vectorized code, unrolled by 4. Each 32-byte ymm register holds
> 32 / 4 = 8 ints, so you get 4 * (32 / 4) = 32 elements processed per loop
> iteration. The ymm registers show that you are using 256-bit vector
> registers, as available on AVX CPUs. With AVX-512, you would get zmm
> registers.
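
The element count above can be checked with a few lines of C. This is only a worked-arithmetic sketch: the function name elements_per_iteration is invented here, and the unroll factor of 4 for the zmm case is an assumption, since the compiler may pick a different one.

```c
/* Elements handled per loop iteration:
   lanes per vector register times the unroll factor. */
int elements_per_iteration(int register_bits, int element_bits, int unroll) {
    int lanes = register_bits / element_bits; /* e.g. 256 / 32 = 8 ints per ymm */
    return lanes * unroll;
}
```

With 256-bit ymm registers, 32-bit ints and an unroll factor of 4, this gives 32, which matches the `addq $-32, %r9` counter update in the listing.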
>
> François Fayard
>
> On Aug 17, 2017, at 8:44 PM, Craig Topper via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> I assume the compiler knows that you only have 2 input values that you just
> added together 1000 times.
>
> Despite the fact that you stored to a[i] and b[i] here, nothing reads them
> other than the addition in the same loop iteration. So the compiler easily
> removed the a and b arrays. The same goes for 'c': it's not read outside
> the loop, so it doesn't need to exist. So the compiler turned your loop body
> back into g += aa + bb; and since the loop runs 1000 iterations and aa and
> bb never change, this got further simplified to (aa+bb)*1000.
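
To make that simplification concrete, here is a small sketch of the two equivalent forms. The function names sum_loop and sum_closed_form are made up for illustration; the actual transformation happens inside the optimizer, not in the source.

```c
/* The loop once the a, b and c arrays have been removed. */
int sum_loop(int aa, int bb) {
    int g = 0;
    for (int i = 0; i < 1000; i++) {
        g += aa + bb; /* loop-invariant value added 1000 times */
    }
    return g;
}

/* What the loop can be reduced to: one add and one multiply. */
int sum_closed_form(int aa, int bb) {
    return (aa + bb) * 1000;
}
```

Both functions return the same value for inputs that do not overflow, so there is no loop left for the vectorizer to work on.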
>
> int main(int argc, char** argv) {
> int a[1000], b[1000], c[1000]; int g=0;
> int aa=atoi(argv[1]), bb=atoi(argv[2]);
> for (int i=0; i<1000; i++) {
> a[i]=aa, b[i]=bb;
>  c[i]=a[i] + b[i];
> g+=c[i];
> }
>
> ~Craig
>
> On Thu, Aug 17, 2017 at 11:37 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Why is it happening? Is there any way to solve this?
>>
>> On Thu, Aug 17, 2017 at 10:09 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> Even if I change my code as follows, vectorized instructions are not
>>> emitted. What should I do?
>>>
>>> int main(int argc, char** argv) {
>>> int a[1000], b[1000], c[1000]; int g=0;
>>> int aa=atoi(argv[1]), bb=atoi(argv[2]);
>>> for (int i=0; i<1000; i++) {
>>> a[i]=aa, b[i]=bb;
>>>  c[i]=a[i] + b[i];
>>> g+=c[i];
>>> }
>>>
>>> printf("sum: %d\n", g);
>>>
>>> return 0;
>>> }
>>>
>>> On Thu, Aug 17, 2017 at 10:03 PM, Craig Topper <craig.topper at gmail.com>
>>> wrote:
>>>
>>>> Did you remove the printf completely? Meaning that nothing accesses 'c'
>>>> after the loop? If so it got removed as dead code because it had no visible
>>>> effect.
>>>>
>>>> ~Craig
>>>>
>>>> On Thu, Aug 17, 2017 at 10:01 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> I removed the printf from the loop. Now I get no error, but the IR does
>>>>> not contain vectorized code. The IR output is as follows:
>>>>> ; ModuleID = 'sum-vec.ll'
>>>>> source_filename = "sum-vec.c"
>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>
>>>>> ; Function Attrs: norecurse nounwind readnone uwtable
>>>>> define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
>>>>>   ret i32 0
>>>>> }
>>>>>
>>>>> attributes #0 = { norecurse nounwind readnone uwtable
>>>>> "correctly-rounded-divide-sqrt-fp-math"="false"
>>>>> "disable-tail-calls"="false" "less-precise-fpmad"="false"
>>>>> "no-frame-pointer-elim"="false" "no-infs-fp-math"="false"
>>>>> "no-jump-tables"="false" "no-nans-fp-math"="false"
>>>>> "no-signed-zeros-fp-math"="false" "no-trapping-math"="false"
>>>>> "stack-protector-buffer-size"="8" "target-cpu"="knl"
>>>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt"
>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" }
>>>>>
>>>>> !llvm.ident = !{!0}
>>>>>
>>>>> !0 = !{!"clang version 4.0.0 (tags/RELEASE_400/final)"}
>>>>>
>>>>> What should I do? Please help.
>>>>>
>>>>>
>>>>> On Thu, Aug 17, 2017 at 9:57 PM, Nemanja Ivanovic <
>>>>> nemanja.i.ibm at gmail.com> wrote:
>>>>>
>>>>>> Move the printf out of the loop and it should vectorize just fine.
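
A minimal sketch of that suggestion, assuming the inputs really differ per element. The function names compute_sums and print_sums are invented for illustration. The compute loop contains no calls, and the printing happens in a separate loop afterwards, which also keeps c observable.

```c
#include <stdio.h>

/* Compute loop: no function calls in the body, so the loop vectorizer
   is free to consider it. */
void compute_sums(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Print loop: kept separate because the printf call cannot be vectorized. */
void print_sums(const int *c, int n) {
    for (int i = 0; i < n; i++) {
        printf("sum: %d\n", c[i]);
    }
}
```

If a and b are filled with the same two constants, as in the original program, the optimizer may still fold the work away, so this only helps once the inputs vary.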
>>>>>>
>>>>>> On Thu, Aug 17, 2017 at 6:52 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I want to vectorize the addition of user-supplied inputs. When opt
>>>>>>> performs vectorization, the user-supplied inputs (from a text file)
>>>>>>> should be added using AVX vector instructions.
>>>>>>>
>>>>>>> As you pointed out, I changed my code to the following:
>>>>>>>
>>>>>>> int main(int argc, char** argv) {
>>>>>>> int a[1000], b[1000], c[1000];
>>>>>>> int aa=atoi(argv[1]), bb=atoi(argv[2]);
>>>>>>> for (int i=0; i<1000; i++) {
>>>>>>> a[i]=aa, b[i]=bb;
>>>>>>>  c[i]=a[i] + b[i];
>>>>>>> printf("sum: %d\n", c[i]);
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> I am getting the following remark: <unknown>:0:0: loop not vectorized:
>>>>>>> call instruction cannot be vectorized.
>>>>>>>
>>>>>>> I am running the following commands:
>>>>>>> clang  -S -emit-llvm sum-vec.c -march=knl -O3 -mllvm -disable-llvm-optzns -o sum-vec.ll
>>>>>>> opt  -S -O3 -force-vector-width=64 sum-vec.ll -o sum-vec03.ll
>>>>>>>
>>>>>>> How to achieve this? Please help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 17, 2017 at 10:44 AM, Nemanja Ivanovic <
>>>>>>> nemanja.i.ibm at gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm not sure what you expect to have vectorized here. If you look
>>>>>>>> at the emitted code, there's no loop. It's just an add and a multiply as
>>>>>>>> you might expect when adding a loop-invariant sum 1000 times in a loop.
>>>>>>>>
>>>>>>>> On Wed, Aug 16, 2017 at 11:38 PM, hameeza ahmed via llvm-dev <
>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I have written the following code. When I try to vectorize it
>>>>>>>>> with opt, I do not get vectorized instructions.
>>>>>>>>>
>>>>>>>>> #include <stdio.h>
>>>>>>>>> #include<stdlib.h>
>>>>>>>>> int main(int argc, char** argv) {
>>>>>>>>> int sum=0; int a=atoi(argv[1]); int b=atoi(argv[2]);
>>>>>>>>> for (int i=0;i<1000;i++)
>>>>>>>>> {
>>>>>>>>> sum+=a+b;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> printf("sum: %d\n", sum);
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>>> I use the following commands:
>>>>>>>>> clang  -S -emit-llvm sum-main.c -march=knl -O3 -mllvm -disable-llvm-optzns -o sum-main.ll
>>>>>>>>> opt  -S -O3 -force-vector-width=64 sum-main.ll -o sum-main03.ll
>>>>>>>>>
>>>>>>>>> Why is that so? Where is my mistake? I am getting scalar operations
>>>>>>>>> rather than vectorized ones.
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>