[llvm-dev] AVX Scheduling and Parallelism
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 23 19:16:34 PDT 2017
It is possible that the scheduling is constrained by pointer-aliasing
assumptions. Could you share the source for the loop?
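For illustration, here is a minimal sketch of how aliasing assumptions can
enter a loop like this one. The original source was not posted, so the
function names, the int32_t element type, and the pointer-parameter form are
all assumptions; only the a = b + c shape is taken from the assembly quoted
below. Without restrict the compiler has to allow for a overlapping b or c,
which can limit how freely it reorders loads and stores. With restrict it is
told that they do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the loop: a[i] = b[i] + c[i].
   Names and types are assumptions, not the poster's actual code. */

/* No aliasing information: a store to a[i] might modify a later
   b[j] or c[j], so the compiler must be conservative (or emit
   runtime overlap checks) before reordering memory operations. */
void add_may_alias(int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* restrict promises that the three arrays do not overlap, so loads
   from b and c may be moved freely across stores to a. */
void add_no_alias(int32_t *restrict a, const int32_t *restrict b,
                  const int32_t *restrict c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result when the arrays are distinct; the
difference is only in what the optimizer is allowed to assume.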
RIP-relative indexing, as I recall, is a feature of position-independent
code. Based on what's below, it might cause problems by making the
instruction encodings large. cc'ing some Intel folks for further comments.
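As a rough illustration of where such operands can come from: if the arrays
are file-scope globals, their addresses are fixed at link time, so an x86-64
compiler can typically address them as a symbol plus a constant offset,
either relative to rip or added to a loop counter register. This sketch
assumes globals named a, b, and c as in the assembly below; the length N and
the int32_t element type are hypothetical.

```c
#include <stdint.h>

#define N 1024 /* hypothetical length; the real size was not posted */

/* File-scope arrays, named after the symbols in the quoted assembly. */
int32_t a[N], b[N], c[N];

/* Every access here names a global symbol plus an index. When the
   loop is unrolled, each unrolled copy gets its own constant byte
   offset from the symbol (c, c+64, c+128, ... for 64-byte vectors). */
void add_globals(void)
{
    for (int i = 0; i < N; ++i)
        a[i] = b[i] + c[i];
}
```

Compiling something like this with optimization enabled for an AVX-512
target and inspecting the assembly should show operands of the same general
form, though the exact output depends on the compiler version and flags.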
On 06/23/2017 09:02 PM, hameeza ahmed via llvm-dev wrote:
> After generating AVX code for a large number of iterations, I came to
> realize that it still uses only 2 registers, zmm0 and zmm1, when the
> loop unroll factor is 1024.
> I wonder if this register allocation allows operations in parallel?
> Also, I know all the elements within a single vector instruction are
> computed in parallel, but are the elements of multiple instructions
> computed in parallel? For example, are 2 vmov instructions with
> different registers executed in parallel? It could be, because each
> core has an AVX unit. Does the compiler exploit it?
> Secondly, I am generating assembly for Intel, and there are some
> offsets, such as the rip register or a constant added to the memory
> index. Why is that so?
> vmovdqu32 zmm0, zmmword ptr [rip + c]
> vpaddd zmm0, zmm0, zmmword ptr [rip + b]
> vmovdqu32 zmmword ptr [rip + a], zmm0
> vmovdqu32 zmm0, zmmword ptr [rip + c+64]
> vpaddd zmm0, zmm0, zmmword ptr [rip + b+64]
> eg. 2
> mov rax, -393216
> .p2align 4, 0x90
> .LBB0_1: # %vector.body
> # =>This Inner Loop Header:
> vmovdqu32 zmm1, zmmword ptr [rax + c+401344] ; load c into zmm1
> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ; load c into zmm0
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344]
> vmovdqu32 zmmword ptr [rax + a+401344], zmm1 ; store zmm1 to a
> vmovdqu32 zmm1, zmmword ptr [rax + c+401216]
> vpaddd zmm0, zmm0, zmmword ptr [rax + b+401280]
> vmovdqu32 zmmword ptr [rax + a+401280], zmm0 ; store zmm0 to a
> vmovdqu32 zmm0, zmmword ptr [rax + c+401152]
> ........ (the remaining instructions also use only zmm0 and zmm1)
> As the examples above show, more registers could be used. I also doubt
> whether the repeating instructions in eg. 2 are executed in parallel.
> Why are zmm0 and zmm1 reused? Couldn't more zmm registers be used, so
> that all instructions without dependencies run in parallel? For
> example, above, the blue instructions depend on each other and the red
> instructions depend on each other, so neither group can be executed in
> parallel internally, but can blue and red be executed in parallel with
> each other?
> Please correct me if I am wrong.
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory