[llvm-dev] Vectorizing remainder loop

hameeza ahmed via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 3 10:57:39 PDT 2018


Thank You so much...
The hardware is designed already and it cannot afford large masks for
large vectors. So I'm opting for direction 2. I also tried the patch, but
I was getting some errors.

Can you please guide me on how to proceed with direction 2?

Thank You
Regards

On Fri, Aug 3, 2018 at 3:28 AM, Saito, Hideki <hideki.saito at intel.com>
wrote:

>
> Hi Hameeza,
>
> Aside from Ashutosh's patch.....
>
> When the vector width is that large, we can't keep vectorizing the remainder
> as shown below. If nothing else, the code size will be huge; the ITLB misses
> this causes, for example, are very bad.
>         VF=2048 // main vector loop
>         VF=1024 // vectorized remainder 1
>         VF=512   // vectorized remainder 2
>         ...
>         Vectorize remainder until trip count is small enough for scalar execution.
>
> Direction #1
> Does your HW support efficient masking? If so, the first thing to try is
> VF=2048 with masking so that you won't have any remainder loop. In other
> words, bump up the trip count to a multiple of 2048 and then put an IF
> branch inside the loop body so that iterations beyond the original trip
> count are no-ops. Then vectorize that loop.
>
>         for (i=0;i<N;i++){
>                 body
>         }
> ==>
>         for (i=0;i<M;i++){ // where M is N rounded up to a multiple of 2048
>                 if (i < N) {
>                         body
>                 }
>         }
>
> If your HW can't execute the vector version of the above loop efficiently
> enough, it's already busted. Typically, when VF is that large, what you'll
> get in the remainder is a masked vector loop like the one below, and
> vec_remainder_body is reasonably hot, as you say in your original mail. As
> such, remainder loop vectorization isn't a solution for that problem.
>
>         for (i=0;i+2048<=N;i+=2048){
>                 vec_body
>         }
>         for (;i<M;i+=1024){ // where M is the smallest multiple of 1024 >= N
>                 if (i < N) {
>                         vec_remainder_body
>                 }
>         }
>
> If your HW designers insist that the compiler generate
>         VF=2048 // main vector loop
>         VF=1024 // vectorized remainder 1
>         VF=512  // vectorized remainder 2
>         ...
>         Remainder is small enough for scalar.
> I suggest you go back and tell them to reconsider the HW design so that
> Direction #1 works well enough on the HW.
>
> Direction #2
> In the meantime, if you are really stuck in that situation (i.e., the HW is
> already built and you don't have much time), the simplest thing for you to
> do is to run the LV a second (third, fourth, ...) time, after marking the
> remainder loop with metadata so that you know which loops you want to
> deal with in the second round. It's very much a hack, but the change you
> need to make will be small, and that way you are not much impacted by
> other changes the VPlan project is making. If you carry a major change
> outside of trunk, you may be hit hard.
>
> Direction #3
> If you are given time to do the right implementation of remainder loop
> vectorization, please join the VPlan bandwagon and work on it there. Major
> development like this should happen on VPlans. Please let us know if you
> can do that. Ashutosh, how about you?
>
> Hopefully, one or more of the four alternative directions to consider,
> including Ashutosh's patch, would work for you.
>
> Thanks,
> Hideki
> -------------------
> Date: Mon, 30 Jul 2018 05:16:15 +0000
> From: "Nema, Ashutosh via llvm-dev" <llvm-dev at lists.llvm.org>
> To: hameeza ahmed <hahmed2305 at gmail.com>, Craig Topper
>         <craig.topper at gmail.com>, Hal Finkel <hfinkel at anl.gov>, "Friedman,
>         Eli" <efriedma at codeaurora.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Vectorizing remainder loop
>
> Hi Hameeza,
>
> At this point, the Loop Vectorizer does not have the capability to vectorize
> the epilog/remainder loop.
> Some time back there was an RFC on epilog loop vectorization, but it did not
> go through because of concerns.
> This RFC has a patch as well; maybe you can give it a try.
> http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
>
> - Ashutosh
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of hameeza
> ahmed via llvm-dev
> Sent: Sunday, July 29, 2018 10:24 PM
> To: llvm-dev <llvm-dev at lists.llvm.org>; Craig Topper <
> craig.topper at gmail.com>; Hal Finkel <hfinkel at anl.gov>; Friedman, Eli <
> efriedma at codeaurora.org>
> Subject: Re: [llvm-dev] Vectorizing remainder loop
>
> Please help in solving this issue. The scalar remainder loop is a really
> big and significant problem with large vector widths.
>
> Please help
>
> Thank You
>
> On Sun, Jul 29, 2018 at 2:52 PM, hameeza ahmed <hahmed2305 at gmail.com
> <mailto:hahmed2305 at gmail.com>> wrote:
> Hello, I'm working on hardware with very large vector widths, up to v2048.
> Now when I vectorize using the default LLVM vectorizer, up to 2047 iterations
> remain in the scalar remainder loop. These are not vectorized by LLVM, which
> increases the cost. However, they should be vectorized using the next
> available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4...
>
> The scalar remainder loop issue has been there in LLVM, but it is magnified
> with large vector widths and can't be ignored. It is very important to
> solve this issue.
>
> Please help. I'm trying to read LoopVectorize.cpp but am unable to figure out
> the actual code to change.
>
> It's very important for me to solve this issue.
>
> Please help.
>
> Thank you
>
>
>