[llvm-dev] Vectorizing remainder loop

hameeza ahmed via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 3 14:36:03 PDT 2018


Thank You so much.

I plan to begin with the 2nd option i.e tweaking Ashutosh’s patch.
How should i begin working on this?
Please guide

On Sat, Aug 4, 2018 at 12:30 AM, Saito, Hideki <hideki.saito at intel.com>
wrote:

>
>
> >it cannot afford large size masks for large vectors
>
>
>
> So, even a standard way of vectorizing remainder in masked or unmasked
> fashion wouldn’t work, I suppose. Ouch.
>
>
>
> I suppose VPlan should be able to model this kind of gigantic remainder
> vector code (when the time comes). Not pretty at all, though.
>
>
>
> Now, be fully aware that Direction #2 is really a poor (or rather
> extremely poor) person’s remainder loop vectorization approach. Nowhere
> close to do it right. If you really have to go down that path…… the
> following would be the basics steps.
>
>
>
> 1)      First, look at the comments in createVectorizedLoopSkeleton().
> “old scalar loop to handle remainder” is the one you want to vectorize
> again.
> Near the end of that function, LoopScalarBody is set. This is the loop
> body of interest. OrigLoop becomes the scalar Loop. First, try attaching
> a new metadata (with VF info) to it.
>
> 2)      Create a RemainderVectorizePass, very similar to
> LoopVectorizePass, but let it work only on the loops that has the metadata.
> Add it to IPO/PassManagerBuilder
> right after LoopVectorize. You could also try adding a state in
> LoopVectorize instead of creating a RemainderVectorizePass. You can then
> read the metadata
> and try to vectorize the loop with half the VF. This will result in
> another remainder loop. Attach (or modify) the metadata with the halved VF.
>
> 3)      You may have to do some cleanup of primary IV between
> LoopVectorize and RemainderVectorize. If LoopVectorizationLegality chokes
> in RemainderVectorize
> (i.e., thinks the loop is not legal to vectorize), try resolving what it
> is complaining. Legality should be the same between main vector loop and
> remainder ------ in theory.
>
> 4)      Once you get to that point, just run RemainderVectorize enough
> times, until VF is small enough.
>
>
>
> This shouldn’t require much knowledge on LoopVectorize if things go nice
> and smooth.
>
>
>
> If you opt to go with tweaking Ashutosh’s patch,
>
>
>
> Ashutosh’s patch (https://reviews.llvm.org/D30247 ) is doing what I wrote
> above within LoopVectorize (thus it can skip many things since data
> structures are still alive), through createVectorEpilog(), instantiating
> InnerLoopVectorizer inside InnerLoopVectorizer. That takes care of one
> remainder loop vectorization step. You need to modify so that it can do
> more than one step (like using an vector of EpilogLoopInfo). If you
> struggle in convincing LoopVectorizationLegality to think remainder is just
> as legal to vectorize as main vector loop, you should be able to avoid that
> if you take this approach.
>
>
>
> You still have two approaches to unblock yourself in the short term.
>
>
>
> Thanks,
>
> Hideki
>
>
>
> *From:* hameeza ahmed [mailto:hahmed2305 at gmail.com]
> *Sent:* Friday, August 03, 2018 10:58 AM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* Craig Topper <craig.topper at gmail.com>; Hal Finkel <hfinkel at anl.gov>;
> Friedman, Eli <efriedma at codeaurora.org>; ashutosh.nema at amd.com;
> llvm-dev at lists.llvm.org
> *Subject:* Re: Vectorizing remainder loop
>
>
>
> Thank You so much...
>
> The hardware is designed already and it cannot afford large size masks for
> large vectors. So I m opting for direction 2. Also I did try the patch but
> i was getting some errors.
>
>
>
> Can you please guide me how to proceed with direction 2?
>
>
>
> Thank You
>
> Regards
>
>
>
> On Fri, Aug 3, 2018 at 3:28 AM, Saito, Hideki <hideki.saito at intel.com>
> wrote:
>
>
> Hi Hameeza,
>
> Aside from Ashutosh's patch.....
>
> When the vector width is that large, we can't keep vectorizing remainder
> like below. It'll be a huge code size if nothing else ---- hitting ITLB
> miss because of this is very bad, for example.
>         VF=2048 // main vector loop
>         VF=1024 // vectorized remainder 1
>         VF=512   // vectorized remainder 2
>         ...
>         Vectorize remainder until trip count is small enough for scalar
> execution.
>
> Direction #1
> Does your HW support efficient masking? If so, the first thing to try is
> VF=2048 with masking so that you won't have any remainder loop. In other
> words, bump up the trip count to the multiple of 2048 and then have an IF
> branch inside the loop body so that beyond the original trip count is a
> no-op. Then vectorize that loop.
>
>         For (i=0;i<N;i++){
>                 body
>         }
> ==>
>         For (i=0;i<M;i++){ // where M is a multiple of 2048
>                 If (I < N) {
>                         Body
>                 }
>         }
>
> If your HW can't execute vector version of the above loop efficiently
> enough, it's already busted. Typically, when VF is that large, what you'll
> get in the remainder is masked vector like below, and vec_remainder_body is
> reasonably hot as you say in your original mail. As such, remainder loop
> vectorization isn't a solution for that problem.
>
>         for (i=0;i<N;i+=2048){
>                 Vec_body
>         }
>         for (i<M;i+=1024){ // where M is the smallest multiple of 1024
> over N
>                 If (I < N) {
>                         Vec_Remainder_Body
>                 }
>         }
>
> If your HW designers insist that the compiler to generate
>         VF=2048 // main vector loop
>         VF=1024 // vectorized remainder 1
>         VF=512  // vectorized remainder 2
>         ...
>         Remainder is small enough for scalar.
> I suggest you go back and tell them to reconsider the HW design such that
> the Direction #1 works well enough on the HW.
>
> Direction #2
> In the meantime, if you are really stuck in the situation (i.e,, HW is
> already built and you don't have much time), the simplest thing for you to
> do is to run the LV second (third/fourth/...) time, after marking the
> remainder loop with the metadata so that you know which loops you want to
> deal with in the second round. It's very much of a hack but it'll be a
> small change you need to make and that way you are not much impacted by
> other changes VPlan project is making. If you have a major change outside
> of the trunk, you may be hit hard.
>
> Direction #3
> If you are given time to do the right implementation of remainder loop
> vectorization, please join the VPlan bandwagon and work on it there. Major
> development like this should happen on VPlans. Please let us know if you
> can do that. Ashutosh, how about you?
>
> Hopefully, one or more of the four alternative directions to consider,
> including Ashutosh's patch, would work for you.
>
> Thanks,
> Hideki
> -------------------
> Date: Mon, 30 Jul 2018 05:16:15 +0000
> From: "Nema, Ashutosh via llvm-dev" <llvm-dev at lists.llvm.org>
> To: hameeza ahmed <hahmed2305 at gmail.com>, Craig Topper
>         <craig.topper at gmail.com>, Hal Finkel <hfinkel at anl.gov>, "Friedman,
>         Eli" <efriedma at codeaurora.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Vectorizing remainder loop
>
> Hi Hameeza,
>
> At this point Loop Vectorizer does not have capability to vectorize
> epilog/remainder loop.
> Sometime back there is an RFC on epilog loop vectorization but it did not
> went through because of concerns.
> This RFC has a patch as well, maybe you can give a try with it.
> http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-
> Epilog-loop-vectorization-tt106322.html#none
>
> - Ashutosh
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of hameeza
> ahmed via llvm-dev
> Sent: Sunday, July 29, 2018 10:24 PM
> To: llvm-dev <llvm-dev at lists.llvm.org>; Craig Topper <
> craig.topper at gmail.com>; Hal Finkel <hfinkel at anl.gov>; Friedman, Eli <
> efriedma at codeaurora.org>
> Subject: Re: [llvm-dev] Vectorizing remainder loop
>
> Please help in solving this issue. the issue of scalar remainder loop is
> really big and significant with large vector widths.
>
> Please help
>
> Thank You
>
> On Sun, Jul 29, 2018 at 2:52 PM, hameeza ahmed <hahmed2305 at gmail.com
> <mailto:hahmed2305 at gmail.com>> wrote:
> Hello, I m working on a hardware with very large vector width till v2048.
> Now when I vectorize using llvm default vectorizer maximum 2047 iterations
> are scalar remainder loop. These are not vectorized by llvm which increases
> the cost. However these should be vectorized using next available vector
> width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4.....
>
> The issue of scalar remainder loop has been there in llvm but this issue
> is enhanced and can't be ignored with large vector width. This is very
> important and significant to solve this issue.
>
> Please help. I m trying to see loopvectorizer.cpp but unable to figure out
> actual code to make changes.
>
> It's very important for me to solve this issue.
>
> Please help.
>
> Thank you
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180804/e2f121ca/attachment.html>


More information about the llvm-dev mailing list