[llvm-commits] [LLVMdev] [PATCH] BasicBlock Autovectorization Pass

Hal Finkel hfinkel at anl.gov
Tue Jan 24 13:01:47 PST 2012


I have attached the latest version of my basic-block autovectorization
pass.

With regard to the non-trivial cycle checking I had mentioned
previously, I implemented the "late abort" solution and made it the
default for cases where the full cycle check would be expensive (for
blocks that have many candidate pairs). For blocks with fewer candidate
pairs, the full cycle check is used.
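The strategy choice above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the attached patch. The names `chooseCycleCheck`, `hasCycle`, and `kMaxPairsForFullCycleCheck` are hypothetical, and so is the threshold of 200, since the message does not state one. The graph model (nodes are candidate pairs, an edge u -> v means "v depends on u") is also an assumption. Only the full check is modeled, as a standard three-color depth-first search. The "late abort" path is left out because the message does not describe its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical threshold: above this many candidate pairs, the full cycle
// check is assumed to be too expensive. Not taken from the patch.
constexpr std::size_t kMaxPairsForFullCycleCheck = 200;

enum class CycleCheck { Full, LateAbort };

// Full check for blocks with few candidate pairs, "late abort" otherwise.
CycleCheck chooseCycleCheck(std::size_t numCandidatePairs) {
  return numCandidatePairs <= kMaxPairsForFullCycleCheck
             ? CycleCheck::Full
             : CycleCheck::LateAbort;
}

namespace {
enum class Color { White, Gray, Black };

// Recursive DFS; a Gray successor is a back edge, i.e. a cycle.
bool dfsFindsCycle(const std::vector<std::vector<std::size_t>> &succs,
                   std::size_t node, std::vector<Color> &color) {
  color[node] = Color::Gray; // node is on the current DFS path
  for (std::size_t next : succs[node]) {
    assert(next < succs.size() && "edge to unknown node");
    if (color[next] == Color::Gray)
      return true;
    if (color[next] == Color::White && dfsFindsCycle(succs, next, color))
      return true;
  }
  color[node] = Color::Black; // node and its descendants are fully explored
  return false;
}
} // namespace

// Full cycle check over a (hypothetical) dependency graph of candidate
// pairs, given as adjacency lists.
bool hasCycle(const std::vector<std::vector<std::size_t>> &succs) {
  std::vector<Color> color(succs.size(), Color::White);
  for (std::size_t n = 0; n < succs.size(); ++n)
    if (color[n] == Color::White && dfsFindsCycle(succs, n, color))
      return true;
  return false;
}
```

For example, `hasCycle({{1}, {2}, {}})` is false (a chain 0 -> 1 -> 2), while `hasCycle({{1}, {2}, {0}})` is true. The full check visits every candidate pair and edge, so its cost grows with the number of candidate pairs. That growth is why the message reserves it for blocks with fewer pairs.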

I believe that I have addressed all concerns raised thus far (except for
changing the container element types from Value* to Instruction*, which
Tobias said he would be okay with doing post-commit). If I receive no
objections over the next few days, I'll commit.

I would like to thank everyone who has provided feedback; many of the
suggestions have proved quite valuable.

Thanks again,
Hal

On Tue, 2012-01-24 at 10:17 -0600, Hal Finkel wrote:
> On Tue, 2012-01-24 at 16:53 +0100, Tobias Grosser wrote:
> > On 01/24/2012 05:13 AM, Hal Finkel wrote:
> > > On Tue, 2012-01-17 at 13:25 -0600, Sebastian Pop wrote:
> > >> Hi,
> > >>
> > >> On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote:
> > >>> As it seems my intuition is wrong, I am very eager to see and understand
> > >>> an example where a search limit of 4000 is really needed.
> > >>>
> > >>
> > >> To get the ball rolling again, I attached a testcase that can be tuned
> > >> to understand the impact on compile time for different sizes of a
> > >> basic block.  One can also set the number of iterations in the loop to
> > >> 1 to test the vectorizer with no loops around.
> > >>
> > >> Hal, could you please report the compile times with/without the
> > >> vectorizer for different basic block sizes?
> > >
> > > I've looked at your test case, and I am pleased to report a negligible
> > > compile-time increase!
> > That is nice. But does this example actually trigger the search limit of 
> > 4000? I think that is the case I am especially interested in.
> 
> I know (and the answer is yes, it could, but not in an interesting way),
> but I reduced the default search limit to 400. I did this because, when
> used in combination with my load/store-reordering patch, such a high
> limit is no longer optimal. As I suspected, it appears that the high
> limit was compensating for the lack of the ability to schedule
> non-aliasing loads after stores. I would like to deal with the
> load/store reordering problem on its own merits (and have already
> submitted a patch that does this), and so I'll leave the lower default
> on the vectorizer search limit.
> 
> In addition, Sebastian's test case highlights why, with the current
> implementation, having such a high search limit would be bad for compile
> times. A limit in the hundreds, not thousands, is necessary to provide
> reasonable compile times for unrolled loops with long dependency chains
> such as the ones in his example.
> 
> Thanks again,
> Hal
> 
> > 
> > Cheers
> > Tobi
> 
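The compile-time effect of the search limit discussed in the quoted message can be illustrated with a small model. This is a hedged sketch, not the pass's actual search. It assumes that, for each instruction, the pass scans at most `searchLimit` later instructions for a pairing partner. The `Inst` struct, the `findCandidatePairs` function, and the "same opcode means pairable" rule are all hypothetical simplifications. The model ignores types, dependencies, and every later stage of the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: only an opcode. Real pairability also
// depends on types, alignment, and dependencies, which are ignored here.
struct Inst {
  int opcode;
};

struct SearchStats {
  std::size_t candidatePairs = 0; // pairable (i, j) found inside the window
  std::size_t comparisons = 0;    // pairability tests performed
};

// For each instruction i, examine at most searchLimit later instructions.
// Total work is O(n * searchLimit) instead of O(n^2).
SearchStats findCandidatePairs(const std::vector<Inst> &block,
                               std::size_t searchLimit) {
  SearchStats stats;
  const std::size_t n = block.size();
  for (std::size_t i = 0; i < n; ++i) {
    // Window size, computed without overflow even for huge searchLimit.
    const std::size_t window = std::min(searchLimit, n - 1 - i);
    for (std::size_t j = i + 1; j <= i + window; ++j) {
      ++stats.comparisons;
      if (block[i].opcode == block[j].opcode)
        ++stats.candidatePairs;
    }
  }
  assert(stats.candidatePairs <= stats.comparisons);
  return stats;
}
```

In this model, a block of 1000 instructions performs 319,800 comparisons with a limit of 400 and 499,500 with a limit of 4000. The second figure is already the full quadratic count, because the block is shorter than the limit. Longer blocks, such as those produced by unrolling a loop, widen the gap further. This illustrates, under the stated simplifications, why a limit in the hundreds keeps the search cost bounded.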

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_bb_vectorize-20120124-2.diff
Type: text/x-patch
Size: 128711 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120124/bd809a52/attachment.bin>

