PATCH: Make TSVC benchmarks static data layout more predictable (PR14076)

Sat Apr 6 14:17:49 PDT 2013

On Sat, Apr 6, 2013 at 2:01 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Daniel Dunbar" <daniel at zuster.org>
> > To: "Hal Finkel" <hfinkel at anl.gov>, "Jakob Stoklund Olesen" <stoklund at 2pi.dk>,
> > "Commit Messages and Patches for LLVM" <llvm-commits at cs.uiuc.edu>
> > Sent: Saturday, April 6, 2013 2:40:41 PM
> > Subject: PATCH: Make TSVC benchmarks static data layout more predictable (PR14076)
> >
> >
> > Hi Hal,
> >
> >
> > As currently written, the performance of the TSVC benchmarks can
> > depend very heavily on the exact address assignment of the global
> > data arrays. Since that is something we do not model in the compiler
> > (and are not likely to anytime soon), this makes them suboptimal as
> > benchmarks: they are not testing what they purport to test.
> >
> >
> > The attached patch fixes this problem by making the static data
> > layout more predictable by moving all of the data arrays into a
> > single structure, so that the relative addresses are stable across
> > all platforms.
> >
> >
> > In particular, this resolves:
> > http://llvm.org/bugs/show_bug.cgi?id=14076
> >
> > because, at least on OS X and presumably on other platforms, two of
> > the arrays in some of the benchmarks are very likely to end up at
> > exact 4K offsets from each other. This causes severe performance
> > degradation on some Intel platforms; in the worst case, on
> > StatementReordering-flt, it can double the runtime of the
> > benchmark. See:
> > http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/ug_docs/GUID-C801145A-A066-4C1A-B744-2B51AD89EFF6.htm
> > for more information.
> >
> >
> > Even worse, on some Intel architectures (Sandy Bridge, at least),
> > when this problem is hit the runtime of the benchmark is no longer
> > predictable and can vary by up to 100% run-to-run (!!!).
> >
> >
> > I wrote the patch in such a way that I don't think it should impair
> > the compiler's ability to perform any vectorization optimizations,
> > but wanted to run it past you before committing it.
> >
> >
> > What do you think?
>
> First, thanks for working on this! I think using your patch will
> necessitate runtime overlap checks on any vectorization (because, without
> IPO, there is no other way to determine that the pointers point to
> disjoint memory regions). This will also cut out a lot of BB-vectorization
> opportunities. Clang does not respect restrict on non-function parameters,
> right? In that case, we might need to pass the arrays to each function
> through restrict parameters.
>

Hmm, you are probably right. The patch was silly though: there is no need
to initialize the global array pointers at runtime. Attached is a revised
version which just statically initializes them to point into the global
structure; AA should be able to look through them now and see that they
don't alias. I verified that AA did this on a trivial function, and I
generated the IR with -O3 -fvectorize before and after and spot-checked
that structurally the same things are going on. Look ok?
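
For illustration, the static-initialization approach can be sketched
roughly as follows. This is not the attached patch: the struct name, the
array names, the length LEN, and the helper functions are all made up for
the example, and the 4K check is just one assumed way to test whether two
addresses agree in their low 12 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array length, chosen only for illustration. */
#define LEN 32000

/* All data arrays are members of one object, so their relative offsets
 * come from the struct layout instead of the linker's address assignment. */
struct GlobalData {
    float a[LEN];
    float b[LEN];
    float c[LEN];
};

struct GlobalData global_data;

/* Statically initialized constant pointers into the structure. Each
 * initializer is an address constant, so no runtime setup is needed. */
float *const a = global_data.a;
float *const b = global_data.b;
float *const c = global_data.c;

/* Returns 1 if two addresses lie at an exact multiple of 4 KiB from each
 * other (they agree in their low 12 bits), 0 otherwise. */
int same_4k_offset(const void *p, const void *q)
{
    return (((uintptr_t)p ^ (uintptr_t)q) & 0xFFFu) == 0;
}

/* Hypothetical helper that fills the input arrays with known values. */
void fill_inputs(float bv, float cv)
{
    for (size_t i = 0; i < LEN; ++i) {
        b[i] = bv;
        c[i] = cv;
    }
}

/* Hypothetical kernel in the style of the benchmarks: a[i] = b[i] + c[i]. */
void add_arrays(void)
{
    for (size_t i = 0; i < LEN; ++i)
        a[i] = b[i] + c[i];
}
```

With this layout the distance between two arrays is fixed by the struct
definition. Assuming no padding between members, adjacent arrays here are
128000 bytes apart, which is not a multiple of 4096; calling
same_4k_offset(a, b) is a quick way to check that on a given platform.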

 - Daniel

>
>  -Hal
>
> >
> >
> > I attached a snapshot from runs before and after applying the
> > patch, showing the graphs from some of the benchmarks in the TSVC
> > suite that trigger this problem. Each run was done with 5 samples,
> > and as you can see in the old version, the runtime of the
> > benchmarks is highly variable.
> >
> >
> > Thanks,
> > - Daniel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-tests-Modify-TSVC-static-data-layout-to-make-benchma.patch
Type: application/octet-stream
Size: 3730 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130406/854b5167/attachment.obj>