[llvm-dev] Polly and optimisations
Alexandre Isoard via llvm-dev
llvm-dev at lists.llvm.org
Sat Oct 21 09:21:25 PDT 2017
Hi James,
Polly has some transformation on the data too (mainly used when targeting
GPU back-ends). The thing is that Polly, being based on the polyhedral
model, can only work on "regular" transformations. That is, sparse data is
hard to handle (position of the data depends on run-time values), but dense
data is usually fine.
As for you question on data locality, yes, Polly is actually focused on
data locality (temporal locality for now, but spatial locality is in
project) and parallelism. But again, because Polly works best on dense
algorithm, I am not sure how it would perform on the examples you suggest.
In any case, those discussions are welcome !
Note that most cache oblivious approaches usually only change the iteration
orders without changing the data too. But you are right in assuming they
are tightly correlated.
Best Regard.
On Sat, Oct 21, 2017 at 3:08 AM, James Courtier-Dutton via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> My understanding of Polly is that it rearranges the executable
> instructions in order to perform the task quicker.
> Are there any tools out there that rearrange the data so that it is
> processed quicker?
>
> For example, say you start with a sparse(but still all in RAM) data
> set, and you wish to do computations on it.
> I have found that this can be done much quicker if you collect all the
> data to be processed into batches, make the batches of data
> contiguous, so that each batch fits in the CPU cache, and then process
> that batch while only having to access the CPU cache.
> Then afterwards, copy out the results back to the sparse layout.
> I know this general approach is called "Cache Oblivious Algorithms",
> but I was wondering if any compiler optimizations could do this for
> you.
>
> For example, if an algorithm had 10 processing steps, and for each
> step it scanned the entire data set. An optimization could be to do
> all 10 processing steps on the first data item, and then move to the
> next item etc. This would process the data much faster due to the
> better use of the cache.
>
> Obviously, this cannot be done in all cases, but does something like
> polly take into account data locality like the examples above? Enough
> so, that is would even add extra malloc and memcopys where needed.
>
> Kind Regards
>
> James
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
*Alexandre Isoard*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171021/175a42f0/attachment.html>
More information about the llvm-dev
mailing list