[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

Vladimir Guzma vladimir.guzma at tut.fi
Tue Feb 19 19:44:02 PST 2013


Hi all,

>> Hal, this OpenCL WG autovectorization work, unfortunately, is not my first
>> priority task at work currently (more like a pet project), so I cannot make any
>> promises on when I might find time to work on it again. So, if you want to
>> see the parallel loop iteration MD happen sooner, I'd recommend you dig into
>> it. I think we'd like to start from the scratch for the bbvectorizer utilization
>> in pocl anyways, that is, would add the metadata support first and then use it
>> in a fresh bbvectorizer version. The current hacked version in pocl seems not
>> to be upstreamable easily as it has lagged behind some LLVM versions and is
>> rather dirty.
> 
> Understood. If you have some time, it seems that there are several sub-tasks:
> 
> - Update the language reference
> - Update the loop vectorizer (to update the metadata when it unrolls)
> - Update the regular unroller
> - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?

The current AA implementation uses work-item metadata as well as 'region' metadata identifiers (regions being separated by barriers).
So, in order to provide similar functionality, parallel loop metadata would need both a 'loop ID' and a 'loop iteration ID'.

This is perhaps not an immediate concern for the current discussion, but it can become a problem when there are two consecutive loops, both marked with parallel loop metadata and both fully unrolled (or partially unrolled and then processed by some loop fusion/combining pass). In that case a 'loop ID' is needed to distinguish the originating loop, since the 'loop iteration ID' alone is not enough.
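To make the two-loop case concrete, here is a minimal sketch in plain C++ (not LLVM API). The `ParallelTag` struct and both function names are hypothetical, invented for illustration only; they do not correspond to the pocl code or to any existing metadata format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag carried by each instruction copy after unrolling.
struct ParallelTag {
  std::uint32_t loopId;      // which parallel loop the instruction came from
  std::uint32_t iterationId; // which iteration of that loop
};

// Iteration ID only: wrongly treats instructions from two different
// loops as independent whenever their iteration numbers differ.
bool independentByIterationOnly(ParallelTag a, ParallelTag b) {
  return a.iterationId != b.iterationId;
}

// Loop ID plus iteration ID: independence is claimed only for different
// iterations of the same parallel loop.
bool independentByLoopAndIteration(ParallelTag a, ParallelTag b) {
  return a.loopId == b.loopId && a.iterationId != b.iterationId;
}

// Small usage check: iteration 0 of loop 0 versus iteration 1 of loop 1.
inline void exampleTwoUnrolledLoops() {
  ParallelTag first{0, 0}, second{1, 1};
  assert(independentByIterationOnly(first, second));     // unsafe answer
  assert(!independentByLoopAndIteration(first, second)); // conservative answer
}
```

With only the iteration ID, the second loop's iteration 1 looks independent of the first loop's iteration 0, although nothing in the metadata justifies that.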

> - Update the BB vectorizer to prefer pairings from different iterations

Updating the BB vectorizer to use work-item metadata was a rather trivial addition of a test for a difference in identifiers, very similar to the one in AA (though we also record the position of the instruction in the originating block), and it should be just as trivial to add to BBVectorize using the parallel loop metadata.
It became a major mess once complaints about speed started: e.g. the pocl version does not have any search limits, maximum instructions per group, etc., so finding all candidate pairs became rather time consuming.
So the candidate selection is not compatible with the BB vectorizer, and a whole bunch of code was removed…
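As a rough illustration of the identifier test and of why a search limit matters, here is a sketch in plain C++ (not LLVM API). `TaggedInst`, `isCandidatePair`, `findCandidatePairs` and the `searchLimit` parameter are all hypothetical names; the pocl code is not shown here and may be structured quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-instruction record after unrolling.
struct TaggedInst {
  std::uint32_t loopId;      // originating parallel loop
  std::uint32_t iterationId; // originating iteration (or work item)
  std::uint32_t position;    // index in the originating block
};

// Pair only copies of the same original instruction (same loop, same
// recorded position) that come from different iterations.
bool isCandidatePair(const TaggedInst &a, const TaggedInst &b) {
  return a.loopId == b.loopId && a.position == b.position &&
         a.iterationId != b.iterationId;
}

// Scan at most searchLimit instructions ahead of each instruction.
// Without such a window the scan is quadratic in the block size.
std::vector<std::pair<std::size_t, std::size_t>>
findCandidatePairs(const std::vector<TaggedInst> &insts,
                   std::size_t searchLimit) {
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    std::size_t remaining = insts.size() - (i + 1);
    std::size_t window = std::min(remaining, searchLimit);
    for (std::size_t j = i + 1; j <= i + window; ++j)
      if (isCandidatePair(insts[i], insts[j]))
        pairs.emplace_back(i, j);
  }
  return pairs;
}
```

The window trades missed pairs for speed: copies of the same instruction that end up further apart than `searchLimit` after unrolling are never considered.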

Anyway, the parts that might be interesting to integrate into BBVectorize are the (crude) caching during replaceOutputs, used when vectorizing phi nodes. There is also some vectorization of getelementptr instructions, creation of vectors of allocas to get better vector memory accesses, some magic for computing the addresses of strided memory accesses using vectors, some tweaks to eliminate unneeded shuffle instructions in replacement inputs, etc. There are a lot of assumptions that the instructions to be vectorized are really identical instructions from different work items (due to the recorded position in the originating code), which may not be the case in general BB vectorization.

Anyway, if the loop metadata gets updated, I can have a look at updating the AA and moving it from pocl to LLVM, but not likely this week (maybe Pekka can provide it sooner if there is a rush).
I cannot make any promises about BBVectorization atm, unfortunately.

regards
Vlado
> 
> Thanks again,
> Hal
> 
>> 
>> BR,
>> --
>> --Pekka
>> 
>> 
> 