[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

Wed Feb 20 07:41:39 PST 2013

>>>> Anyways, the parts perhaps worth integrating into BBVectorizer
>>>> could be the (crude) caching during replaceOutputs, which is used
>>>> when vectorizing phi nodes. There is also some vectorization of
>>>> getelementptr instructions, creation of vectors of allocas to get
>>>> better vector memory accesses, some magic for computing addresses
>>>> of strided memory accesses using vectors, some tweaks to eliminate
>>>> unneeded shuffle instructions in replacement inputs, etc. There
>>>> are a lot of assumptions that the instructions to be vectorized
>>>> are really identical across different work items (due to the
>>>> recorded position in the originating code), which may not be the
>>>> case in general BB-vectorized cases.
>>> 
>>> To clarify, are these features that you've implemented in your
>>> version?
>> 
>> Yes, these are there, as well as some stuff to clean up after the
>> vectorizer...
>> From a performance point of view, the addition of vectors of phi
>> nodes was most beneficial for our main target (TTA architecture).

Hello Hal,
> 
> It looks like the source is here:
> http://bazaar.launchpad.net/~pocl/pocl/trunk/view/head:/lib/llvmopencl/WIVectorize.cc

Yes, that is the source.
> 
> Are there test cases for these new features?
No, there are no specific test cases. During initial testing, pocl had vectorization enabled by default for the pthread target as well.

> 
> I'll look at this version; hopefully we'll be able to get most if not all of the improvements upstream. It looks like you've left your version under LLVM's license; is that correct? If not, may I have your explicit permission for relicensing?

Yes, getting the improvements upstream would be great. Unfortunately, I never had time to separate them, try them with a clean BBVectorizer and see whether they are general enough, since they were meant to be OpenCL-specific and were mostly oriented towards the TTA architecture target.
Some may be too specific, e.g. the vectorization of allocas to take advantage of vector access to OpenCL private memories for different work items.
Yes, the license is LLVM's. The original modification was just the addition of the method areInstsCompatibleFromDifferentWI and a call to it before the other compatibility test.
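
To illustrate the idea of running a work-item check before the generic compatibility test, here is a minimal sketch in plain C++ without LLVM. Only the name areInstsCompatibleFromDifferentWI comes from the source; its signature, the InstInfo struct, the generic check and isCandidatePair are hypothetical stand-ins, not the actual WIVectorize code.

```cpp
// Hypothetical summary of an instruction after work-item replication.
struct InstInfo {
    int originId;  // assumed: position of the instruction in the original kernel
    int workItem;  // assumed: which work-item copy the instruction belongs to
    int opcode;    // assumed: stand-in for the real instruction kind
};

// Sketch of the metadata-based check (the real signature may differ):
// two instructions qualify only if they are copies of the same original
// instruction that belong to different work items.
bool areInstsCompatibleFromDifferentWI(const InstInfo& a, const InstInfo& b) {
    return a.originId == b.originId && a.workItem != b.workItem;
}

// Placeholder for the more expensive generic compatibility test.
bool areInstsCompatibleGeneric(const InstInfo& a, const InstInfo& b) {
    return a.opcode == b.opcode;
}

// The cheap work-item check runs first, so most pairs are rejected
// before the generic test is ever evaluated.
bool isCandidatePair(const InstInfo& a, const InstInfo& b) {
    return areInstsCompatibleFromDifferentWI(a, b) &&
           areInstsCompatibleGeneric(a, b);
}
```

Because of the short-circuit `&&`, a pair that does not come from the same original instruction never reaches the generic test.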
> 
> I've made some significant performance improvements in BBVectorize recently, and you may want to adopt those changes in your version as well (especially if people have complained about speed).

I tried to follow the development and 'import' improvements throughout the last year. The biggest speed problem, though, was the analysis of memory accesses and the traversal of the whole BB to test all instruction pairs. With metadata identifying the originating instructions, this is now a simple pass that groups candidates from the same originating lanes together before finding out whether they can be vectorized.
The memory accesses are only analysed to find out whether they can be turned into vector loads/stores. We basically do something equivalent to a full loop unroll (in our case this can also mean creating one copy of the kernel for every work item in the OpenCL workspace), let the LLVM passes do what they can, for example remove unneeded memory access computations, and then "fold" again using the metadata and create vectors where possible.
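
As a rough illustration of why grouping by metadata is cheaper than testing all pairs, here is a minimal sketch in plain C++ without LLVM. ReplicatedInst, originId, lane and the helper functions are hypothetical names, and the sketch assumes that each replicated instruction carries the position of its original instruction, which is my reading of the metadata described above.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for an instruction after full replication.
struct ReplicatedInst {
    int originId;  // assumed: position of the instruction in the original kernel
    int lane;      // assumed: which work-item copy this instruction came from
};

// Single pass over the block: bucket the copies of the same original
// instruction together, so only members of one bucket need to be
// considered for packing into a vector.
std::map<int, std::vector<ReplicatedInst>>
groupByOrigin(const std::vector<ReplicatedInst>& block) {
    std::map<int, std::vector<ReplicatedInst>> groups;
    for (const ReplicatedInst& inst : block)
        groups[inst.originId].push_back(inst);
    return groups;
}

// Number of pairs a naive all-pairs scan over n instructions would test.
std::size_t allPairsCount(std::size_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}

// Number of pairs left if only instructions within one bucket are compared.
std::size_t groupedPairsCount(
    const std::map<int, std::vector<ReplicatedInst>>& groups) {
    std::size_t total = 0;
    for (const auto& entry : groups) {
        const std::size_t k = entry.second.size();
        total += k * (k - 1) / 2;
    }
    return total;
}
```

For a block of 3 original instructions replicated over 4 work items, the all-pairs scan would consider 66 pairs, while the grouped variant leaves 18 pairs inside 3 buckets.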

regards
Vlado
> 
> Thanks again,
> Hal
> 
>> 
>> regards
>> Vlado
>>> 
>>>> 
>>>> Anyways, if the loop metadata gets updated, I can have a look at
>>>> updating AA and moving it from pocl to LLVM, but not likely this
>>>> week (maybe Pekka can provide it sooner if there is a rush).
>>>> I can not make any promises about BBVectorization atm,
>>>> unfortunately.
>>> 
>>> Great, thanks!
>>> 
>>> -Hal
>>> 
>>>> 
>>>> regards
>>>> Vlado
>>>>> 
>>>>> Thanks again,
>>>>> Hal
>>>>> 
>>>>>> 
>>>>>> BR,
>>>>>> --
>>>>>> --Pekka
>>>>>> 
>>>>>> 