[llvm-dev] Non-Temporal hints from Loop Vectorizer

hameeza ahmed via llvm-dev llvm-dev at lists.llvm.org
Sat Jan 20 10:29:36 PST 2018


I have already seen __builtin_nontemporal_store used, but I want to
automate the identification of non-temporal loads/stores. I think I need to
write a pass. Is it possible to detect non-temporal loops without Polly?
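
For reference, here is a minimal sketch of the manual tagging approach mentioned above, written so that it also builds on compilers without the builtins. NT_LOAD, NT_STORE and add_arrays are hypothetical names chosen for illustration, and the fallback to plain loads/stores is an assumption of this sketch, not something clang provides. With clang, the builtins are expected to attach !nontemporal metadata to the generated loads and stores; whether that metadata is preserved once the loop is vectorized should be checked in the emitted IR.

```c
#include <stddef.h>

/* Compilers that lack __has_builtin take the plain fallback below. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* NT_LOAD / NT_STORE are hypothetical helper macros for this sketch. */
#if __has_builtin(__builtin_nontemporal_load) && \
    __has_builtin(__builtin_nontemporal_store)
#define NT_LOAD(p)     __builtin_nontemporal_load(p)
#define NT_STORE(v, p) __builtin_nontemporal_store((v), (p))
#else
#define NT_LOAD(p)     (*(p))
#define NT_STORE(v, p) (*(p) = (v))
#endif

/* Streaming add: each element of b and c is read once and each element
 * of a is written once, so no access benefits from staying in cache. */
void add_arrays(int *a, int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        NT_STORE(NT_LOAD(b + i) + NT_LOAD(c + i), a + i);
}
```

Calling add_arrays(a, b, c, 2048) on the arrays from the loop below gives the same result as the plain loop; only the cache hint on the memory accesses differs.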

On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:

> On 20/01/2018 18:16, hameeza ahmed wrote:
>
> Actually, I am working on a vector accelerator that will execute the
> instructions that are non-temporal.
>
> For instance, if I have this loop:
>
> for(i=0;i<2048;i++)
> a[i]=b[i]+c[i];
>
> Currently it emits the following IR:
>
>
>   %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
>   %1 = bitcast i32* %0 to <16 x i32>*
>   %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1
>   %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
>   %9 = bitcast i32* %8 to <16 x i32>*
>   %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1
>   %16 = add nsw <16 x i32> %wide.load14, %wide.load
>   %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
>   %21 = bitcast i32* %20 to <16 x i32>*
>   store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1
>
>
> However, I want it to emit the following IR:
>
>   %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
>   %1 = bitcast i32* %0 to <16 x i32>*
>   %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1, !nontemporal !1
>   %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
>   %9 = bitcast i32* %8 to <16 x i32>*
>   %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1, !nontemporal !1
>   %16 = add nsw <16 x i32> %wide.load14, %wide.load, !nontemporal !1
>   %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
>   %21 = bitcast i32* %20 to <16 x i32>*
>   store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1, !nontemporal !1
>
> so that I can offload the load, add, and store to the accelerator hardware.
> Is this possible? Do I need a separate pass to detect whether the loop has
> non-temporal data, or will Polly help here? What do you think?
>
> From C/C++ you just need to use the __builtin_nontemporal_store/__builtin_nontemporal_load
> builtins to tag the stores/loads with the nontemporal flag.
>
> for(i=0;i<2048;i++) {
>   __builtin_nontemporal_store( __builtin_nontemporal_load(b+i) + __builtin_nontemporal_load(c + i), a + i );
> }
>
> There may be an attribute you can tag pointers with instead, but I don't
> know offhand.
>
> On Sat, Jan 20, 2018 at 11:02 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
> wrote:
>
>> On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote:
>>
>>> Hello,
>>>
>>> My work deals with non-temporal loads and stores. I found non-temporal
>>> metadata in the LLVM documentation, but it's not shown in the IR.
>>>
>>> How do I get non-temporal metadata?
>>>
>> llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to create nt vector
>> loads in IR - is that what you're after?
>>
>> Simon.
>>
>
>
>