[llvm-dev] Non-Temporal hints from Loop Vectorizer
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Sun Jan 21 12:59:57 PST 2018
On 01/20/2018 12:29 PM, hameeza ahmed via llvm-dev wrote:
> I have already seen usage of __builtin_nontemporal_store, but I want to
> automate identification of non-temporal loads/stores. I think I need
> to write a pass. Is it possible to detect non-temporal loops without
> Polly?
Yes, but we don't have anything that does that right now, and the cost
modeling is non-trivial. In the loop below, which of those accesses
would you expect to be nontemporal? Each of the three arrays spans only
8 KB (2048 x 4-byte i32 elements, 24 KB in total), and that's certainly
smaller than many L1 caches. Turning those into nontemporal accesses
could certainly lead to a performance regression for that loop,
subsequent code, or both. If we do this more generally, I suspect that
we'd need to split the loop so that small trip counts don't use them at
all, and for larger trip counts, we don't disturb data-reuse
opportunities that would otherwise exist.
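
As a rough illustration of that kind of split, here is a hand-written
sketch in C. It is not something the vectorizer does today. The names
add_arrays_split, NT_STORE and nt_threshold_bytes are hypothetical, the
threshold is a made-up tuning knob rather than any LLVM parameter, and
keeping the loads as ordinary loads is an assumption of the sketch. The
plain-store fallback is only there so it compiles without Clang's builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Use Clang's nontemporal store builtin when available; otherwise fall
 * back to a plain store so the sketch still compiles elsewhere. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store)
#    define NT_STORE(val, ptr) __builtin_nontemporal_store((val), (ptr))
#  endif
#endif
#ifndef NT_STORE
#  define NT_STORE(val, ptr) (*(ptr) = (val))
#endif

/* Computes a[i] = b[i] + c[i], choosing the loop version from the total
 * footprint of the three arrays. nt_threshold_bytes is a hypothetical
 * cutoff that would have to come from a target cost model. */
void add_arrays_split(int *restrict a, const int *restrict b,
                      const int *restrict c, size_t n,
                      size_t nt_threshold_bytes)
{
    assert(n == 0 || (a && b && c));
    size_t footprint = 3 * n * sizeof(int);

    if (footprint <= nt_threshold_bytes) {
        /* Small trip count: ordinary accesses, no nontemporal hints. */
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    } else {
        /* Large trip count: hint only the stores as nontemporal;
         * the loads from b and c stay ordinary (an assumption). */
        for (size_t i = 0; i < n; i++)
            NT_STORE(b[i] + c[i], &a[i]);
    }
}
```

With a cutoff of, say, 1 MiB, the 2048-element loop from this thread
(24 KB footprint) would take the ordinary path. Picking the cutoff, and
knowing whether later code reuses the data, is the hard part.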
-Hal
>
> On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim
> <llvm-dev at redking.me.uk> wrote:
>
> On 20/01/2018 18:16, hameeza ahmed wrote:
>> Actually, I am working on a vector accelerator which will execute
>> those instructions which are non-temporal.
>>
>> For instance, if I have this loop:
>>
>> for(i=0;i<2048;i++)
>> a[i]=b[i]+c[i];
>>
>> currently it emits the following IR:
>>
>>
>> %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
>> %1 = bitcast i32* %0 to <16 x i32>*
>> %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1
>> %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
>> %9 = bitcast i32* %8 to <16 x i32>*
>> %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1
>> %16 = add nsw <16 x i32> %wide.load14, %wide.load
>> %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
>> %21 = bitcast i32* %20 to <16 x i32>*
>> store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1
>>
>>
>> However, I want it to emit the following IR:
>>
>> %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
>> %1 = bitcast i32* %0 to <16 x i32>*
>> %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1, !nontemporal !1
>> %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
>> %9 = bitcast i32* %8 to <16 x i32>*
>> %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1, !nontemporal !1
>> %16 = add nsw <16 x i32> %wide.load14, %wide.load, !nontemporal !1
>> %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
>> %21 = bitcast i32* %20 to <16 x i32>*
>> store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1, !nontemporal !1
>>
>> so that I can offload the load, add, and store to accelerator hardware.
>> Is that possible here? Do I need a separate pass to detect whether
>> the loop has non-temporal data, or will Polly help here? What do
>> you say?
> From C/C++ you just need to use the
> __builtin_nontemporal_store/__builtin_nontemporal_load builtins to
> tag the stores/loads with the nontemporal flag.
>
> for (i = 0; i < 2048; i++) {
>   __builtin_nontemporal_store(__builtin_nontemporal_load(b + i) +
>                               __builtin_nontemporal_load(c + i), a + i);
> }
>
> There may be an attribute you can tag pointers with instead, but I
> don't know offhand.
>
>> On Sat, Jan 20, 2018 at 11:02 PM, Simon Pilgrim
>> <llvm-dev at redking.me.uk> wrote:
>>
>> On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote:
>>
>> Hello,
>>
>> My work deals with non-temporal loads and stores. I found
>> non-temporal metadata in the LLVM documentation, but it's not
>> shown in the IR.
>>
>> How do I get non-temporal metadata?
>>
>> llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to
>> create nt vector loads in IR - is that what you're after?
>>
>> Simon.
>>
>>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory