[llvm-dev] RFC phantom memory intrinsic
Haidl, Michael via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 13 05:23:05 PDT 2017
Hi Dinar,
I am asking because I am maintaining an out-of-tree pass which does
exactly what SLP does not. It is a pass designed for GPUs to combine
loads and stores, e.g., when consecutive fields of a structure have the
same type it merges the loads and stores to vector loads and stores. I
have a case where InstCombine removes a store and your approach would be
valuable for me if the entire access to an aggregate could be restored.
Second thing I am concerned is that this intrinsics "just" fix a
specific problem of IC where potentially a more generic solution is needed.
Cheers,
Michael
> Hi Michael,
>> Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
> Yes, correct, this a little bit changed example is not working.
> #include <x86intrin.h>
>
> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
> }
> But with the aggregate case it is a new level of complexity, should we
> we care about? There might be some logic that probably would be mark
> as dead by InstCombine and we don't want to keep it.
> BTW: Looks like SLP could not recognize the case either :
> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 {
> entry:
> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
> %0 = load double, double* %arrayidx, align 8
> %vecinit = insertelement <4 x double> undef, double %0, i32 0
> %add = add i64 %i, 1
> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
> %1 = load double, double* %arrayidx1, align 8
> %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1
> %add3 = add i64 %i, 2
> %arrayidx4 = getelementptr inbounds double, double* %ptr, i64 %add3
> %2 = load double, double* %arrayidx4, align 8
> %vecinit5 = insertelement <4 x double> %vecinit2, double %2, i32 2
> %add6 = add i64 %i, 3
> %arrayidx7 = getelementptr inbounds double, double* %ptr, i64 %add6
> %3 = load double, double* %arrayidx7, align 8
> %vecinit8 = insertelement <4 x double> %vecinit5, double %3, i32 3
> %shuffle = shufflevector <4 x double> %vecinit8, <4 x double>
> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32 2>
> ret <4 x double> %shuffle
> }
>
> Thanks, Dinar.
>
> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael
> <michael.haidl at uni-muenster.de> wrote:
>> Interesting approach but how do you handle more complex offsets, e.g., when
>> the pointer is part of an aggregate? Only one offset does not seem enough to
>> handle generic cases.
>>
>> -----Original Message-----
>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at lists.llvm.org]
>> Received: Dienstag, 12 Sep. 2017, 9:57
>> To: llvm-dev at lists.llvm.org [llvm-dev at lists.llvm.org]
>> CC: Filipe Cabecinhas [me at filcab.net]
>> Subject: [llvm-dev] RFC phantom memory intrinsic
>>
>> Hi,
>> For PR21780 solution, I plan to add a new functionality to restore
>> memory operations that was once deleted, in this particular case it is
>> the load operations that were deleted by InstCombine, please note that
>> once the load was removed there is no way to restore it back and that
>> prevents us from vectorizing the shuffle operation. There are probably
>> more similar issues where this approach could be applied.
>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty) intrinsic for that,
>> indicating that for particular pointer let's call it %ptr we observed
>> maximum possible offset at which there was reference by its type in a
>> function. After InstCombine deleted the load operation, it could be
>> restored in SLPVectorizer and we could restore chains of GEPs, Loads
>> and Inserts in case we encounter phatom_mem intrinsic.
>>
>> Here is two part review:
>> https://reviews.llvm.org/D37579 - InstCombine part.
>> https://reviews.llvm.org/D37648 - SLP part.
>>
>> Also, there might be different approaches in describing deleted memory
>> operations, for example, for my case: phantom_load(llvm_anyptr_ty,
>> llvm_i64_ty). First parameter describes pointer and second parameter
>> offset from pointer this loaded was deleted, for example. This two
>> operations:
>>
>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1
>> %ld1 = load double, double* %arrayidx1
>>
>> could be represented in the IR with this one: "void phantom_load(%ptr,
>> 1)" after removal. But, the approach that is already implemented in
>> both reviews looks better to me since we don't need to add intrinsic
>> for every removed operation in the IR. Also, while constructing such
>> form in the IR we have to be careful since some pointer operations
>> might be in loops and as the result we might end up construction an
>> incorrect IR. So, I just avoid to notice any pointer operation if it
>> is belong to a loop, except those where the the whole chain of
>> operations pointer origin, GEP, Load, Shuffle operation are in the
>> same loop and in the same basic block.
>> Thanks, Dinar.
>>
>> Here is the thread for this issue regarding using metadata:
>> http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
More information about the llvm-dev
mailing list