[PATCH][X86] Fix missed selection of non-temporal store of zero vector (PR19370).

Andrea Di Biagio andrea.dibiagio at gmail.com
Thu Oct 16 14:50:57 PDT 2014


Hi Quentin,

On Thu, Oct 16, 2014 at 7:49 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Andrea,
>
> The general direction looks good, just a couple of questions:
> Since we are matching integer types, shouldn’t we use the integer variant:
> MOVNTD?

The only integer variant available would be movntdq (SSE2) and
vmovntdq (AVX) which can be used to store quad-words.
Since we are storing all zeros in the destination, then I think in
this case it should be ok to use it.
In case, then I would probably need to guard the pattern that selects
'movntdq' with a 'let Predicates = [HasSSE2] in'.

I noticed however that if we use the integer variant MOVNTDQmr, the
backend would eventually convert it back to a MOVNTPSmr in the attempt
to fix-up the execution dependency (I guess that is because the zero
vector is generated by a xorps).

>
> Moreover, wouldn’t it make sense to have a more general pattern for this?
> I.e., we could use VR128 instead of immAllZerosV, couldn’t we?

Originally I tried to fix ths bug using VR128 instead of immAllZerosV.
However, in all my experiments, the only case I found where the
backend wrongly selected a normal store instead of a non-temporal
store was the case where a zero vector is used as value in input. This
is why eventually I decided to explicitly use immAllZerosV.

That said, If I replace immAllZerosV with VR128 it would work.
However, I don't think (correct me if I am wrong) we would be able to
use the integer variant MOVNTDQmr anymore, since the input vector may
not be a zero vector.

What do you think?

Thanks,
Andrea

>
> Thanks,
> -Quentin
>
> On Oct 16, 2014, at 10:41 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>
> Hi,
>
> This is a fix for PR19370.
> Currently the x86 backend wrongly selects a a normal vector store
> instead of a non-temporal store if the value to store is a vector of
> all zeros.
>
> Small reproducible:
>
> ;;
> define void @test(<4 x float>* %dst) {
>  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16,
> !nontemporal !1
>  ret void
> }
>
> !1 = metadata !{i32 1}
> ;;
>
> llc (-mtriple=x86_64-unknown-unknown -mcpu=corei7-avx) generates:
>  vxorps  %xmm0, %xmm0, %xmm0
>  vmovaps %xmm0, (%rdi)
>  retq
>
> Instead, llc should generate:
>  vxorps  %xmm0, %xmm0, %xmm0
>  vmovntps  %xmm0, (%rdi)
>  retq
>
> In this example, the vector of all zeros is legalized to a zero vector
> of type v4i32.
> However, ISel doesn't have a rule to select a MOVNTPSmr when the
> source is not a vector of float elements. So, it eventually (wrongly)
> falls back to selecting a normal store.
>
> This patch fixes the problem adding extra ISel patterns to cover that
> particular corner case.
>
> Please let me know if ok to commit.
>
> Thanks!
> Andrea
> <patch-nontemporal.diff>
>
>




More information about the llvm-commits mailing list