[PATCH][X86] Fix missed selection of non-temporal store of zero vector (PR19370).

Thu Oct 16 15:17:07 PDT 2014

> On Oct 16, 2014, at 2:50 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> Hi Quentin,
> 
> On Thu, Oct 16, 2014 at 7:49 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi Andrea,
>> 
>> The general direction looks good, just a couple of questions:
>> Since we are matching integer types, shouldn’t we use the integer variant:
>> MOVNTD?
> 
> The only integer variant available would be movntdq (SSE2) and
> vmovntdq (AVX) which can be used to store quad-words.
> Since we are storing all zeros in the destination, then I think in
> this case it should be ok to use it.
> In case, then I would probably need to guard the pattern that selects
> 'movntdq' with a 'let Predicates = [HasSSE2] in'.
> 
> I noticed however that if we use the integer variant MOVNTDQmr, the
> backend would eventually convert it back to a MOVNTPSmr in the attempt
> to fix-up the execution dependency (I guess that is because the zero
> vector is generated by a xorps).

I was expected this :).
The thing I had in mind was, since we are using integer type I was expecting that in the general case the computation leading to this store would have the same type. Therefore, I thought it was better to directly generate the instruction that is in the right domain than relying on the domain fixer for that.

> 
>> 
>> Moreover, wouldn’t it make sense to have a more general pattern for this?
>> I.e., we could use VR128 instead of immAllZerosV, couldn’t we?
> 
> Originally I tried to fix ths bug using VR128 instead of immAllZerosV.
> However, in all my experiments, the only case I found where the
> backend wrongly selected a normal store instead of a non-temporal
> store was the case where a zero vector is used as value in input. This
> is why eventually I decided to explicitly use immAllZerosV.

I see the rational, I just wanted to be sure we caught all the cases.

> 
> That said, If I replace immAllZerosV with VR128 it would work.
> However, I don't think (correct me if I am wrong) we would be able to
> use the integer variant MOVNTDQmr anymore, since the input vector may
> not be a zero vector.

What is the problem with that?
Unless I am mistaken, we are storing a 128-bits value and the way it is interpreted is pointless, isn’t it?

That makes me wonder, why the actual patterns for MOVNT do not work here?
We should be able to do implicit conversions from v4i32 to v2i64 (i.e., anything that stay on VR128).

-Quentin

> 
> What do you think?
> 
> Thanks,
> Andrea
> 
>> 
>> Thanks,
>> -Quentin
>> 
>> On Oct 16, 2014, at 10:41 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
>> wrote:
>> 
>> Hi,
>> 
>> This is a fix for PR19370.
>> Currently the x86 backend wrongly selects a a normal vector store
>> instead of a non-temporal store if the value to store is a vector of
>> all zeros.
>> 
>> Small reproducible:
>> 
>> ;;
>> define void @test(<4 x float>* %dst) {
>> store <4 x float> zeroinitializer, <4 x float>* %dst, align 16,
>> !nontemporal !1
>> ret void
>> }
>> 
>> !1 = metadata !{i32 1}
>> ;;
>> 
>> llc (-mtriple=x86_64-unknown-unknown -mcpu=corei7-avx) generates:
>> vxorps  %xmm0, %xmm0, %xmm0
>> vmovaps %xmm0, (%rdi)
>> retq
>> 
>> Instead, llc should generate:
>> vxorps  %xmm0, %xmm0, %xmm0
>> vmovntps  %xmm0, (%rdi)
>> retq
>> 
>> In this example, the vector of all zeros is legalized to a zero vector
>> of type v4i32.
>> However, ISel doesn't have a rule to select a MOVNTPSmr when the
>> source is not a vector of float elements. So, it eventually (wrongly)
>> falls back to selecting a normal store.
>> 
>> This patch fixes the problem adding extra ISel patterns to cover that
>> particular corner case.
>> 
>> Please let me know if ok to commit.
>> 
>> Thanks!
>> Andrea
>> <patch-nontemporal.diff>
>> 
>>