[PATCH] Fix PR15267
Demikhovsky, Elena
elena.demikhovsky at intel.com
Wed Feb 20 02:42:22 PST 2013
Hi Michael,
I don't understand from your patch what code will be generated for
define <4 x i64> @test3(<4 x i1>* %in) nounwind {
%wide.load35 = load <4 x i1>* %in, align 1
%sext = sext <4 x i1> %wide.load35 to <4 x i64>
ret <4 x i64> %sext
}
Could you, please, send me AVX and AVX2 versions?
Thanks.
- Elena
-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Michael Liao
Sent: Wednesday, February 20, 2013 10:41
To: Nadav Rotem
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Fix PR15267
Oops! Forget attaching. Sorry.
- Michael
On Wed, 2013-02-20 at 00:08 -0800, Michael Liao wrote:
> OK, more test cases are added following your suggestion. Thanks for
> review. The test case mixing SEXT and load are reserved and it's
> reported in bugzilla.
>
> BTW, ofc, as a symmetric issue, trunc store needs similar fix as well.
> I am working on that.
>
> Yours
> - Michael
>
> On Tue, 2013-02-19 at 23:46 -0800, Nadav Rotem wrote:
> > Hi Michael,
> >
> > Sorry. I saw that your test case had 4 moves and I assumed that you are loading from memory.
> >
> > In your test case you are mixing the SEXTing and the loading. You can load <4 x i1> and use it as a return value. Also, please check other packed values such as <4 x i3>.
> >
> > Thanks,
> > - Nadav
> >
> >
> > On Feb 19, 2013, at 11:33 PM, Michael Liao <michael.liao at intel.com> wrote:
> >
> > > Hi Nadav,
> > >
> > > Sorry, I cannot follow your concern. This bug is on <4 x i1> extload.
> > > The logic will load 1 byte and extract 4 bits one by one. Final <4
> > > x i1> is built from these 4 extracted bits.
> > >
> > > For the test case in bug report, the following code is generated
> > > after the patch:
> > >
> > > test3: # @test3
> > > # BB#0:
> > > movzbl (%rdi), %eax
> > > movq %rax, %rcx
> > > shlq $62, %rcx
> > > sarq $63, %rcx
> > > movq %rax, %rdx
> > > shlq $60, %rdx
> > > sarq $63, %rdx
> > > vmovd %rdx, %xmm0
> > > movq %rax, %rdx
> > > shlq $61, %rdx
> > > sarq $63, %rdx
> > > vmovd %rdx, %xmm1
> > > vpunpcklqdq %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
> > > vmovd %rcx, %xmm1
> > > shlq $63, %rax
> > > sarq $63, %rax
> > > vmovd %rax, %xmm2
> > > vpunpcklqdq %xmm1, %xmm2, %xmm1 # xmm1 = xmm2[0],xmm1[0]
> > > vinsertf128 $1, %xmm0, %ymm1, %ymm0
> > > ret
> > >
> > > It could be observed that only 1 byte is loaded, all the other
> > > sequence are signed-extracting bit from that byte.
> > >
> > > Yours
> > > - Michael
> > >
> > >
> > > On Tue, 2013-02-19 at 23:24 -0800, Nadav Rotem wrote:
> > >> I think that this patch is incorrect. In your code you save <4 x i1> as 4 consecutive bytes in memory, but this mask needs to be saved as the four lower bits in a single byte.
> > >> Some targets (For example MIC) do have mask registers, and on these targets saving the mask register to memory should be the same as targets that don't have the mask register. In other words, the bits need to be packed.
> > >>
> > >> Thanks,
> > >> Nadav
> > >>
> > >> On Feb 19, 2013, at 11:15 PM, Michael Liao <michael.liao at intel.com> wrote:
> > >>
> > >>> Thanks for review. Test case is revised following your
> > >>> suggestion plus one minor fix in the logic, when we load bytes,
> > >>> we only need any-extload, which is irrelevant to the extload
> > >>> type in the original vector load.
> > >>>
> > >>> Thanks
> > >>> - Michael
> > >>>
> > >>> On Tue, 2013-02-19 at 23:05 -0800, Nadav Rotem wrote:
> > >>>> Michael,
> > >>>>
> > >>>> Your test case does not check the mechanism that you described. Please write a test case that checks that there is a sequence of shifts and ands.
> > >>>>
> > >>>> Thanks,
> > >>>> Nadav
> > >>>>
> > >>>> On Feb 19, 2013, at 10:29 PM, Michael Liao <michael.liao at intel.com> wrote:
> > >>>>
> > >>>>> Hi All,
> > >>>>>
> > >>>>> The root cause of PR15267 is the current logic assume
> > >>>>> byte-addressable element during extloading from a vector. With
> > >>>>> <4 x i1> in memory, this logic breaks. The attached patch
> > >>>>> fixes that by loading all bytes associated to that type,
> > >>>>> extracting bits and re-packing them again to form each element.
> > >>>>>
> > >>>>> Thanks for review!
> > >>>>> - Michael
> > >>>>>
> > >>>>> <0001-Fix-PR15267.patch>______________________________________
> > >>>>> _________
> > >>>>> llvm-commits mailing list
> > >>>>> llvm-commits at cs.uiuc.edu
> > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>
> > >>>
> > >>> <0001-Fix-PR15267.patch>
> > >>
> > >
> > >
> >
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
More information about the llvm-commits
mailing list