[PATCH] Fix PR15267
Michael Liao
michael.liao at intel.com
Tue Feb 19 23:33:31 PST 2013
Hi Nadav,
Sorry, I cannot follow your concern. This bug is on <4 x i1> extload.
The logic will load 1 byte and extract 4 bits one by one. Final <4 x i1>
is built from these 4 extracted bits.
For the test case in bug report, the following code is generated after
the patch:
test3: # @test3
# BB#0:
movzbl (%rdi), %eax
movq %rax, %rcx
shlq $62, %rcx
sarq $63, %rcx
movq %rax, %rdx
shlq $60, %rdx
sarq $63, %rdx
vmovd %rdx, %xmm0
movq %rax, %rdx
shlq $61, %rdx
sarq $63, %rdx
vmovd %rdx, %xmm1
vpunpcklqdq %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
vmovd %rcx, %xmm1
shlq $63, %rax
sarq $63, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1 # xmm1 = xmm2[0],xmm1[0]
vinsertf128 $1, %xmm0, %ymm1, %ymm0
ret
It could be observed that only 1 byte is loaded, all the other sequence
are signed-extracting bit from that byte.
Yours
- Michael
On Tue, 2013-02-19 at 23:24 -0800, Nadav Rotem wrote:
> I think that this patch is incorrect. In your code you save <4 x i1> as 4 consecutive bytes in memory, but this mask needs to be saved as the four lower bits in a single byte.
> Some targets (For example MIC) do have mask registers, and on these targets saving the mask register to memory should be the same as targets that don't have the mask register. In other words, the bits need to be packed.
>
> Thanks,
> Nadav
>
> On Feb 19, 2013, at 11:15 PM, Michael Liao <michael.liao at intel.com> wrote:
>
> > Thanks for review. Test case is revised following your suggestion plus
> > one minor fix in the logic, when we load bytes, we only need
> > any-extload, which is irrelevant to the extload type in the original
> > vector load.
> >
> > Thanks
> > - Michael
> >
> > On Tue, 2013-02-19 at 23:05 -0800, Nadav Rotem wrote:
> >> Michael,
> >>
> >> Your test case does not check the mechanism that you described. Please write a test case that checks that there is a sequence of shifts and ands.
> >>
> >> Thanks,
> >> Nadav
> >>
> >> On Feb 19, 2013, at 10:29 PM, Michael Liao <michael.liao at intel.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> The root cause of PR15267 is the current logic assume byte-addressable
> >>> element during extloading from a vector. With <4 x i1> in memory, this
> >>> logic breaks. The attached patch fixes that by loading all bytes
> >>> associated to that type, extracting bits and re-packing them again to
> >>> form each element.
> >>>
> >>> Thanks for review!
> >>> - Michael
> >>>
> >>> <0001-Fix-PR15267.patch>_______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
> >
> > <0001-Fix-PR15267.patch>
>
More information about the llvm-commits
mailing list