[PATCH] Fix PR15267

Michael Liao michael.liao at intel.com
Tue Feb 19 23:33:31 PST 2013


Hi Nadav,

Sorry, I cannot follow your concern. This bug is on <4 x i1> extload.
The logic will load 1 byte and extract 4 bits one by one. Final <4 x i1>
is built from these 4 extracted bits.

For the test case in bug report, the following code is generated after
the patch:

test3:                                  # @test3
# BB#0:
	movzbl	(%rdi), %eax
	movq	%rax, %rcx
	shlq	$62, %rcx
	sarq	$63, %rcx
	movq	%rax, %rdx
	shlq	$60, %rdx
	sarq	$63, %rdx
	vmovd	%rdx, %xmm0
	movq	%rax, %rdx
	shlq	$61, %rdx
	sarq	$63, %rdx
	vmovd	%rdx, %xmm1
	vpunpcklqdq	%xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
	vmovd	%rcx, %xmm1
	shlq	$63, %rax
	sarq	$63, %rax
	vmovd	%rax, %xmm2
	vpunpcklqdq	%xmm1, %xmm2, %xmm1 # xmm1 = xmm2[0],xmm1[0]
	vinsertf128	$1, %xmm0, %ymm1, %ymm0
	ret

It could be observed that only 1 byte is loaded, all the other sequence
are signed-extracting bit from that byte.

Yours
- Michael


On Tue, 2013-02-19 at 23:24 -0800, Nadav Rotem wrote:
> I think that this patch is incorrect.  In your code you save <4 x i1> as 4 consecutive bytes in memory, but this mask needs to be saved as the four lower bits in a single byte.
> Some targets (For example MIC) do have mask registers, and on these targets saving the mask register to memory should be the same as targets that don't have the mask register. In other words, the bits need to be packed.
> 
> Thanks,
> Nadav   
> 
> On Feb 19, 2013, at 11:15 PM, Michael Liao <michael.liao at intel.com> wrote:
> 
> > Thanks for review. Test case is revised following your suggestion plus
> > one minor fix in the logic, when we load bytes, we only need
> > any-extload, which is irrelevant to the extload type in the original
> > vector load.
> > 
> > Thanks
> > - Michael
> > 
> > On Tue, 2013-02-19 at 23:05 -0800, Nadav Rotem wrote:
> >> Michael, 
> >> 
> >> Your test case does not check the mechanism that you described.  Please write a test case that checks that there is a sequence of shifts and ands. 
> >> 
> >> Thanks,
> >> Nadav
> >> 
> >> On Feb 19, 2013, at 10:29 PM, Michael Liao <michael.liao at intel.com> wrote:
> >> 
> >>> Hi All,
> >>> 
> >>> The root cause of PR15267 is the current logic assume byte-addressable
> >>> element during extloading from a vector. With <4 x i1> in memory, this
> >>> logic breaks. The attached patch fixes that by loading all bytes
> >>> associated to that type, extracting bits and re-packing them again to
> >>> form each element.
> >>> 
> >>> Thanks for review!
> >>> - Michael
> >>> 
> >>> <0001-Fix-PR15267.patch>_______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >> 
> > 
> > <0001-Fix-PR15267.patch>
> 





More information about the llvm-commits mailing list