[llvm-dev] What should a truncating store do?

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 25 11:08:01 PDT 2017


On 9/25/2017 9:14 AM, Björn Pettersson A wrote:
>
> (Not sure if this exactly maps to “truncating store”, but I think it 
> at least touches some of the subjects discussed in this thread)
>
> Our out-of-tree target needs several patches to get things working
> correctly for us.
>
> We have introduced i24 and i40 types in ValueTypes/MachineValueTypes 
> (in addition to the normal pow-of-2 types). And we have vectors of 
> those (v2i40, v4i40).
>
> And the byte size in our target is 16 bits.
>
> When storing an i40 we need to store it as three 16-bit bytes, i.e. 48 
> bits.
>
> When storing a v4i40 vector it will be stored as 4x48 bits.
>
> One thing that we have had to patch is the getStoreSize() method in 
> ValueTypes/MachineValueTypes where we assume that vectors are 
> bitpacked when the element size is smaller than the byte size 
> (“BitsPerByte”):
>
>      /// Return the number of bytes overwritten by a store of the specified value
>      /// type.
>      unsigned getStoreSize() const {
> -      return (getSizeInBits() + 7) / 8;
> +      // We assume that vectors with elements smaller than the byte size are
> +      // bitpacked. And that elements larger than the byte size should be padded
> +      // (e.g. i40 type for Phoenix is stored using 3 bytes (48 bits)).
> +      bool PadElementsToByteSize =
> +        isVector() && getScalarSizeInBits() >= BitsPerByte;
> +      if (PadElementsToByteSize)
> +        return getVectorNumElements() * getScalarType().getStoreSize();
> +      return (getSizeInBits() + (BitsPerByte-1)) / BitsPerByte;
>      }
>
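For concreteness: with BitsPerByte == 16, the padded and bitpacked policies
give different answers for v4i40. Below is a small standalone sketch of the
arithmetic; the helper is hypothetical, not the actual EVT/MVT code, and just
mirrors the i40/v4i40 numbers from the description above.

#include <cstdio>

// Hypothetical helper: store size in (16-bit) bytes of a vector type,
// either bitpacked or with each element padded to a whole number of bytes.
static unsigned storeSizeInBytes(unsigned NumElts, unsigned EltBits,
                                 unsigned BitsPerByte, bool PadElements) {
  if (PadElements) {
    unsigned EltBytes = (EltBits + BitsPerByte - 1) / BitsPerByte;
    return NumElts * EltBytes;                                // v4i40: 4 * 3 = 12 bytes
  }
  return (NumElts * EltBits + BitsPerByte - 1) / BitsPerByte; // bitpacked
}

int main() {
  // i40: (40 + 15) / 16 = 3 bytes (48 bits), the same under either policy.
  std::printf("i40    : %u bytes\n", storeSizeInBytes(1, 40, 16, true));
  // v4i40 padded: 4 * 3 = 12 bytes (192 bits).
  std::printf("v4i40  : %u bytes padded\n", storeSizeInBytes(4, 40, 16, true));
  // v4i40 bitpacked: (160 + 15) / 16 = 10 bytes (160 bits).
  std::printf("v4i40  : %u bytes bitpacked\n", storeSizeInBytes(4, 40, 16, false));
}
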
> The patch seems to work for in-tree target tests as well as for our
> out-of-tree target.
>
> Whether this is a correct assumption for all targets is beyond my
> knowledge. Maybe only i1 vectors should be bitpacked?
>
> Anyway, I think the bitpacked case is very special (we do not use it
> for our target…).
>
> AFAIK bitcast is defined as writing to memory followed by a load using 
> a different type. And I think that doing several scalar operations 
> should give the same result as when using vectors. So bitcast of 
> bitpacked vectors should probably be avoided?
>

Yes, store+load is the right definition of bitcast.  And in fact, the 
backend will lower a bitcast to a store+load to a stack temporary in 
cases where there isn't some other lowering specified.
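
At the source level, that store+load definition is essentially what a
memcpy-based bit cast does (a sketch only; nothing here is the actual
legalizer code, and the helper name is made up):

#include <cstring>
#include <cstdint>
#include <cstdio>

// "Bitcast via a temporary": store the source bits, then load them back as
// the destination type. Roughly what std::bit_cast (C++20) provides.
template <typename To, typename From>
static To bitcastViaMemory(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "bitcast needs equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));   // the store + load through memory
  return to;
}

int main() {
  float f = 1.0f;
  std::printf("0x%08x\n", bitcastViaMemory<uint32_t>(f));  // 0x3f800000
}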

The end result is probably going to be pretty inefficient unless your 
target has a special instruction to handle it (x86 has pmovmskb for i1 
vector bitcasts, but otherwise you probably end up with some terrible 
lowering involving a lot of shifts).
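
For reference, that pmovmskb special case is also what the SSE2 intrinsic
_mm_movemask_epi8 exposes; a small C++ illustration (the test values are made
up, and byteEqualityMask is just a name for this sketch):

#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdio>

// Compare two byte vectors and collapse the 16 lane predicates into a
// 16-bit mask, conceptually a <16 x i1> -> i16 bitcast.
static int byteEqualityMask(__m128i a, __m128i b) {
  __m128i eq = _mm_cmpeq_epi8(a, b);   // 0xFF in equal lanes, 0x00 otherwise
  return _mm_movemask_epi8(eq);        // pmovmskb: top bit of each lane
}

int main() {
  __m128i a = _mm_set1_epi8(7);
  __m128i b = _mm_setr_epi8(7, 0, 7, 0, 7, 0, 7, 0, 7, 0, 7, 0, 7, 0, 7, 0);
  std::printf("mask = 0x%04x\n", byteEqualityMask(a, b));  // expect 0x5555
}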

> This also reminded me of the following test case that is in trunk: 
>  test/CodeGen/X86/pr20011.ll
>
> %destTy = type { i2, i2 }
>
> define void @crash(i64 %x0, i64 %y0, %destTy* nocapture %dest) nounwind {
> ; X64-LABEL: crash:
> ; X64:       # BB#0:
> ; X64-NEXT:    andl $3, %esi
> ; X64-NEXT:    movb %sil, (%rdx)
> ; X64-NEXT:    andl $3, %edi
> ; X64-NEXT:    movb %dil, (%rdx)
> ; X64-NEXT:    retq
>   %x1 = trunc i64 %x0 to i2
>   %y1 = trunc i64 %y0 to i2
>   %1 = bitcast %destTy* %dest to <2 x i2>*
>   %2 = insertelement <2 x i2> undef, i2 %x1, i32 0
>   %3 = insertelement <2 x i2> %2, i2 %y1, i32 1
>   store <2 x i2> %3, <2 x i2>* %1, align 1
>   ret void
> }
>
> As you can see by the “X64” checks, the behavior is quite weird.
>
> Both movb instructions write to the same address, so the result of
> storing the <2 x i2> will be the same as if only one of the elements
> had been stored.
>
> Is this really expected?
>
> We have emailed Simon Pilgrim, who added the test case to show that we
> no longer crash on it (see
> https://bugs.llvm.org/show_bug.cgi?id=20011). But even if the compiler
> doesn’t crash, the behavior seems wrong to me.
>
Yes, the behavior here is wrong. 
DAGTypeLegalizer::SplitVecOp_STORE/DAGTypeLegalizer::SplitVecRes_LOAD/etc. 
assume the element size is a multiple of 8.  I'm sure this has been 
discussed before, but I guess nobody ever wrote a patch to fix it...?
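
For what it's worth, under a "both i2 elements bitpacked into the low bits of
one byte" reading of that store, a correct lowering would have to merge the
two elements before writing anything. Here is a scalar sketch of that reading
only; the right semantics is exactly what this thread is trying to pin down,
and the packing order and the treatment of the upper bits are assumptions:

#include <cstdint>
#include <cstdio>

// Hypothetical scalar equivalent of the `store <2 x i2>` above, assuming the
// two 2-bit elements are packed into the low 4 bits of one byte
// (element 1 in bits 3:2, element 0 in bits 1:0), with the upper bits zeroed.
static void storeV2I2(uint8_t *dest, uint64_t x0, uint64_t y0) {
  uint8_t x1 = x0 & 0x3;               // trunc i64 %x0 to i2
  uint8_t y1 = y0 & 0x3;               // trunc i64 %y0 to i2
  *dest = (uint8_t)((y1 << 2) | x1);   // one merged byte store
}

int main() {
  uint8_t b = 0;
  storeV2I2(&b, 2, 1);                 // x = 0b10, y = 0b01
  std::printf("0x%02x\n", b);          // prints 0x06 under this packing
}
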

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
