[cfe-dev] Is this a bug? 'v3f32' has size '16' and not '12'

Stephen Canon via cfe-dev cfe-dev at lists.llvm.org
Tue Jan 19 12:38:12 PST 2016


While it is not especially well documented, this is the expected behavior for ext_vector_type(3) [i.e. not a bug].  It would be swell if someone wanted to add support for packed non-power-of-two vectors, but I’m not sure what would be required to make that happen.

– Steve
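
A minimal sketch of one possible workaround, assuming the padded 16-byte layout described above is kept: move the three live lanes to and from memory as scalars, so that only 12 bytes are ever touched.  The helper names 'load_packed3', 'store_packed3' and 'scale_triples' are hypothetical and do not come from clang or from this thread, and the struct fallback exists only so the sketch also builds with compilers that lack 'ext_vector_type'.

```c
#include <stddef.h>

#if defined(__clang__)
/* The clang extension from this thread: 3 lanes, but sizeof is 16. */
typedef float float3 __attribute__((ext_vector_type(3)));
#else
/* Hypothetical stand-in with the same padded size for other compilers. */
typedef struct { float x, y, z, pad; } float3;
#endif

/* Read exactly 3 floats (12 bytes) from packed storage. */
static float3 load_packed3(const float *src) {
    float3 v = { 0.0f, 0.0f, 0.0f };
    v.x = src[0];
    v.y = src[1];
    v.z = src[2];
    return v;
}

/* Write exactly 3 floats (12 bytes); the float after them is untouched. */
static void store_packed3(float *dst, float3 v) {
    dst[0] = v.x;
    dst[1] = v.y;
    dst[2] = v.z;
}

/* Scale n_triples packed triples in place (data holds 3 * n_triples floats). */
static void scale_triples(float *data, size_t n_triples, float k) {
    for (size_t i = 0; i < n_triples; ++i) {
        float3 v = load_packed3(data + 3 * i);
        v.x *= k;
        v.y *= k;
        v.z *= k;
        store_packed3(data + 3 * i, v);
    }
}
```

Calling 'scale_triples(buf, 2, 2.0f)' on a 7-float buffer should leave the seventh float unchanged, which a direct 'float3' store to '&buf[3]' would not guarantee.  Whether the scalar accesses are later recombined into vector operations is up to the optimiser and the target.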

> On Jan 19, 2016, at 1:47 AM, Martin J. O'Riordan via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> [Subject changed; it was "Vectors with non-power-of-2 elements"]
> 
> I've looked at this a bit more, and it looks like it is a bug.  The 'clang'
> front-end permits vectors to be declared which have a non-power-of-2 number
> of elements, while 'gcc' forbids the size of a vector to have a
> non-power-of-2 number of bytes.
> 
> But beyond permitting the declaration it does not appear to follow through
> on the logical semantics.
> 
> When LLVM sees these, it will either split a vector which exceeds the size
> of a natural vector register, or widen it if it is too small.
> 
> This is okay for lowering arithmetic and other operations within the
> processor, but both the size and memory accesses are not consistent with the
> declared type.
> 
> For programs that iterate over images, it is very common to view the elements
> 3, 5 or 7 at a time.  The underlying frame is typically an array of
> the corresponding scalar type, but the programmer needs to take advantage of
> accessing it using explicit vectorisation.  For example:
> 
>  char row[FRAMESIZE];
> 
>  for(char3 *x = (char3*)(row + 1); x < endtest; ++x)
>    use(x);
> 
> But this surprisingly accesses 4 bytes at a time from memory, and not 3 (for
> a 'float3', 16 bytes and not 12).  For reads this is not a big problem
> provided the access stays within valid addressable memory; but for writes
> the excess overwrite is critical.
> 
> Does anyone know about how this is supposed to behave?  The IR for the
> memory accesses and the 'sizeof' are generated by 'clang', so it is already
> too late for the target.  I haven't been able to find a target configurable
> feature in 'clang' that would allow me to get the behaviour I need.
> 
> Thanks,
> 
> 	MartinO
> 
> -----Original Message-----
> From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com] 
> Sent: 04 January 2016 17:37
> To: 'Clang Dev'
> Subject: RE: Vectors with non-power-of-2 elements
> 
> '-v12:8' was a brain-macro typo - it should be '-v24:8' :-)
> 
> -----Original Message-----
> From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com] 
> Sent: 04 January 2016 16:53
> To: 'Clang Dev' <cfe-dev at lists.llvm.org>
> Subject: Vectors with non-power-of-2 elements
> 
> We are experiencing a number of problems with handling vectors whose number
> of elements is not a power-of-2, and in particular 3-element vectors.  With
> the following example:
> 
>  #include <stdio.h>
> 
>  typedef float __attribute__((ext_vector_type(3))) float3;  // Clang Only
>  // typedef float __attribute__((vector_size(12))) float3;  // For GCC
> 
>  volatile float3 v3f32;
> 
>  int main() {
>    float3 f3 = { 1.1f, 2.2f, 3.3f };
>    printf ( "Sizeof 'float3' is %zu\n", sizeof(float3));
> 
>    v3f32 = f3; // Force a write
>    f3 = v3f32; // Force a read
> 
>    return 0;
>  }
> 
> 'clang' reports the size as being 16-bytes, and transacts the object to and
> from memory as 16-bytes.  Also, when vectors of 3-elements are passed with
> VARARGS, I have to use 'va_arg' with the 4-element variant or the compiler
> will crash when validating the types.
> 
> We have no special code for handling 3-element vectors, and I have
> subsequently tried this with the X86 binary distributions of 'clang' v3.5.2
> and v3.7.0 and I observe the same issue as we are seeing in our SHAVE
> target.
> 
> With 'gcc' and the 'vector_size' variant, I get an error complaining that
> the number of bytes is not a power-of-2, but a comment in
> 'tools/clang/lib/Sema/SemaType.cpp' says:
> 
>  // Success! Instantiate the vector type, the number of elements is > 0, and
>  // not required to be a power of 2, unlike GCC.
> 
> which would lead me to believe that 3-element vectors should be fine.
> 
> Is there something I have to describe in my target machine implementation or
> target transform information that will allow 'float3' above to be 12 bytes, and
> to transact to memory using 12-byte transfers?  Or is this a more general
> bug in the implementation?  I have experimented with DataLayout changes such
> as:
> 
>  -v96:32
>  -v48:16
>  -v12:8
> 
> but this just results in crashes in LLVM.
> 
> With the types of algorithms that are developed for our platform, 3-element
> vectors are quite common.  Less common, but also fairly frequent are
> 5-element and 7-element vectors (pixel analysis and 2D convolutions).
> OpenCL provides for 2-, 3-, 4-, 8- and 16-element vectors, but it is not
> clear to me that the 3-element vector support for OpenCL is working either.
> Longer term, it would be valuable to us if Clang/LLVM supported 3-, 5- and
> 7-element vectors as first-class citizens of the compiler (e.g. v3f32, v7i8,
> etc.), but that is a topic for another day.  For now I am happy if I can get
> the 'v3X' types working.
> 
> Thanks,
> 
> 	MartinO - Movidius Ltd.
> 
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



