[cfe-dev] Is this a bug? 'v3f32' has size '16' and not '12'
Stephen Canon via cfe-dev
cfe-dev at lists.llvm.org
Tue Jan 19 12:38:12 PST 2016
While it is not especially well documented, this is the expected behavior for ext_vector_type(3) [i.e. not a bug]. It would be swell if someone wanted to add support for packed non-power-of-two vectors, but I’m not sure what would be required to make that happen.
– Steve
> On Jan 19, 2016, at 1:47 AM, Martin J. O'Riordan via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> [Subject changed; it was "Vectors with non-power-of-2 elements"]
>
> I've looked at this a bit more, and it looks like it is a bug. The 'clang'
> front-end permits vectors to be declared which have a non-power-of-2 number
> of elements, while 'gcc' forbids vectors whose size is a non-power-of-2
> number of bytes.
>
> But beyond permitting the declaration it does not appear to follow through
> on the logical semantics.
>
> When LLVM sees these, it will either split a vector which exceeds the size
> of a natural vector register, or widen it if it is too small.
>
> This is okay for lowering arithmetic and other operations within the
> processor, but both the size and memory accesses are not consistent with the
> declared type.
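
To make the size mismatch concrete, here is a minimal sketch in C. It assumes a hypothetical 'packed_float3' struct for the in-memory layout and a 4-wide 'float4' vector (the GCC/Clang 'vector_size' extension) standing in for the widened register type. Neither name comes from the thread, and the size checks assume a 4-byte 'float' with no struct padding, which holds on common ABIs but is not guaranteed everywhere.

```c
#include <assert.h>

/* Hypothetical in-memory layout: three floats, 12 bytes on common ABIs. */
typedef struct { float x, y, z; } packed_float3;

/* Stand-in for the widened register type: four floats, 16 bytes. */
typedef float float4 __attribute__((vector_size(16)));

/* Compile-time checks of the two sizes (assumes 4-byte float, no padding). */
_Static_assert(sizeof(packed_float3) == 12, "packed layout should be 12 bytes");
_Static_assert(sizeof(float4) == 16, "widened vector should be 16 bytes");

/* Widen: copy three lanes from memory layout, zero the unused fourth lane. */
static inline float4 widen3(packed_float3 p) {
    float4 v = { p.x, p.y, p.z, 0.0f };
    return v;
}

/* Narrow: keep the first three lanes and drop the padding lane. */
static inline packed_float3 narrow3(float4 v) {
    packed_float3 p = { v[0], v[1], v[2] };
    return p;
}
```

With this split, an array of 'packed_float3' has a 12-byte stride, while arithmetic can still be done on the 16-byte type between a 'widen3' and a 'narrow3'.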
>
> For programs that iterate over images, it is very common to view the elements
> 3, 5 or 7 at a time. The underlying frame is typically an array of
> the corresponding scalar type, but the programmer needs to take advantage of
> accessing it using explicit vectorisation. For example:
>
> char row[FRAMESIZE];
>
> for (char3 *x = (char3 *)(row + 1); x < endtest; ++x)
>     use(*x);
>
> But this surprisingly accesses 16-bytes at a time from memory, and not 12.
> For reads this is not a big problem provided the access stays within valid
> addressable memory; but for writes the excess overwrite is critical.
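
One way to avoid the excess write is to keep the row as plain bytes and copy exactly three bytes per step. The sketch below is an illustration under assumptions, not the behaviour Clang provides for 'char3': the names 'uchar4' and 'add_to_triples' are hypothetical, the 4-byte vector uses the GCC/Clang 'vector_size' extension, and the approach relies on small fixed-size 'memcpy' calls being lowered to narrow loads and stores, which optimising compilers commonly do but which is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 4-lane byte vector used only as a register-side temporary. */
typedef unsigned char uchar4 __attribute__((vector_size(4)));

/* Add 'delta' (modulo 256) to every byte of 'n_triples' consecutive
 * 3-byte groups, touching exactly 3 bytes of memory per group. */
static void add_to_triples(unsigned char *row, size_t n_triples,
                           unsigned char delta) {
    assert(row != NULL || n_triples == 0);
    const uchar4 d = { delta, delta, delta, 0 };
    for (size_t i = 0; i < n_triples; ++i) {
        unsigned char *p = row + 3 * i;   /* stride is 3 bytes, not 4 */
        uchar4 v = { 0, 0, 0, 0 };
        memcpy(&v, p, 3);                 /* read only the 3 valid bytes */
        v = v + d;                        /* lane-wise add on the wide type */
        memcpy(p, &v, 3);                 /* write back only 3 bytes */
    }
}
```

The byte following the last group is never written, which is the property the plain 'char3' pointer loop lacks.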
>
> Does anyone know about how this is supposed to behave? The IR for the
> memory accesses and the 'sizeof' are generated by 'clang', so it is already
> too late for the target. I haven't been able to find a target configurable
> feature in 'clang' that would allow me to get the behaviour I need.
>
> Thanks,
>
> MartinO
>
> -----Original Message-----
> From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com]
> Sent: 04 January 2016 17:37
> To: 'Clang Dev'
> Subject: RE: Vectors with non-power-of-2 elements
>
> '-v12:8' was a brain-macro typo - it should be '-v24:8' :-)
>
> -----Original Message-----
> From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com]
> Sent: 04 January 2016 16:53
> To: 'Clang Dev' <cfe-dev at lists.llvm.org>
> Subject: Vectors with non-power-of-2 elements
>
> We are experiencing a number of problems with handling vectors whose number
> of elements is not a power-of-2, and in particular 3-element vectors. With
> the following example:
>
> #include <stdio.h>
>
> typedef float __attribute__((ext_vector_type(3))) float3; // Clang Only
> // typedef float __attribute__((vector_size(12))) float3; // For GCC
>
> volatile float3 v3f32;
>
> int main() {
>     float3 f3 = { 1.1f, 2.2f, 3.3f };
>     printf("Sizeof 'float3' is %zu\n", sizeof(float3));
>
>     v3f32 = f3; // Force a write
>     f3 = v3f32; // Force a read
>
>     return 0;
> }
>
> 'clang' reports the size as being 16-bytes, and transacts the object to and
> from memory as 16-bytes. Also, when vectors of 3-elements are passed with
> VARARGS, I have to use 'va_arg' with the 4-element variant or the compiler
> will crash when validating the types.
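
As a hedged alternative to using the 4-element variant with 'va_arg', the three values can be passed through the variadic call as a plain struct, since structs are ordinary variadic arguments in C. This sidesteps the vector type rather than fixing it. In the sketch below 'vec3f' and 'sum_vec3f' are hypothetical names, not anything from Clang or from the thread.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical plain-struct carrier for three floats. */
typedef struct { float x, y, z; } vec3f;

/* Sum the components of 'count' vec3f values passed as variadic arguments. */
static float sum_vec3f(int count, ...) {
    assert(count >= 0);
    va_list ap;
    float total = 0.0f;
    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        vec3f v = va_arg(ap, vec3f);  /* struct type, so no vector va_arg */
        total += v.x + v.y + v.z;
    }
    va_end(ap);
    return total;
}
```

A caller would convert to the vector type only after the values have been read out of the argument list.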
>
> We have no special code for handling 3-element vectors, and I have
> subsequently tried this with the X86 binary distributions of 'clang' v3.5.2
> and v3.7.0 and I observe the same issue as we are seeing in our SHAVE
> target.
>
> With 'gcc' and the 'vector_size' variant, I get an error complaining that
> the number of bytes is not a power-of-2, but a comment in
> 'tools/clang/lib/Sema/SemaType.cpp' says:
>
> // Success! Instantiate the vector type, the number of elements is > 0, and
> // not required to be a power of 2, unlike GCC.
>
> which would lead me to believe that 3-element vectors should be fine.
>
> Is there something I have to describe in my target machine implementation or
> target transform information that will allow 'float3' above to be 12-bytes, and
> to transact to memory using 12-byte transfers? Or is this a more general
> bug in the implementation? I have experimented with DataLayout changes such
> as:
>
> -v96:32
> -v48:16
> -v12:8
>
> but this just results in crashes in LLVM.
>
> With the types of algorithms that are developed for our platform, 3-element
> vectors are quite common. Less common, but also fairly frequent are
> 5-element and 7-element vectors (pixel analysis and 2D convolutions).
> OpenCL provides for 2-, 3-, 4-, 8- and 16-element vectors, but it is not
> clear to me that the 3-element vector support for OpenCL is working either.
> Longer term, it would be valuable to us if Clang/LLVM supported 3-, 5- and
> 7-element vectors as first-class citizens of the compiler (e.g. v3f32, v7i8,
> etc.), but that is a topic for another day. For now I am happy if I can get
> the 'v3X' types working.
>
> Thanks,
>
> MartinO - Movidius Ltd.