[cfe-dev] Is this a bug? 'v3f32' has size '16' and not '12

Martin J. O'Riordan via cfe-dev cfe-dev at lists.llvm.org
Tue Jan 19 12:09:34 PST 2016


I should have used 'float3' and 'float' instead of 'char3' and 'char', but
the basics are the same.

-----Original Message-----
From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com] 
Sent: 19 January 2016 09:47
To: 'Clang Dev' <cfe-dev at lists.llvm.org>
Subject: RE: Is this a bug? 'v3f32' has size '16' and not '12

[Subject changed; it was "Vectors with non-power-of-2 elements"]

I've looked at this a bit more, and it looks like it is a bug.  The 'clang'
front-end permits vectors to be declared which have a non-power-of-2 number
of elements, while 'gcc' rejects any vector whose size in bytes is not a
power of 2.

But beyond permitting the declaration it does not appear to follow through
on the logical semantics.

When LLVM sees these, it will either split a vector which exceeds the size
of a natural vector register, or widen it if it is too small.

This is okay for lowering arithmetic and other operations within the
processor, but both the size and memory accesses are not consistent with the
declared type.

For programs that iterate over images, it is very common to view the data
3, 5 or 7 elements at a time.  The underlying frame is typically an array of
the corresponding scalar type, but the programmer needs to take advantage of
accessing it using explicit vectorisation.  For example:

  char row[FRAMESIZE];

  for (char3 *x = (char3 *)(row + 1); x < endtest; ++x)
    use(*x);

But this surprisingly accesses 16-bytes at a time from memory, and not 12.
For reads this is not a big problem provided the access stays within valid
addressable memory; but for writes the excess overwrite is critical.
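One way to keep the excess write out of memory while the front-end behaves
this way is to do the arithmetic in a 4-lane vector and move only three
elements to and from memory.  The following is a minimal sketch under that
assumption, not something taken from Clang or from our target: the names
'float4_t', 'load3', 'store3' and 'scale_groups' are hypothetical, and it
uses 'float' rather than 'char'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 4-lane type; vector_size(16) is accepted by both GCC and Clang. */
typedef float float4_t __attribute__((vector_size(16)));

/* Read exactly 3 floats (12 bytes) from p into lanes 0..2; lane 3 is zero. */
static float4_t load3(const float *p) {
    float4_t v = {0.0f, 0.0f, 0.0f, 0.0f};
    memcpy(&v, p, 3 * sizeof(float));
    return v;
}

/* Write exactly lanes 0..2 of v (12 bytes) to p; lane 3 never reaches memory. */
static void store3(float *p, float4_t v) {
    memcpy(p, &v, 3 * sizeof(float));
}

/* Scale 'groups' consecutive 3-float groups in place, stepping 3 floats at a time. */
static void scale_groups(float *row, size_t groups, float k) {
    assert(row != NULL);
    const float4_t kv = {k, k, k, k};
    for (size_t i = 0; i < groups; ++i) {
        float4_t v = load3(row + 3 * i);
        store3(row + 3 * i, v * kv);
    }
}
```

Whether the 'memcpy' calls become efficient partial loads and stores is up
to the target; the sketch only shows the intended 12-byte memory footprint.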

Does anyone know about how this is supposed to behave?  The IR for the
memory accesses and the 'sizeof' are generated by 'clang', so it is already
too late for the target.  I haven't been able to find a target configurable
feature in 'clang' that would allow me to get the behaviour I need.

Thanks,

	MartinO

-----Original Message-----
From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com] 
Sent: 04 January 2016 17:37
To: 'Clang Dev'
Subject: RE: Vectors with non-power-of-2 elements

'-v12:8' was a brain-macro typo - it should be '-v24:8' :-)

-----Original Message-----
From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com] 
Sent: 04 January 2016 16:53
To: 'Clang Dev' <cfe-dev at lists.llvm.org>
Subject: Vectors with non-power-of-2 elements

We are experiencing a number of problems with handling vectors whose number
of elements is not a power-of-2, and in particular 3-element vectors.  With
the following example:

  #include <stdio.h>

  typedef float __attribute__((ext_vector_type(3))) float3;  // Clang Only
  // typedef float __attribute__((vector_size(12))) float3;  // For GCC

  volatile float3 v3f32;

  int main() {
    float3 f3 = { 1.1f, 2.2f, 3.3f };
    printf("Sizeof 'float3' is %zu\n", sizeof(float3));

    v3f32 = f3; // Force a write
    f3 = v3f32; // Force a read

    return 0;
  }

'clang' reports the size as being 16-bytes, and transacts the object to and
from memory as 16-bytes.  Also, when vectors of 3-elements are passed with
VARARGS, I have to use 'va_arg' with the 4-element variant or the compiler
will crash when validating the types.
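The 16-byte size also sets the step of a pointer to the vector type, so
walking a scalar array through such a pointer skips one element per step.
The sketch below only illustrates that arithmetic.  The names 'vec3_t',
'vec3_stride', 'packed_stride' and 'floats_skipped_per_step' are
hypothetical, and the non-Clang branch is an assumed 16-byte stand-in so
that the file still compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__clang__)
/* Clang only: the 3-element vector type from the example above. */
typedef float __attribute__((ext_vector_type(3))) vec3_t;
#else
/* Assumption: a 16-byte stand-in for compilers without ext_vector_type. */
typedef float __attribute__((vector_size(16))) vec3_t;
#endif

static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Bytes a vec3_t pointer advances per increment (the padded size). */
static size_t vec3_stride(void) { return sizeof(vec3_t); }

/* Bytes that three tightly packed floats actually occupy. */
static size_t packed_stride(void) { return 3 * sizeof(float); }

/* Scalar elements skipped on every pointer increment because of the padding. */
static size_t floats_skipped_per_step(void) {
    return (vec3_stride() - packed_stride()) / sizeof(float);
}
```

With the sizes reported above, the stride is 16 bytes against 12 bytes of
packed data, so one 'float' is skipped on every increment.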

We have no special code for handling 3-element vectors, and I have
subsequently tried this with the X86 binary distributions of 'clang' v3.5.2
and v3.7.0 and I observe the same issue as we are seeing in our SHAVE
target.

With 'gcc' and the 'vector_size' variant, I get an error complaining that
the number of bytes is not a power-of-2, but a comment in
'tools/clang/lib/Sema/SemaType.cpp' says:

  // Success! Instantiate the vector type, the number of elements is > 0, and
  // not required to be a power of 2, unlike GCC.

which would lead me to believe that 3-element vectors should be fine.

Is there something I have to describe in my target machine implementation or
target transform information that will allow 'float3' above to be 12 bytes,
and to transact to memory using 12-byte transfers?  Or is this a more general
bug in the implementation?  I have experimented with DataLayout changes such
as:

  -v96:32
  -v48:16
  -v12:8

but this just results in crashes in LLVM.

With the types of algorithms that are developed for our platform, 3-element
vectors are quite common.  Less common, but also fairly frequent are
5-element and 7-element vectors (pixel analysis and 2D convolutions).
OpenCL provides for 2-, 3-, 4-, 8- and 16-element vectors, but it is not
clear to me that the 3-element vector support for OpenCL is working either.
Longer term, it would be valuable to us if Clang/LLVM supported 3-, 5- and
7-element vectors as first-class citizens of the compiler (e.g. v3f32, v7i8,
etc.), but that is a topic for another day.  For now I am happy if I can get
the 'v3X' types working.

Thanks,

	MartinO - Movidius Ltd.




