[cfe-dev] [Mesa-dev] 3 element vectors in opencl 1.1+
Jan Vesely
jan.vesely at rutgers.edu
Thu Apr 24 09:12:13 PDT 2014
On Thu, 2014-04-24 at 17:53 +0200, Francisco Jerez wrote:
> Jan Vesely <jan.vesely at rutgers.edu> writes:
>
> > On Thu, 2014-04-24 at 12:05 +0200, Francisco Jerez wrote:
> > <snip>
> >> >>>
> >> >>> I don't think that using getTypeAllocSize() instead of
> >> >>> getTypeStoreSize() to calculate clover::argument::size would be a
> >> >>> satisfactory solution. By doing that you'd redefine the API argument
> >> >>> size exposed to the host for *all* argument types to be the
> >> >>> device-dependent aligned size, which is definitely not what you want.
> >> >>
> >> >>> AFAIK 3-element vectors are an exception because they are the only types
> >> >>> that are defined to have a different API size from their actual usable
> >> >>> size, so they probably deserve special handling in invocation.cpp (as
> >> >>> you did in your first patch). As the API size is target-independent I
> >> >>> don't think that the fix belongs in Clang or LLVM, Clover is likely at
> >> >>> fault.
> >> >>
> >> >> The way I understood the ch 6.1.5 is that both API and OpenCL C 3
> >> >> element vectors are required to be 4*sizeof(component). So a
> >> >> sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
> >> >> target independent.
> >> >
> >> > Well, sizeof() is supposed to take into account the alignment, so this
> >> > should be the case already.
> >
> > AFAIK sizeof only includes padding. I explain below.
> >
>
> Yes, and 3-component vectors are being padded to 4 components already.
>
> >> >
> >> >> That's why clang (or more precisely libclc) looked like a correct
> >> >> place.
> >> >>
> >> >
> >> > Right. I guess the other possibility would be to redefine cl_float3 as
> >> > cl_float4 in libclc.
> >>
> >> Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
> >> element type "gentype".
> >>
> >> > You mentioned that it had undesired consequences. Which exactly?
> >
> > With this change:
> > -typedef __attribute__((ext_vector_type(3))) float float3;
> > +typedef __attribute__((ext_vector_type(4))) float float3;
> >
> > errors about duplicate ops (since type3 and type4 are now the same type)
> > can be fixed. However, I don't know a clean way to fix the following:
> >
> > error: too few elements in vector initialization (expected 4 elements,
> > have 3)
> >
> > Even if we can get rid of it, or reduce it to warning.
> > having the same type for type3 and type4 causes problem that we won't be
> > able to distinguish following situations:
> >
> > flaot4 A
> >
> > float3 B = A.xyz; // This is OK, should not produce warning/error;
> >
> > float4 C = A.xyz; // This should at least produce a warning. even if we
> > allow using fewer elements for vector initialization.
> >
>
> Right, I agree that this wouldn't be a good idea.
>
> >
> >> >
> >> >> I understand that target device can have stricter alignment rules, but I
> >> >> don't see how it can have different type sizes (my reading of the specs
> >> >> is that these are binding for the target as well).
> >> >>
> >> >
> >> > In theory the sizes are binding for most built-in types, but R600
> >> > doesn't support certain integer types so clover has code to take into
> >> > account the differing size, alignment and endianness between host and
> >> > device. I guess that in most cases it would be possible to use the
> >> > "official" ABI for passing kernel arguments to the device (and that's a
> >> > requirement for your solution using DataLayout::getTypeAllocSize() to
> >> > work reliably, otherwise you'll be taking into account device-specific
> >> > padding), but you'll have to fix the R600 back-end so it's able to deal
> >> > with (i.e. lower) all built-in types specified by OpenCL. I think that
> >> > this would be useful on its own, Tom should know better than I how
> >> > difficult it would be.
> >
> > I didn't know about the additional R600 type size magic, in this case I
> > agree that the original approach (detect 3elem vectors and change size)
> > is probably the best in this situation.
> >
> >> >
> >> >> I can resend the original patch with debug output replaced by a comment.
> >> >>
> >> > A slightly more general solution than what you did could be to align the
> >> > store size of scalar and vector arguments to the next power of two to
> >> > calculate the API size (in particular, that would make 3- and 4-element
> >> > vectors have the same size). From 6.1.5: "A built-in data type that is
> >> > not a power of two bytes in size must be aligned to the next larger
> >> > power of two."
> >
> > This part is what made me think that the _alignment_ only restricts
> > starting address (and not size).
>
> It also restricts the size. From section 6.3 on the sizeof operator:
> "The sizeof operator yields the size (in bytes) of its operand,
> including any padding bytes (refer to section 6.1.5) needed for
> alignment [...]. When applied to an operand that has structure or union
> type, the result is the total number of bytes in such an object,
> including internal and trailing padding."
>
> > Otherwise, the addition of 3 element vector special case would not be
> > necessary.
>
> Maybe it wouldn't be strictly necessary, but it's still clarifying as an
> illustrative corollary of the general rule.
>
> > i.e I think that the
> > following mem layout is legal in OCL 1.0 but not in OCL1.1+
> >
> > 0x10: float3 here
> > 0x1c: int here // this should belong to float3 in ocl 1.1+
>
> Does OpenCL 1.0 support 3-component vectors at all?
ah, I missed that.
>
> >
> > while the following is illegal in both:
> >
> > 0x8: float3 here
> >
> > So, I don't think that the quoted text requires all builtin types to
> > have 2^x size (although all but 3 element vectors do).
> >
>
> Well, don't interpret it that way if you don't want to, but all OpenCL
> built-in types do (or at least the ones you are allowed to use as kernel
> argument types), so this solution seems more elegant to me than
> special-casing 3-component vectors.
with no 3 element vectors in ocl1.0 and the sizeof clarification. it all
makes sense to me. Thanks for the explanation.
I'll post a patch that checks the power-of-two rule.
thanks again,
Jan
>
> Thanks.
>
> > regards,
> > Jan
> >
> >
> >> >
> >> >> regards,
> >> >> Jan
> >> >>
> >> >
> >> > Thanks.
> >> >
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> > regards,
> >> >>> > Jan
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Jan Vesely <jan.vesely at rutgers.edu>
> >> >>> > _______________________________________________
> >> >>> > mesa-dev mailing list
> >> >>> > mesa-dev at lists.freedesktop.org
> >> >>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >> >>
> >> >> --
> >> >> Jan Vesely <jan.vesely at rutgers.edu>
--
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140424/ad769c77/attachment.sig>
More information about the cfe-dev
mailing list