[cfe-dev] Question about "CodeGenFunction::EmitLoadOfScalar" with vector type of 3 elements

Thu Mar 9 04:03:12 PST 2017

Hi Anastasia,

I appreciate your response. I think we need to keep the vec3/vec4 
handling in "ScalarExprEmitter::VisitAsTypeExpr", as we want to 
maintain the features of the OpenCL source language. If LLVM had an IR 
intrinsic for __builtin_astype, Clang could emit it and LLVM's CodeGen 
could handle it. I have found one other location that treats vec3 
specially: "CodeGenFunction::EmitStoreOfScalar". I have simply added a 
Clang CodeGen option to preserve vec3. I have attached the diff file 
and a test. If I missed something, please let me know.
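
The decision the new option makes can be sketched outside of Clang as 
follows. This is only a minimal standalone model, not the attached 
patch: the struct "CodeGenOpts", its field "PreserveVec3Type" and the 
helper "memoryElementCount" are hypothetical names chosen for 
illustration; only the flag name -preserve-vec3-type comes from the 
attached test.

```cpp
#include <cassert>

// Hypothetical stand-in for Clang's code generation options. The field name
// is an assumption; the attached patch may spell it differently.
struct CodeGenOpts {
  bool PreserveVec3Type = false; // models the -preserve-vec3-type flag
};

// Returns how many elements the emitted vector load or store would use.
// By default a 3-element vector is widened to 4 elements; with the option
// set, the 3-element type is kept as written in the source.
unsigned memoryElementCount(unsigned NumElements, const CodeGenOpts &Opts) {
  if (NumElements == 3 && !Opts.PreserveVec3Type)
    return 4; // default behaviour: treat vec3 like vec4
  return NumElements; // vec3 preserved, other widths never change
}
```

With the option off, memoryElementCount(3, Opts) yields 4; with it on, 
it yields 3. All other element counts pass through unchanged.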

Thanks,

JinGu Kang

On 08/03/17 13:05, Anastasia Stulova wrote:
>
> I think the problem is that the borderline for IR being target 
> independent is very vague in general. In this case specifically, the 
> issue is that the spec is very explicit about treating this as a 
> 4-element-aligned type. However, I agree this lowering could be done 
> later as well. The approach of conditioning this on a target property 
> sounds reasonable. I think we have other places in Clang where vec3 is 
> treated as vec4 (e.g. ScalarExprEmitter::VisitAsTypeExpr). Those 
> would have to be handled too. Feel free to propose a prototype.
>
> Cheers,
>
> Anastasia
>
> *From:*cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] *On Behalf Of 
> *jingu at codeplay.com via cfe-dev
> *Sent:* 08 March 2017 11:01
> *To:* aleksey.bader at gmail.com
> *Cc:* 'cfe-dev at lists.llvm.org' (cfe-dev at lists.llvm.org)
> *Subject:* Re: [cfe-dev] Question about 
> "CodeGenFunction::EmitLoadOfScalar" with vector type of 3 elements
>
> Hi Alexey,
>
> I appreciate your response. My colleague and I are implementing a 
> transformation pass between LLVM IR and another IR, and we want to keep 
> the 3-component vector types in our target IR. As you mentioned, the 
> 4-component vector type conversion code is not a problem. But I usually 
> expect Clang to generate target-independent LLVM IR, apart from 
> target-specific properties such as calling convention, memory layout of 
> variables, etc. Clang can keep the 3-component vector type operations 
> and LLVM's codegen can handle them according to the target. At present, 
> we're having to undo Clang's transformation of vec3 -> vec4, to 
> recreate the original type information, which is unfortunate. Would it 
> be possible to add an option to control the behaviour?
>
> Thanks,
>
> JinGu Kang
>
> On 07/03/17 18:19, aleksey.bader at gmail.com wrote:
>
>     Hi JinGu,
>
>     I don't think it should be a problem for OpenCL. 3-component
>     vector is aligned as 4-component vector (see section 6.1.5
>     "Alignment of Type" of OpenCL C kernel language specification v2.0).
>
>     AFAIK, almost all existing OpenCL compilers are based on Clang and
>     there seem to be no problems with handling load/store operations
>     this way.
>
>     Could you elaborate on the case where this approach doesn't work?
>
>     Thanks,
>
>     Alexey
>
>     On Mon, Mar 6, 2017 at 6:47 PM, jingu at codeplay.com via cfe-dev
>     <cfe-dev at lists.llvm.org> wrote:
>
>     Hi All,
>
>
>     I have a question about "CodeGenFunction::EmitLoadOfScalar". I am
>     compiling code that uses 3-element vector types such as int3 or
>     float3. Clang converts each such vector load into a load of a
>     4-element vector type, because of the following code in
>     "CodeGenFunction::EmitLoadOfScalar":
>
>     // For better performance, handle vector loads differently.
>     if (Ty->isVectorType()) {
>       const llvm::Type *EltTy = Addr.getElementType();
>
>       const auto *VTy = cast<llvm::VectorType>(EltTy);
>
>       // Handle vectors of size 3 like size 4 for better performance.
>       if (VTy->getNumElements() == 3) {
>
>         // Bitcast to vec4 type.
>         llvm::VectorType *vec4Ty =
>             llvm::VectorType::get(VTy->getElementType(), 4);
>         Address Cast = Builder.CreateElementBitCast(Addr, vec4Ty,
>                                                     "castToVec4");
>         // Now load value.
>         llvm::Value *V = Builder.CreateLoad(Cast, Volatile, "loadVec4");
>
>     A 4-element vector load can end up as an aligned vector load,
>     which is usually better. But it is not good for other targets or
>     for languages like OpenCL that support 3-element vector types
>     natively. Could we account for this in
>     "CodeGenFunction::EmitLoadOfScalar", for example with "if
>     (!getLangOpts().OpenCL)" or with a target-specific property on
>     TargetCodeGenInfo?
>
>     If I missed something, please let me know.
>
>     Thanks,
>     JinGu Kang
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vec3.diff
Type: text/x-patch
Size: 6234 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170309/04db77d1/attachment.bin>
-------------- next part --------------
// RUN: %clang_cc1 %s -emit-llvm -o - -triple spir-unknown-unknown -preserve-vec3-type  | FileCheck %s

typedef float float3 __attribute__((ext_vector_type(3)));
typedef float float4 __attribute__((ext_vector_type(4)));

void kernel foo(global float3 *a, global float3 *b) {
  // CHECK: %[[LOAD_A:.*]] = load <3 x float>, <3 x float> addrspace(1)* %a
  // CHECK: store <3 x float> %[[LOAD_A]], <3 x float> addrspace(1)* %b
  *b = *a;
}
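
For contrast with the test above, the default (non-preserving) path 
discussed in the thread can be modeled on the host roughly as below. 
This is a hedged sketch, not Clang code: the names "storageElements", 
"Vec3", "Vec4Slot", "loadVec3Widened" and "storeVec3Widened" are 
hypothetical, and writing zero to the padding lane is an assumption 
made only to keep the model deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Rule cited in the thread: a 3-component vector is sized and aligned like
// a 4-component vector, so it occupies four element slots in memory.
constexpr std::size_t storageElements(std::size_t N) { return N == 3 ? 4 : N; }

using Vec3 = std::array<float, 3>;     // value as seen by the program
using Vec4Slot = std::array<float, 4>; // storage as seen by a widened access

// Model of the widened load: read all four lanes, keep the first three.
Vec3 loadVec3Widened(const Vec4Slot &Slot) {
  return Vec3{{Slot[0], Slot[1], Slot[2]}};
}

// Model of the widened store: write three lanes plus a padding lane.
// The padding value is set to zero here purely for illustration.
void storeVec3Widened(Vec4Slot &Slot, const Vec3 &V) {
  Slot = Vec4Slot{{V[0], V[1], V[2], 0.0f}};
}
```

The model shows why the widening is harmless for a target that only 
cares about values, and also why a consumer that wants the original 
3-element type has to undo it.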