[polly] r302515 - [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen

Siddharth Bhat via llvm-commits llvm-commits at lists.llvm.org
Tue May 9 08:08:27 PDT 2017


Sure. Thanks for letting me know about the changes you want.

On Tue 9 May, 2017, 16:54 Michael Kruse, <llvm-commits at meinersbur.de> wrote:

> Can you remove the "[Polly]" prefix before committing? It's a commit
> to the Polly repository, so it does not add new information. The
> distinction of repositories does not exist in Phabricator, which is
> the reason for the prefix there.
>
> When I do commits, I usually also do some normalization, but I don't
> personally care:
> - Put a dot at the end of the title.
> - Remove "Summary:"
> - Wrap after 72 characters
> - Keep only the "Differential Revision" line. The other information
> about the review process is already available behind the link.
>
> This is what I found in most LLVM commits.
>
> Thank you.
>
> Michael
>
> 2017-05-09 12:45 GMT+02:00 Siddharth Bhat via llvm-commits
> <llvm-commits at lists.llvm.org>:
> > Author: bollu
> > Date: Tue May  9 05:45:52 2017
> > New Revision: 302515
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=302515&view=rev
> > Log:
> > [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG
> CodeGen
> >
> > Summary: PPCGCodeGeneration now attaches the size of the kernel launch
> parameters at the end of the parameter list. For the existing CUDA Runtime,
> this gets ignored, but the OpenCL Runtime knows to check for
> kernel-argument size at the end of the parameter list. (The resulting
> parameters list is twice as long. This has been accounted for in the
> corresponding test cases).
> >
> > Reviewers: grosser, Meinersbur, bollu
> >
> > Reviewed By: bollu
> >
> > Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits
> >
> > Tags: #polly
> >
> > Differential Revision: https://reviews.llvm.org/D32961
> >
> > Modified:
> >     polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp
> >     polly/trunk/test/GPGPU/cuda-managed-memory-simple.ll
> >     polly/trunk/test/GPGPU/host-control-flow.ll
> >     polly/trunk/test/GPGPU/kernel-params-only-some-arrays.ll
> >     polly/trunk/test/GPGPU/parametric-loop-bound.ll
> >     polly/trunk/tools/GPURuntime/GPUJIT.c
> >
> > Modified: polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp (original)
> > +++ polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp Tue May  9 05:45:52
> 2017
> > @@ -142,6 +142,14 @@ static __isl_give isl_id_to_ast_expr *po
> >    return RefToExpr;
> >  }
> >
> > +/// Given a LLVM Type, compute its size in bytes,
> > +static int computeSizeInBytes(const Type *T) {
> > +  int bytes = T->getPrimitiveSizeInBits() / 8;
> > +  if (bytes == 0)
> > +    bytes = T->getScalarSizeInBits() / 8;
> > +  return bytes;
> > +}
> > +
> >  /// Generate code for a GPU specific isl AST.
> >  ///
> >  /// The GPUNodeBuilder augments the general existing IslNodeBuilder,
> which
> > @@ -272,6 +280,16 @@ private:
> >    /// @returns A tuple with thread block sizes for X, Y, and Z
> dimensions.
> >    std::tuple<Value *, Value *, Value *> getBlockSizes(ppcg_kernel
> *Kernel);
> >
> > +  /// Store a specific kernel launch parameter in the array of kernel
> launch
> > +  /// parameters.
> > +  ///
> > +  /// @param Parameters The list of parameters in which to store.
> > +  /// @param Param      The kernel launch parameter to store.
> > +  /// @param Index      The index in the parameter list, at which to
> store the
> > +  ///                   parameter.
> > +  void insertStoreParameter(Instruction *Parameters, Instruction *Param,
> > +                            int Index);
> > +
> >    /// Create kernel launch parameters.
> >    ///
> >    /// @param Kernel        The kernel to create parameters for.
> > @@ -1192,11 +1210,21 @@ GPUNodeBuilder::getBlockSizes(ppcg_kerne
> >    return std::make_tuple(Sizes[0], Sizes[1], Sizes[2]);
> >  }
> >
> > +void GPUNodeBuilder::insertStoreParameter(Instruction *Parameters,
> > +                                          Instruction *Param, int
> Index) {
> > +  Value *Slot = Builder.CreateGEP(
> > +      Parameters, {Builder.getInt64(0), Builder.getInt64(Index)});
> > +  Value *ParamTyped = Builder.CreatePointerCast(Param,
> Builder.getInt8PtrTy());
> > +  Builder.CreateStore(ParamTyped, Slot);
> > +}
> > +
> >  Value *
> >  GPUNodeBuilder::createLaunchParameters(ppcg_kernel *Kernel, Function *F,
> >                                         SetVector<Value *>
> SubtreeValues) {
> > -  Type *ArrayTy = ArrayType::get(Builder.getInt8PtrTy(),
> > -                                 std::distance(F->arg_begin(),
> F->arg_end()));
> > +  const int NumArgs = F->arg_size();
> > +  std::vector<int> ArgSizes(NumArgs);
> > +
> > +  Type *ArrayTy = ArrayType::get(Builder.getInt8PtrTy(), 2 * NumArgs);
> >
> >    BasicBlock *EntryBlock =
> >        &Builder.GetInsertBlock()->getParent()->getEntryBlock();
> > @@ -1213,6 +1241,8 @@ GPUNodeBuilder::createLaunchParameters(p
> >      isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space,
> isl_dim_set);
> >      const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(Id);
> >
> > +    ArgSizes[Index] = SAI->getElemSizeInBytes();
> > +
> >      Value *DevArray = nullptr;
> >      if (ManagedMemory) {
> >        DevArray = getOrCreateManagedDeviceArray(
> > @@ -1265,16 +1295,15 @@ GPUNodeBuilder::createLaunchParameters(p
> >      isl_id *Id = isl_space_get_dim_id(Kernel->space, isl_dim_set, i);
> >      Value *Val = IDToValue[Id];
> >      isl_id_free(Id);
> > +
> > +    ArgSizes[Index] = computeSizeInBytes(Val->getType());
> > +
> >      Instruction *Param =
> >          new AllocaInst(Val->getType(), AddressSpace,
> >                         Launch + "_param_" + std::to_string(Index),
> >                         EntryBlock->getTerminator());
> >      Builder.CreateStore(Val, Param);
> > -    Value *Slot = Builder.CreateGEP(
> > -        Parameters, {Builder.getInt64(0), Builder.getInt64(Index)});
> > -    Value *ParamTyped =
> > -        Builder.CreatePointerCast(Param, Builder.getInt8PtrTy());
> > -    Builder.CreateStore(ParamTyped, Slot);
> > +    insertStoreParameter(Parameters, Param, Index);
> >      Index++;
> >    }
> >
> > @@ -1284,30 +1313,38 @@ GPUNodeBuilder::createLaunchParameters(p
> >      isl_id *Id = isl_space_get_dim_id(Kernel->space, isl_dim_param, i);
> >      Value *Val = IDToValue[Id];
> >      isl_id_free(Id);
> > +
> > +    ArgSizes[Index] = computeSizeInBytes(Val->getType());
> > +
> >      Instruction *Param =
> >          new AllocaInst(Val->getType(), AddressSpace,
> >                         Launch + "_param_" + std::to_string(Index),
> >                         EntryBlock->getTerminator());
> >      Builder.CreateStore(Val, Param);
> > -    Value *Slot = Builder.CreateGEP(
> > -        Parameters, {Builder.getInt64(0), Builder.getInt64(Index)});
> > -    Value *ParamTyped =
> > -        Builder.CreatePointerCast(Param, Builder.getInt8PtrTy());
> > -    Builder.CreateStore(ParamTyped, Slot);
> > +    insertStoreParameter(Parameters, Param, Index);
> >      Index++;
> >    }
> >
> >    for (auto Val : SubtreeValues) {
> > +    ArgSizes[Index] = computeSizeInBytes(Val->getType());
> > +
> >      Instruction *Param =
> >          new AllocaInst(Val->getType(), AddressSpace,
> >                         Launch + "_param_" + std::to_string(Index),
> >                         EntryBlock->getTerminator());
> >      Builder.CreateStore(Val, Param);
> > -    Value *Slot = Builder.CreateGEP(
> > -        Parameters, {Builder.getInt64(0), Builder.getInt64(Index)});
> > -    Value *ParamTyped =
> > -        Builder.CreatePointerCast(Param, Builder.getInt8PtrTy());
> > -    Builder.CreateStore(ParamTyped, Slot);
> > +    insertStoreParameter(Parameters, Param, Index);
> > +    Index++;
> > +  }
> > +
> > +  for (int i = 0; i < NumArgs; i++) {
> > +    Value *Val = ConstantInt::get(Builder.getInt32Ty(), ArgSizes[i]);
> > +    Instruction *Param =
> > +        new AllocaInst(Builder.getInt32Ty(), AddressSpace,
> > +                       Launch + "_param_size_" + std::to_string(i),
> > +                       EntryBlock->getTerminator());
> > +    Builder.CreateStore(Val, Param);
> > +    insertStoreParameter(Parameters, Param, Index);
> >      Index++;
> >    }
> >
> >
> > Modified: polly/trunk/test/GPGPU/cuda-managed-memory-simple.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/test/GPGPU/cuda-managed-memory-simple.ll?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/test/GPGPU/cuda-managed-memory-simple.ll (original)
> > +++ polly/trunk/test/GPGPU/cuda-managed-memory-simple.ll Tue May  9
> 05:45:52 2017
> > @@ -37,18 +37,26 @@
> >
> >  ; CHECK:       %13 = call i8* @polly_initContextCUDA()
> >  ; CHECK-NEXT:  %14 = bitcast i32* %A to i8*
> > -; CHECK-NEXT:  %15 = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_0_params, i64 0, i64 0
> > +; CHECK-NEXT:  %15 = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 0
> >  ; CHECK-NEXT:  store i8* %14, i8** %polly_launch_0_param_0
> >  ; CHECK-NEXT:  %16 = bitcast i8** %polly_launch_0_param_0 to i8*
> >  ; CHECK-NEXT:  store i8* %16, i8** %15
> >  ; CHECK-NEXT:  %17 = bitcast i32* %R to i8*
> > -; CHECK-NEXT:  %18 = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> > +; CHECK-NEXT:  %18 = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> >  ; CHECK-NEXT:  store i8* %17, i8** %polly_launch_0_param_1
> >  ; CHECK-NEXT:  %19 = bitcast i8** %polly_launch_0_param_1 to i8*
> >  ; CHECK-NEXT:  store i8* %19, i8** %18
> > -; CHECK-NEXT:  %20 = call i8* @polly_getKernel(i8* getelementptr
> inbounds ([750 x i8], [750 x i8]* @kernel_0, i32 0, i32 0), i8*
> getelementptr inbounds ([9 x i8], [9 x i8]* @kernel_0_name, i32 0, i32 0))
> > -; CHECK-NEXT:  call void @polly_launchKernel(i8* %20, i32 2, i32 1, i32
> 32, i32 1, i32 1, i8* %polly_launch_0_params_i8ptr)
> > -; CHECK-NEXT:  call void @polly_freeKernel(i8* %20)
> > +; CHECK-NEXT:  store i32 4, i32* %polly_launch_0_param_size_0
> > +; CHECK-NEXT:  %20 = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 2
> > +; CHECK-NEXT:  %21 = bitcast i32* %polly_launch_0_param_size_0 to i8*
> > +; CHECK-NEXT:  store i8* %21, i8** %20
> > +; CHECK-NEXT:  store i32 4, i32* %polly_launch_0_param_size_1
> > +; CHECK-NEXT:  %22 = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 3
> > +; CHECK-NEXT:  %23 = bitcast i32* %polly_launch_0_param_size_1 to i8*
> > +; CHECK-NEXT:  store i8* %23, i8** %22
> > +; CHECK-NEXT:  %24 = call i8* @polly_getKernel(i8* getelementptr
> inbounds ([750 x i8], [750 x i8]* @kernel_0, i32 0, i32 0), i8*
> getelementptr inbounds ([9 x i8], [9 x i8]* @kernel_0_name, i32 0, i32 0))
> > +; CHECK-NEXT:  call void @polly_launchKernel(i8* %24, i32 2, i32 1, i32
> 32, i32 1, i32 1, i8* %polly_launch_0_params_i8ptr)
> > +; CHECK-NEXT:  call void @polly_freeKernel(i8* %24)
> >  ; CHECK-NEXT:  call void @polly_synchronizeDevice()
> >  ; CHECK-NEXT:  call void @polly_freeContext(i8* %13)
> >
> >
> > Modified: polly/trunk/test/GPGPU/host-control-flow.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/test/GPGPU/host-control-flow.ll?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/test/GPGPU/host-control-flow.ll (original)
> > +++ polly/trunk/test/GPGPU/host-control-flow.ll Tue May  9 05:45:52 2017
> > @@ -32,7 +32,7 @@
> >  ; IR-NEXT:   %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [
> %polly.indvar_next, %polly.loop_header ]
> >  ; ...
> >  ; IR:  store i64 %polly.indvar, i64* %polly_launch_0_param_1
> > -; IR-NEXT:  [[REGA:%.+]] = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> > +; IR-NEXT:  [[REGA:%.+]] = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> >  ; IR-NEXT:  [[REGB:%.+]] = bitcast i64* %polly_launch_0_param_1 to i8*
> >  ; IR-NEXT:  store i8* [[REGB]], i8** [[REGA]]
> >  ; IR: call i8* @polly_getKernel
> >
> > Modified: polly/trunk/test/GPGPU/kernel-params-only-some-arrays.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/test/GPGPU/kernel-params-only-some-arrays.ll?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/test/GPGPU/kernel-params-only-some-arrays.ll (original)
> > +++ polly/trunk/test/GPGPU/kernel-params-only-some-arrays.ll Tue May  9
> 05:45:52 2017
> > @@ -48,13 +48,13 @@
> >
> >
> >  ; IR:       [[DEVPTR:%.*]] = call i8* @polly_getDevicePtr(i8*
> %p_dev_array_MemRef_A)
> > -; IR-NEXT:  [[SLOT:%.*]] = getelementptr [1 x i8*], [1 x i8*]*
> %polly_launch_0_params, i64 0, i64 0
> > +; IR-NEXT:  [[SLOT:%.*]] = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_0_params, i64 0, i64 0
> >  ; IR-NEXT:  store i8* [[DEVPTR]], i8** %polly_launch_0_param_0
> >  ; IR-NEXT:  [[DATA:%.*]] = bitcast i8** %polly_launch_0_param_0 to i8*
> >  ; IR-NEXT:  store i8* [[DATA]], i8** [[SLOT]]
> >
> >  ; IR:       [[DEVPTR:%.*]] = call i8* @polly_getDevicePtr(i8*
> %p_dev_array_MemRef_B)
> > -; IR-NEXT:  [[SLOT:%.*]] = getelementptr [1 x i8*], [1 x i8*]*
> %polly_launch_1_params, i64 0, i64 0
> > +; IR-NEXT:  [[SLOT:%.*]] = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_1_params, i64 0, i64 0
> >  ; IR-NEXT:  store i8* [[DEVPTR]], i8** %polly_launch_1_param_0
> >  ; IR-NEXT:  [[DATA:%.*]] = bitcast i8** %polly_launch_1_param_0 to i8*
> >  ; IR-NEXT:  store i8* [[DATA]], i8** [[SLOT]]
> >
> > Modified: polly/trunk/test/GPGPU/parametric-loop-bound.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/test/GPGPU/parametric-loop-bound.ll?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/test/GPGPU/parametric-loop-bound.ll (original)
> > +++ polly/trunk/test/GPGPU/parametric-loop-bound.ll Tue May  9 05:45:52
> 2017
> > @@ -31,7 +31,7 @@
> >  ; CODE-NEXT:     Stmt_bb2(32 * b0 + t0 + 1048576 * c0);
> >
> >  ; IR: store i64 %n, i64* %polly_launch_0_param_1
> > -; IR-NEXT: [[REGA:%.+]] = getelementptr [2 x i8*], [2 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> > +; IR-NEXT: [[REGA:%.+]] = getelementptr [4 x i8*], [4 x i8*]*
> %polly_launch_0_params, i64 0, i64 1
> >  ; IR-NEXT: [[REGB:%.+]] = bitcast i64* %polly_launch_0_param_1 to i8*
> >  ; IR-NEXT: store i8* [[REGB]], i8** [[REGA]]
> >
> >
> > Modified: polly/trunk/tools/GPURuntime/GPUJIT.c
> > URL:
> http://llvm.org/viewvc/llvm-project/polly/trunk/tools/GPURuntime/GPUJIT.c?rev=302515&r1=302514&r2=302515&view=diff
> >
> ==============================================================================
> > --- polly/trunk/tools/GPURuntime/GPUJIT.c (original)
> > +++ polly/trunk/tools/GPURuntime/GPUJIT.c Tue May  9 05:45:52 2017
> > @@ -554,28 +554,12 @@ static void launchKernelCL(PollyGPUFunct
> >                                sizeof(cl_uint), &NumArgs, NULL);
> >    checkOpenCLError(Ret, "Failed to get number of kernel arguments.\n");
> >
> > -  // TODO: Pass the size of the kernel arguments in to launchKernelCL,
> along
> > -  // with the arguments themselves. This is a dirty workaround that can
> be
> > -  // broken.
> > +  /* Argument sizes are stored at the end of the Parameters array. */
> >    for (cl_uint i = 0; i < NumArgs; i++) {
> > -    Ret = clSetKernelArgFcnPtr(CLKernel->Kernel, i, 8, (void
> *)Parameters[i]);
> > -    if (Ret == CL_INVALID_ARG_SIZE) {
> > -      Ret = clSetKernelArgFcnPtr(CLKernel->Kernel, i, 4, (void
> *)Parameters[i]);
> > -      if (Ret == CL_INVALID_ARG_SIZE) {
> > -        Ret =
> > -            clSetKernelArgFcnPtr(CLKernel->Kernel, i, 2, (void
> *)Parameters[i]);
> > -        if (Ret == CL_INVALID_ARG_SIZE) {
> > -          Ret = clSetKernelArgFcnPtr(CLKernel->Kernel, i, 1,
> > -                                     (void *)Parameters[i]);
> > -          checkOpenCLError(Ret, "Failed to set Kernel argument %d.\n",
> i);
> > -        }
> > -      }
> > -    }
> > -    if (Ret != CL_SUCCESS && Ret != CL_INVALID_ARG_SIZE) {
> > -      fprintf(stderr, "Failed to set Kernel argument.\n");
> > -      printOpenCLError(Ret);
> > -      exit(-1);
> > -    }
> > +    Ret = clSetKernelArgFcnPtr(CLKernel->Kernel, i,
> > +                               *((int *)Parameters[NumArgs + i]),
> > +                               (void *)Parameters[i]);
> > +    checkOpenCLError(Ret, "Failed to set Kernel argument %d.\n", i);
> >    }
> >
> >    unsigned int GridDimZ = 1;
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-- 
Sending this from my phone, please excuse any typos!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170509/e2101631/attachment.html>


More information about the llvm-commits mailing list