[PATCH] D29506: [OpenMP] Teams reduction on the NVPTX device.
Arpith Jacob via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Feb 9 06:42:52 PST 2017
arpith-jacob marked 2 inline comments as done.
arpith-jacob added a comment.
In https://reviews.llvm.org/D29506#669542, @ABataev wrote:
> The patch is too big and quite hard to review? Could you split it into several smaller parts?
Alexey, thank you for your time. I have addressed your comments and split the patch into a smaller one, which I will post shortly.
The new patch is to implement parallel reductions on the GPU. This is the smallest patch I can get such that the codegen is correct and fully functional (runs correctly on the GPU).
Thanks,
Arpith
================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:4280-4282
+ // We don't need debug information in this function as nothing here refers to
+ // user source code.
+ CGF.disableDebugInfo();
----------------
ABataev wrote:
> It is not quite so, at least we have a reference in a list of reductions in `reduction` clause, which may be considered a debug position
Ok, removed.
================
Comment at: lib/CodeGen/CGOpenMPRuntime.h:524
+
+ static bool classof(const CGOpenMPRuntime *RT) {
+ return RT->getKind() == RK_HOST;
----------------
ABataev wrote:
> arpith-jacob wrote:
> > This is required to cast to the NVPTX runtime in a static function as follows;
> >
> > CGOpenMPRuntimeNVPTX &RT = cast<CGOpenMPRuntimeNVPTX>(CGM.getOpenMPRuntime());
> Are you going to make calls to `isa()`, `dyn_cast()` functions? If not, just use `static_cast<>()` instead.
Yes, I will use a static_cast and remove the OpenMPRuntimeKind functionality.
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:972-973
+ auto CastTy = Size <= 4 ? CGM.Int32Ty : CGM.Int64Ty;
+ auto *ElemCast = Bld.CreateSExtOrBitCast(Elem, CastTy);
+ auto *WarpSize = Bld.CreateTruncOrBitCast(getNVPTXWarpSize(CGF), CGM.Int16Ty);
+
----------------
ABataev wrote:
> I'd prefer you to use `CGF.EmitScalarConversion()` rather than Builder casts.
Ok. I have used CGF.EmitScalarConversion() when I can, for example for WarpSize. In other cases I want simple bitcasts and truncs so I have used the Builder.
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:975-978
+ llvm::SmallVector<llvm::Value *, 3> Args;
+ Args.push_back(ElemCast);
+ Args.push_back(Offset);
+ Args.push_back(WarpSize);
----------------
ABataev wrote:
> Do you really need a SmallVector<> or you can use just an array here?
Used an array here and other places.
================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:2249
+ ? OMPD_parallel_for_simd
+ : OMPD_parallel);
// Emit post-update of the reduction variables if IsLastIter != 0.
----------------
ABataev wrote:
> OMPD_parallel or OMPD_parallel_for?
OMPD_parallel is fine. I just need to know that it is a 'parallel' type reduction.
================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:2424
CGF.OMPCancelStack.emitExit(CGF, S.getDirectiveKind(), CodeGen);
- CGF.EmitOMPReductionClauseFinal(S);
+ CGF.EmitOMPReductionClauseFinal(S, OMPD_parallel);
// Emit post-update of the reduction variables if IsLastIter != 0.
----------------
ABataev wrote:
> OMPD_parallel_for?
This is the reduction codegen for the 'sections' directive. So a reduction type of 'OMPD_parallel' works well.
https://reviews.llvm.org/D29506
More information about the cfe-commits
mailing list