[Openmp-dev] Is libomp open to add arch-specific barrier implementation?

misono.tomohiro@fujitsu.com via Openmp-dev openmp-dev at lists.llvm.org
Mon Nov 1 19:56:10 PDT 2021


Hi, thanks for the quick response. I will try to write acceptable code.
Also, if anyone has any comments on the overall approach, I'd like to hear them.

Best Regards,
Tomohiro

> Hi, libomp is open source, and you can definitely add to, tweak, improve, or
> fix its functionality. Just read the documentation about the LLVM development
> process and send patches.
> 
> Best regards,
> Alexey Bataev
> 
> > On 1 Nov 2021, at 22:14, misono.tomohiro at fujitsu.com via Openmp-dev
> > <openmp-dev at lists.llvm.org> wrote:
> >
> > Hello, list.
> >
> > I'm Tomohiro Misono, a software engineer at Fujitsu.
> >
> > I'm new to the libomp community, and today I'd like to ask a question about
> > libomp's development policy.
> > In short, as the title says, is libomp open to adding an arch (CPU)-specific
> > barrier implementation?
> > I have Fujitsu A64FX's hardware-assisted barrier implementation in mind.
> >
> > Below is more detailed background.
> >
> > The A64FX processor[*] (an HPC processor used in the supercomputer Fugaku)
> > has a hardware-assisted barrier that uses architecture-specific registers.
> > This mechanism can synchronize threads within an L2-sharing domain. Although
> > Fujitsu has its own OpenMP runtime library implementation that supports this
> > barrier, we are now considering whether it is possible to support it in an
> > open library (i.e. libomp) too. Based on my research, I think it would be
> > possible to support the barrier in libomp by adding a new barrier type that
> > only works on a specific architecture, but is this approach OK for the
> > community?
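A minimal C sketch of what "a new barrier type that only works on a specific architecture" could look like: the arch-specific kind can only be chosen on AArch64 builds, and every other case falls back to the portable software barrier. The names `barrier_kind`, `select_barrier_kind`, and its two boolean inputs are hypothetical and are not libomp internals. The inputs stand in for runtime checks the message mentions (kernel driver loaded, threads pinned to cores); a real build would also need to detect A64FX specifically, since `__aarch64__` alone does not imply it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical barrier kinds for illustration; not libomp's enumerators. */
typedef enum {
    BARRIER_SOFTWARE,    /* portable flag-based barrier, always available */
    BARRIER_HW_ASSISTED  /* arch-specific; only selectable on AArch64 builds */
} barrier_kind;

/* Compile-time gate: the arch-specific path exists only on AArch64.
   Assumption: real code would additionally check for A64FX at runtime. */
static bool hw_barrier_compiled_in(void) {
#if defined(__aarch64__)
    return true;
#else
    return false;
#endif
}

/* Runtime choice. driver_present and threads_bound are hypothetical inputs
   standing in for real checks (kernel driver loaded, every thread pinned). */
static barrier_kind select_barrier_kind(bool driver_present, bool threads_bound) {
    if (hw_barrier_compiled_in() && driver_present && threads_bound)
        return BARRIER_HW_ASSISTED;
    return BARRIER_SOFTWARE; /* fallback: the existing software barrier */
}
```

Keeping the decision in one function means that platforms without the driver (or with unpinned threads) always take the software path.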
> >
> > [*] Specifications: https://github.com/fujitsu/A64FX
> >
> > Note that the code we have at this point cannot easily be incorporated into
> > libomp, so a completely new implementation would be required. Also, a kernel
> > driver must be loaded to access the registers (see below). I just want to
> > know whether this plan is feasible in the first place before starting
> > development.
> >
> > Some notes on a possible implementation:
> > - A64FX's hardware barrier can only synchronize threads within an L2-sharing
> >   domain. Therefore, a conventional software barrier (i.e. based on the flag
> >   classes) is still needed for synchronization across L2 domains. So a
> >   possible implementation would be somewhat similar to the hierarchical
> >   barrier (only the leaves can use the hardware barrier). I think extending
> >   the current hierarchical barrier code would become messy, and introducing
> >   a new barrier type is better.
> > - In the optimal case (i.e. a barrier within a single L2 domain), there is no
> >   need for a software barrier at all. Currently, task execution is mainly
> >   coupled with the flag classes, and this needs to be addressed somehow.
> > - To use the hardware barrier, each thread must be bound to a specific core
> >   and must not be migrated. If this condition is not met, the library has to
> >   fall back to the software barrier. I think this restriction implies that
> >   the hardware barrier cannot be used for the fork_barrier.
> > - Last but not least, a Linux kernel driver is needed to access the barrier
> >   registers on A64FX. We are willing to open-source the driver code too
> >   (although it has not been accepted by the Linux kernel community at this
> >   point). The ultimate goal is to define a user-kernel interface that is as
> >   general as possible, so that the code in both libomp and the kernel driver
> >   can be reused if other hardware-assisted barrier implementations emerge,
> >   but this is a challenging problem.
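A minimal C11/pthreads sketch of the two-level structure described in the first note: threads first synchronize inside their domain, one leader per domain then synchronizes with the other leaders, and a second domain-level wait releases everyone. The per-domain `leaf` barrier is an assumption standing in for the A64FX hardware barrier; here it is just another software sense-reversing barrier, because the register interface is not described in this thread. All type and function names are hypothetical (not libomp code), and thread pinning is not checked.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_DOMAINS 8
#define MAX_THREADS 64

/* Centralized sense-reversing barrier: a plain software barrier. */
typedef struct {
    atomic_int count; /* threads still to arrive in the current episode */
    atomic_int sense; /* flips once per completed episode */
    int n;            /* number of participants */
} sr_barrier;

static void sr_init(sr_barrier *b, int n) {
    atomic_init(&b->count, n);
    atomic_init(&b->sense, 0);
    b->n = n;
}

static void sr_wait(sr_barrier *b, int *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        atomic_store(&b->count, b->n);         /* last arriver resets... */
        atomic_store(&b->sense, *local_sense); /* ...then releases the rest */
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();
    }
}

/* One leaf barrier per domain plus one barrier among domain leaders. */
typedef struct {
    sr_barrier leaf[MAX_DOMAINS]; /* hypothetical stand-in for the HW barrier */
    sr_barrier top;               /* software barrier across domains */
    int domain_size;
} two_level_barrier;

typedef struct {
    int domain;  /* index of this thread's domain */
    bool leader; /* first thread of each domain joins the top level */
    int leaf_sense, top_sense;
} tl_thread_state;

static void tl_init(two_level_barrier *b, int ndomains, int domain_size) {
    assert(ndomains >= 1 && ndomains <= MAX_DOMAINS && domain_size >= 1);
    for (int d = 0; d < ndomains; d++)
        sr_init(&b->leaf[d], domain_size);
    sr_init(&b->top, ndomains);
    b->domain_size = domain_size;
}

static void tl_wait(two_level_barrier *b, tl_thread_state *s) {
    sr_wait(&b->leaf[s->domain], &s->leaf_sense); /* 1: gather the domain */
    if (s->leader)
        sr_wait(&b->top, &s->top_sense);          /* 2: leaders sync globally */
    sr_wait(&b->leaf[s->domain], &s->leaf_sense); /* 3: leader releases domain */
}

/* Demo harness: each thread bumps a shared counter once per round, then waits. */
typedef struct {
    two_level_barrier *bar;
    atomic_int *arrived, *violations;
    int tid, nthreads, rounds;
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    tl_thread_state s = {a->tid / a->bar->domain_size,
                         a->tid % a->bar->domain_size == 0, 0, 0};
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->arrived, 1);
        tl_wait(a->bar, &s);
        /* After round r, every thread's increment must already be visible. */
        if (atomic_load(a->arrived) < (r + 1) * a->nthreads)
            atomic_fetch_add(a->violations, 1);
    }
    return NULL;
}

/* Returns the number of barrier violations observed, or -1 on bad sizes. */
static int run_demo(int ndomains, int domain_size, int rounds) {
    int n = ndomains * domain_size;
    if (ndomains < 1 || ndomains > MAX_DOMAINS || domain_size < 1 || n > MAX_THREADS)
        return -1;
    two_level_barrier bar;
    atomic_int arrived, violations;
    atomic_init(&arrived, 0);
    atomic_init(&violations, 0);
    tl_init(&bar, ndomains, domain_size);
    pthread_t th[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    for (int i = 0; i < n; i++) {
        args[i] = (worker_arg){&bar, &arrived, &violations, i, n, rounds};
        pthread_create(&th[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < n; i++)
        pthread_join(th[i], NULL);
    return atomic_load(&violations);
}
```

Step 3 is what keeps non-leaders from leaving until their leader has passed the cross-domain barrier. With a single domain, the top level has one participant and returns immediately, so only the leaf barrier does real work.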
> >
> > I'd appreciate any comments.
> >
> > Regards,
> > Tomohiro

