[Libclc-dev] [PATCH 2/3] amdgcn-amdhsa: Add get_num_groups implementation
Tom Stellard via Libclc-dev
libclc-dev at lists.llvm.org
Fri Sep 16 15:33:29 PDT 2016
On Wed, Sep 14, 2016 at 11:18:12AM -0400, Jan Vesely via Libclc-dev wrote:
> On Wed, 2016-09-14 at 10:58 -0400, Matt Arsenault via Libclc-dev wrote:
> > >
> > > On Sep 14, 2016, at 07:22, Tom Stellard via Libclc-dev <libclc-dev@lists.llvm.org> wrote:
> > >
> > > +_CLC_DEF size_t get_num_groups(uint dim) {
> > > +  size_t global_size = get_global_size(dim);
> > > +  size_t local_size = get_local_size(dim);
> > > +  size_t num_groups = global_size / local_size;
> > > +  if (global_size % local_size != 0) {
> > > +    num_groups++;
> > > +  }
> > > +  return num_groups;
> > > +}
> > LGTM. I hope the % does get optimized out?
>
> AMDGPU implements DIVREM for 64-bit integers, so / and % are computed at
> the same time. It's still rather expensive (~400 instructions).
>
> (global_size + local_size - 1) / local_size
>
> allows elimination of the REM-only parts of DIVREM (although the savings
> are negligible).
>
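For readers comparing the two formulations discussed above, here is a minimal
sketch in plain host C (not OpenCL C, and not the code from the patch). The
function names num_groups_divrem and num_groups_roundup are hypothetical and
exist only for this illustration; size_t stands in for the OpenCL size_t, and
local_size is assumed to be nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Formulation used in the patch: divide, then add one group if the
 * division left a remainder. Assumes local_size != 0. */
static size_t num_groups_divrem(size_t global_size, size_t local_size) {
    size_t num_groups = global_size / local_size;
    if (global_size % local_size != 0) {
        num_groups++;
    }
    return num_groups;
}

/* Formulation suggested in the reply: round up with a single division
 * and no separate remainder. Assumes local_size != 0 and that
 * global_size + local_size - 1 does not wrap around. */
static size_t num_groups_roundup(size_t global_size, size_t local_size) {
    return (global_size + local_size - 1) / local_size;
}
```

Both functions agree whenever global_size + local_size - 1 fits in size_t.
Near SIZE_MAX the addition wraps and the round-up form returns a different
(wrong) count, which is one thing to keep in mind when choosing between them,
though such sizes are unlikely for real NDRanges.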
I would really prefer to have the runtime compute this, so I think we
can replace this with something better in the future.
-Tom
> Jan
>
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> --
> Jan Vesely <jv356 at scarletmail.rutgers.edu>