[Libclc-dev] vload problems (was: Re: [PATCH 1/2] amdgcn: Implement {read_, write_, }mem_fence builtin)
Aaron Watry via Libclc-dev
libclc-dev at lists.llvm.org
Fri Aug 18 09:50:53 PDT 2017
On Thu, Aug 17, 2017, 11:43 PM Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Thu, 2017-08-17 at 21:53 -0500, Aaron Watry wrote:
> > On Wed, Aug 16, 2017 at 6:21 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > On Sat, 2017-08-12 at 13:18 -0500, Aaron Watry wrote:
> > > > On Fri, Aug 11, 2017 at 8:59 PM, Jan Vesely via Libclc-dev
> > > > <libclc-dev at lists.llvm.org> wrote:
> > > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > > > > ---
> > > > >
> > > > > __builtin_amdgcn_s_waitcnt path is compile tested only. I currently
> > > > > don't have a machine with GCN hw and LLVM > 4.
> > > >
> > > > Tested on GCN 1.0 (PITCAIRN) with LLVM 6.0 svn (from sometime last
> > > > week).
> > > >
> > > > I ran the local-memory/global-memory piglit tests, and the
> > > > conformance basic/test_basic vload*, vstore*, and barrier tests.
> > > > No change in pass-rate.
> > > >
> > > > The only ones that fail are vload_private with
> > > > char/uchar/short/ushort data types, but those failed before your
> > > > series.
> > >
> > > do you remember if those failures are new in LLVM 6? I just posted new
> > > vload/vstore piglit tests, and those ran OK* on carrizo/iceland system
> > > using LLVM 5.
> > >
> > > *mostly. vload_half is missing from libclc, so those tests failed.
> > > Unlike Turks, which fails ~40% of them (LLVM 6).
> >
> > I don't recall running this test specifically before the 5.0rc
> > branch-point, so I don't know if this is a new failure. All of the
> > vload tests for local/global/constant seem ok, and vload private is
> > working for anything that has a 32-bit gentype or larger
> > (int/long/float/double). It's just the vloads of
> > char/uchar/short/ushort that are failing. I've been looking at the
> > vload code, and at least the CL C code we've got looks correct.
> >
> > The CTS test allows you to tweak the number of data points that are
> > tested in individual threads, and curiously, when the global size goes
> > from 15 to 16, things start failing (at least for the char2 test).
>
> If it's intermittent I'd suspect buffer manipulation rather than GPU
> execution, but I might be wrong.
> I mostly wanted to find out whether the new piglit tests hit the issue,
> or the situation is more complicated.
> Modified vstore tests managed to uncover a failure in
> vstore-{u,}char-private, but that's an assertion failure, which I assume
> is different from what you're seeing in CTS.
>
Good point about the assertions; I don't know that I'm actually running a
debug+assertions build right now. That being said, the failures aren't
transient. The piglit test I sent consistently passes, and the CTS one
passes when working on ≤ 15 threads. It always fails at 16 or more
threads... I'll see if I can track it down at some point. With
libclc/clover there are enough bugs and missing features to chase down that I
might just go for some other low-hanging fruit first.
Unfortunately I didn't get a chance to run those new tests last night. I'll
let you know how it goes when I get a chance to try them.
--Aaron
> http://paul.rutgers.edu/~jv356/piglit/gcn-latest-3/problems.html
>
> >
> > I've diffed the clover_dump.cl and .ll files for both, and they're
> > identical. I copied all of the test data into a piglit test, and the
> > same CL code and input/output data passes in piglit... The piglit CL
> > and .ll files are identical to what is executed by the CTS other than
> > the giant piglit comment section at the top related to buffer/test
> > setup. Possibly a difference in API usage, or buffer alignments, or
> > something.
> >
> > I'm not entirely sure that the vload code is to blame here, but it's
> > possible. It's working just fine for global/local/constant vload
> > (which doesn't rule out any weirdness with private memory in GCN
> > having possible alignment/register-size restrictions that I'm not
> > aware of).
>
> VI+ GCN parts support i16 instructions, and there was a similar bug
> before: https://reviews.llvm.org/D30281
>
> > I think I'm going to spend a little more time working on
> > making sure that the create/write/read buffer pieces in clover (and
> > all of our synchronization pieces) are solid before I spend too much
> > more time here.
>
> thanks. clover on GCN is not really high on my list. my gcn machine is
> set up with rocm.
> Moreover, upgrading to LLVM 5 regressed both pyrite (can't select
> device) and bfgminer (flood of acpi errors in dmesg).
>
> >
> > Ooh, fun fact... I just tried to run the test on my Barts (6850/NI),
> > and it hung the machine on the first test. At least my GCN just fails
> > the test...
>
> this looks related to the ongoing bug reported with clpeak; the kernel
> driver is hit or miss when it comes to recovering a hung GPU.
>
> >
> > Anyway, I'll be out of communication for a few days. New kid arriving
> > tomorrow, so I'll be buried in parents/in-laws and lacking in sleep.
>
> congratz, and no worries, there's no rush when it comes to clover :)
>
> >
> > If you want to take a look at the .cl/ll files I used from piglit, I'm
> > attaching them.
>
> thanks,
> Jan
>
> >
> > --Aaron
> >
> > >
> > > regards,
> > > Jan
> > >
> > > [SNIP]
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>