[Libclc-dev] vload problems (was: Re: [PATCH 1/2] amdgcn: Implement {read_, write_, }mem_fence builtin)

Aaron Watry via Libclc-dev libclc-dev at lists.llvm.org
Thu Aug 17 19:53:14 PDT 2017

On Wed, Aug 16, 2017 at 6:21 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Sat, 2017-08-12 at 13:18 -0500, Aaron Watry wrote:
>> On Fri, Aug 11, 2017 at 8:59 PM, Jan Vesely via Libclc-dev
>> <libclc-dev at lists.llvm.org> wrote:
>> > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
>> > ---
>> >
>> > __builtin_amdgcn_s_waitcnt path is compile-tested only. I currently
>> > don't have a machine with GCN hw and LLVM > 4
>> Tested on GCN 1.0 (PITCAIRN) with LLVM 6.0 svn (from sometime last week).
>> I ran the local-memory/global-memory piglit tests, and the conformance
>> basic/test_basic vload*, vstore*, and barrier tests. No change in
>> pass-rate.
>> The only ones that fail are vload_private with char/uchar/short/ushort
>> data types, but those failed before your series.
> do you remember if those failures are new in LLVM 6? I just posted new
> vload/vstore piglit tests, and those ran OK* on carrizo/iceland system
> using LLVM 5.
> *mostly. vload_half is missing from libclc, so those tests failed.
> Unlike Turks, which fails ~40% of them (LLVM 6).

I don't recall running this test specifically before the 5.0rc
branch-point, so I don't know if this is a new failure.  All of the
vload tests for local/global/constant seem OK, and vload_private is
working for anything that has a 32-bit gentype or larger
(int/long/float/double). It's just the vloads of
char/uchar/short/ushort that are failing.  I've been looking at the
vload code, and at least the CL C code we've got looks correct.
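For anyone checking expected values by hand: the OpenCL C spec defines vloadn(offset, p) as reading n elements starting at p + offset * n. Below is a minimal host-side C sketch of that behavior for char2. It assumes a plain struct in place of the OpenCL char2 type. The names `char2_t` and `ref_vload2_char` are hypothetical and do not come from libclc or the CTS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL char2 vector type. */
typedef struct {
    signed char x, y;
} char2_t;

/* Hypothetical reference vload2 for char: per the OpenCL C spec,
 * vloadn(offset, p) reads n elements starting at p + offset * n. */
static char2_t ref_vload2_char(size_t offset, const signed char *p)
{
    char2_t v;
    assert(p != NULL);
    v.x = p[offset * 2];
    v.y = p[offset * 2 + 1];
    return v;
}
```

For example, with data = {0, 1, ..., 7}, ref_vload2_char(3, data) reads data[6] and data[7]. The CL C source for the private variant should be doing the same thing; the only difference is that p points into private memory.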

The CTS test allows you to tweak the number of data points that are
tested in individual threads, and curiously, when the global size goes
from 15 to 16, things start failing (at least for the char2 test).
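The sketch below shows one way to locate the first failing work-item when the global size changes. It assumes a hypothetical kernel layout in which work-item i computes vload2(i, input) and writes the two chars to output[2*i] and output[2*i + 1]; the real CTS kernel may lay its data out differently. The function name is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker, assuming work-item gid writes vload2(gid, input)
 * to output[2*gid] and output[2*gid + 1]. Returns the index of the first
 * work-item whose result differs from the reference, or -1 if all
 * global_size work-items match. */
static long first_bad_work_item_char2(const signed char *input,
                                      const signed char *output,
                                      size_t global_size)
{
    assert(input != NULL && output != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        /* Reference vload2: the two elements at input + gid * 2. */
        signed char ex = input[gid * 2];
        signed char ey = input[gid * 2 + 1];
        if (output[gid * 2] != ex || output[gid * 2 + 1] != ey)
            return (long)gid;
    }
    return -1;
}
```

Only the first 2 * global_size bytes are checked. So a bad value in the last char2 slot would be invisible at a global size of 15 and would only show up as work-item 15 at a global size of 16.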

I've diffed the clover_dump.cl and .ll files for both, and they're
identical.  I copied all of the test data into a piglit test, and the
same CL code and input/output data passes in piglit... The piglit CL
and .ll files are identical to what is executed by the CTS other than
the giant piglit comment section at the top related to buffer/test
setup.  Possibly a difference in API usage, or buffer alignments, or something else.

I'm not entirely sure that the vload code is to blame here, but it's
possible. It works fine for global/local/constant vload, though that
doesn't rule out private memory on GCN having alignment or
register-size restrictions that I'm not aware of.  I think I'm going to
spend a little more time making sure that the create/write/read buffer
pieces in clover (and all of our synchronization pieces) are solid
before I spend too much more time here.

Ooh, fun fact...  I just tried to run the test on my Barts (6850/NI),
and it hung the machine on the first test.  At least my GCN just fails
the test...

Anyway, I'll be out of communication for a few days.  New kid arriving
tomorrow, so I'll be buried in parents/in-laws and lacking in sleep.

If you want to take a look at the .cl/ll files I used from piglit, I'm
attaching them.


> regards,
> Jan
> [SNIP]
-------------- next part --------------
Attachments (non-text, scrubbed by the archive):
- clover_dump.cl (2145 bytes): <http://lists.llvm.org/pipermail/libclc-dev/attachments/20170817/33e6b5d0/attachment-0003.obj>
- clover_dump.link-0.ll (6449 bytes): <http://lists.llvm.org/pipermail/libclc-dev/attachments/20170817/33e6b5d0/attachment-0004.obj>
- clover_dump.ll (7141 bytes): <http://lists.llvm.org/pipermail/libclc-dev/attachments/20170817/33e6b5d0/attachment-0005.obj>
