[PATCH 4/6] R600: Add zero undef variants of ctlz/cttz tests.

Sat Jun 14 14:10:19 PDT 2014

On Fri, 2014-06-13 at 11:07 -0700, Matt Arsenault wrote:
> On 06/13/2014 08:45 AM, Jan Vesely wrote:
> > On Fri, 2014-06-13 at 11:24 -0400, Jan Vesely wrote:
> >> On Thu, 2014-06-12 at 12:52 -0700, Matt Arsenault wrote:
> >>> On Jun 12, 2014, at 12:41 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> >>>

SNIP

> >>>
> >>> Is it really correct to use bcnt for this? I was working on matching
> >>> the undef versions a while ago and used FFBL / FFBH instructions,
> >>> although I haven’t tried running these yet
> >> You are right. I didn't check whether there's a better instruction for
> >> these.
> >>
> >> I got ffbh/ffbl running on my TURKS card, but I'm unsure about
> >> SI.
> > The confusing part is the use of S_ instruction. With your patches I
> > see:
> >   S_LOAD_DWORD
> >   S_FLBIT_I32_B32
> >   V_MOV_B32_e32
> >   BUFFER_STORE_DWORD
> >
> > how is it different from:
> >   S_LOAD_DWORD s0
> >   V_FFBH_U32_e32 v0, s0
> >   BUFFER_STORE_DWORD v0, ...
> >
> > other than executing the computation on every work-item. is there
> > power/performance difference?
> 
> Theoretically using the SALU instructions is faster and uses less power, 
> as well as saves VGPRs. In general it should be better to keep anything 
> on the SALU whenever possible, but I don't know the details of how SALU 
> instructions are executed or how helpful it is (other than helping with 
> register usage)

Aha, so the idea is that everything is first generated for the SALU, and
code that needs to run on every work-item is then converted to VALU ops?
Is that why ctlz_zero_undef matches S_FLBIT, but not V_FFBH?

> 
> It would be useful to have tests that actually execute both variants in 
> piglit since it's highly likely I got these backwards (I swapped them at 
> one point). It's also confusing because the names are different between 
> the S and V versions.

AFAICT the two patches look good in this regard: S_FLBIT starts from the
MSB and matches ctlz (and vice versa with S_FF1). I'm not sure I can give
a full Reviewed-by, since I don't have SI hw or a complete understanding
of the SALU/VALU transformation, but the patches look good to me.
If you plan to push those patches, I can rebase on top of them and add
support for pre-SI GPUs and i64.
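
To make the matching argument concrete, here is a small Python reference
model (a sketch, not the backend code). It assumes that an S_FLBIT/FFBH-style
op returns the position of the first set bit counted from the MSB, that its
LSB-first counterpart counts from the LSB, and that both return -1 when no
bit is set; the function names are made up for illustration. Under those
assumptions the ops agree with ctlz/cttz for every non-zero input and differ
only at zero, which is exactly the case the zero-undef variants leave
unspecified.

```python
MASK32 = 0xFFFFFFFF

def ctlz32(x):
    """Leading zeros of a 32-bit value; defined as 32 for x == 0."""
    return 32 - (x & MASK32).bit_length()

def cttz32(x):
    """Trailing zeros of a 32-bit value; defined as 32 for x == 0."""
    x &= MASK32
    if x == 0:
        return 32
    # x & -x isolates the lowest set bit.
    return (x & -x).bit_length() - 1

def first_one_from_msb(x):
    """Hypothetical model of an FLBIT/FFBH-style op (assumed -1 on zero)."""
    x &= MASK32
    if x == 0:
        return -1
    return 32 - x.bit_length()

def first_one_from_lsb(x):
    """Hypothetical model of the LSB-first op (assumed -1 on zero)."""
    x &= MASK32
    if x == 0:
        return -1
    return (x & -x).bit_length() - 1

# Non-zero inputs: the hardware-style ops and the intrinsics agree.
samples = [1, 2, 3, 8, 0x00F0, 0x80000000, 0xFFFFFFFF, 0x12345678]
for v in samples:
    assert first_one_from_msb(v) == ctlz32(v)
    assert first_one_from_lsb(v) == cttz32(v)
```

A piglit-style execution test would check the same property on real
hardware, with zero excluded for the zero-undef variants.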

> 
> As a side note, I have been using a global load from a pointer argument 
> as a way to enforce using VALU instructions in tests, but the fact that 
> this works is a missing optimization. The pointer needs to be dynamically 
> indexed into by a VGPR, because loading a constant offset from a kernel 
> argument pointer could be optimized into an s_load.

Speaking of optimization, how does a SALU op + V_MOV compare to the
equivalent VALU op?

> 
> >> Should I keep the +var +imm +inv_var tests, since ffbh/l don't have the
> >> extra argument?
> I'm not exactly sure which tests you mean
> 

I meant the patch 4/6 that copied test cases from ctpop, but your tests
look better, so this point is moot.

regards,
Jan

-- 
Jan Vesely <jan.vesely at rutgers.edu>