[PATCH 4/6] R600: Add zero undef variants of ctlz/cttz tests.
jan.vesely at rutgers.edu
Sat Jun 14 14:10:19 PDT 2014
On Fri, 2014-06-13 at 11:07 -0700, Matt Arsenault wrote:
> On 06/13/2014 08:45 AM, Jan Vesely wrote:
> > On Fri, 2014-06-13 at 11:24 -0400, Jan Vesely wrote:
> >> On Thu, 2014-06-12 at 12:52 -0700, Matt Arsenault wrote:
> >>> On Jun 12, 2014, at 12:41 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> >>> Is it really correct to use bcnt for this? I was working on matching
> >>> the undef versions a while ago and used FFBL / FFBH instructions,
> >>> although I haven’t tried running these yet.
> >> You are right. I didn't check whether there's a better instruction for
> >> these.
> >> I got ffbh/ffbl running on my TURKS card, but I'm unsure about
> >> SI.
> > The confusing part is the use of S_ instructions. With your patches I
> > see:
> > S_LOAD_DWORD
> > S_FLBIT_I32_B32
> > V_MOV_B32_e32
> > BUFFER_STORE_DWORD
> > How is it different from:
> > S_LOAD_DWORD s0
> > V_FFBH_U32_e32 v0, s0
> > BUFFER_STORE_DWORD v0, ...
> > other than executing the computation on every work-item? Is there a
> > power/performance difference?
> Theoretically using the SALU instructions is faster and uses less power,
> as well as saves VGPRs. In general it should be better to keep anything
> on the SALU whenever possible, but I don't know the details of how SALU
> instructions are executed or how helpful it is (other than helping with
> register usage).
Aha, so the idea is that everything is first generated for SALU, and
code that needs to run on every work-item is converted to VALU ops?
Is that why ctlz_zero_undef matches S_FLBIT, but not V_FFBH?
> It would be useful to have tests that actually execute both variants in
> piglit since it's highly likely I got these backwards (I swapped them at
> one point). It's also confusing because the names are different between
> the S and V versions.
AFAICT the two patches look good in this regard: S_FLBIT starts from the
MSB and matches ctlz, and S_FF1 starts from the LSB and matches cttz.
I'm not sure I can give a full Reviewed-by, since I don't have SI hw or
a complete understanding of the SALU/VALU transformation, but the
patches look good to me.
If you plan to push those patches I can rebase on top of them and add
support for pre-SI GPUs and i64.
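
The MSB/LSB correspondence above can be sketched as a small reference
model. This is only an illustration, not the backend's code: the function
names are hypothetical, and the -1 result for a zero input is my
assumption about the hardware behaviour. The zero-undef variants never
rely on that case, which is why a plain find-first-bit instruction is
enough for them.

```python
# Reference model only. All names here are hypothetical and do not
# correspond to any LLVM or hardware API.
MASK32 = 0xFFFFFFFF


def find_first_one_from_msb(x):
    """Distance from the MSB to the highest set bit (FLBIT/FFBH-style).

    Returning -1 for x == 0 is an assumption about the hardware.
    """
    x &= MASK32
    if x == 0:
        return -1
    return 32 - x.bit_length()


def find_first_one_from_lsb(x):
    """Index of the lowest set bit (FF1/FFBL-style); -1 for x == 0 (assumed)."""
    x &= MASK32
    if x == 0:
        return -1
    return (x & -x).bit_length() - 1


def ctlz32(x):
    """Fully defined count-leading-zeros: returns 32 for x == 0."""
    x &= MASK32
    return 32 - x.bit_length()


def cttz32(x):
    """Fully defined count-trailing-zeros: returns 32 for x == 0."""
    x &= MASK32
    return 32 if x == 0 else (x & -x).bit_length() - 1


if __name__ == "__main__":
    # For every nonzero input the find-first-bit results equal ctlz/cttz,
    # so only the zero input needs extra handling in the non-undef case.
    for value in (1, 2, 0x00F0, 0x80000000, MASK32):
        assert find_first_one_from_msb(value) == ctlz32(value)
        assert find_first_one_from_lsb(value) == cttz32(value)
    print("nonzero inputs agree")
```

A piglit test that actually runs both variants would still be needed to
confirm which hardware instruction counts from which end.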
> As a side note, I have been using a global load from a pointer argument
> as a way to enforce using VALU instructions in tests, but the fact that
> this works is due to a missing optimization. The pointer needs to be
> dynamically indexed into by a VGPR, because loading a constant offset
> from a kernel argument pointer could be optimized into an s_load.
Speaking of optimization, how does a SALU op + V_MOV compare to the
equivalent VALU op?
> >> Should I keep the +var +imm +inv_var tests, since ffbh/l don't have the
> >> extra argument?
> I'm not exactly sure which tests you mean.
I meant the patch 4/6 that copied test cases from ctpop, but your tests
look better, so this point is moot.
Jan Vesely <jan.vesely at rutgers.edu>