<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 03/26/2014 01:33 PM, Jingyue Wu
wrote:<br>
</div>
<blockquote
cite="mid:CAMROOrFfEY9E9B=zMi176f8AkZ3y+A5Uwt2uODt_-D68_CwD9Q@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Mar 25, 2014 at 5:32 PM, Matt
Arsenault <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:Matthew.Arsenault@amd.com" target="_blank">Matthew.Arsenault@amd.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>
<div>On 03/25/2014 05:07 PM, Jingyue Wu wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Mar 25, 2014 at
3:21 PM, Matt Arsenault <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:Matthew.Arsenault@amd.com"
target="_blank">Matthew.Arsenault@amd.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>
<div>On 03/25/2014 02:31 PM, Jingyue Wu
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr"><br>
<div>However, we have three
concerns on this:</div>
<div>a) I doubt this
optimization is valid for all
targets, because LLVM language
reference (<a
moz-do-not-send="true"
href="http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction"
target="_blank">http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction</a>)
says addrspacecast "can be a
no-op cast or a complex value
modification, depending on the
target and the address space
pair." <br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
I think most of the simple cast
optimizations would be acceptable. The
addrspacecasted pointer still needs to
point to the same memory location, so
changing an access to use a different
address space would be OK. I think
canonicalizing accesses to use the
original address space of a casted pointer
when possible would make sense.</div>
</blockquote>
<div><br>
</div>
"the address space conversion is legal then
both result and operand refer to the same
memory location". I don't quite understand
this sentence. Does the same memory location
mean the same numeric value?</div>
</div>
</div>
</blockquote>
</div>
No, it means they could both have different values that
point to the same physical location. Storing to a
pointer in one address space should have the same effect
as storing to the addrspacecasted pointer, though it
might not use the same value or instructions to do so.
<div><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>That makes sense. Thanks! </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div> <br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div> <br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div>b) NVPTX and R600 have
different address numbering
for the generic address space,
which makes things more
complicated. </div>
<div>c) We don't have a good
understanding of the R600
backend. </div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</div>
R600 currently does not support the flat
address space instructions intended to use
for the generic address space. I posted a
patch a while ago that half added it,
which I can try to work on finishing if it
would help.<br>
<br>
I also do not understand how NVPTX uses
address spaces, particularly how it can
use 0 as the the generic address space.</div>
</blockquote>
<div><br>
</div>
<div>NVPTX backend generates ld.f32 for
reading from the generic address space.
There's no special machine instruction to
read/write from/to the generic address space
in R600? <br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
New hardware does have flat address space instructions,
which is what my patch adds support for. They're just
not defined in the target yet. This flat address space
is separate different from 0 / the default. I think of
addrspace(0) as the address space of allocas, so I don't
understand how that can be consolidated with generic
accesses of the other address spaces. Does NVPTX not
differentiate between accesses of a generic pointer and
private / alloca'd memory?
<div><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>See Justin's followup. Looks like this optimization can
benefit local accesses as well. <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
This optimization can benefit all address spaces. Imagine you have
a library function that takes pointer arguments in the generic
address space:<br>
<br>
<tt>__device__ float foo(float *a, float *b) {</tt><tt><br>
</tt><tt> return *a + *b;</tt><tt><br>
</tt><tt>}</tt><br>
<br>
Now let's say this function is called twice in a kernel function:<br>
<tt><br>
</tt><tt>__global__ void kern(float *a, float *b, float *dst) {</tt><tt><br>
</tt><tt> __shared__ float buffer[32];</tt><tt><br>
</tt><tt><br>
</tt><tt> ...</tt><tt><br>
</tt><tt><br>
</tt><tt> *dst = foo(a, b) + foo(&buffer[0], &buffer[1]);</tt><tt><br>
</tt><tt>}</tt><br>
<br>
Assuming 'foo' is inlined, you could convert the loads in the first
call to 'ld.global' and the loads from the second call to
'ld.shared'.<br>
<br>
<blockquote
cite="mid:CAMROOrFfEY9E9B=zMi176f8AkZ3y+A5Uwt2uODt_-D68_CwD9Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div> <br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div>2. How effective do we want
this optimization to be? </div>
<div><br>
</div>
<div>In the short term, I want
it to be able to eliminate
unnecessary
non-generic-to-generic
addrspacecasts the front-end
generates for the NVPTX
target. For example, <br>
</div>
<div><br>
</div>
<div>%p1 = addrspace i32
addrspace(3)* %p0 to i32*</div>
<div>%v = load i32* %p1</div>
<div><br>
</div>
<div>=></div>
<div><br>
</div>
<div>%v = load i32 addrspace(3)*
%p0</div>
<div><br>
</div>
<div>We want similar
optimization for
store+addrspacecast and
gep+addrspacecast as well. </div>
<div><br>
</div>
<div>In a long term, we could
for sure improve this
optimization to handle more
instructions and more
patterns. </div>
<span></span><br>
</div>
</div>
</div>
</blockquote>
</div>
I believe most of the cast simplifications
that apply to bitcasts of pointers also
apply to addrspacecast. I have some
patches waiting that extend some of the
more basic ones to understand
addrspacecast (e.g. <a
moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html"
target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html</a>),
plus a few more that I haven't posted yet.
Mostly they are little cast
simplifications like your example in
instcombine, but also SROA to eliminate
allocas that are addrspacecasted.</div>
</blockquote>
<div><br>
</div>
<div>We also think InstCombine is a good place
to put this optimization, if we decide to go
with target-independent. Looking forward to
your patches! </div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</div>
I think that strategy only gets you part of the way to
ideal. For example, preferring to use the original
address space works well for accesses to objects where
you start with the known address space. You could also
have a function with a generic address space argument
casted back to a specific address space. Preferring the
original address space in that case is the opposite of
what you want, although I expect this case will end up
being much less common in real code and will tend to go
away after inlining.<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>You're right. I overlooked this case. I doubt a CUDA
program would even uses generic-to-non-generic casts,
because non-generic address space qualifiers only qualify
declarations. </div>
<div><br>
</div>
<div>The backend can indicate which address spaces it
prefers using some flags (e.g., preferNonGenericPointers
as Justin suggested). InstCombine can then look at these
flags to decide what to do. <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The optimization should also treat the absence of target information
as a flag to just not run.<br>
<br>
<br>
<br>
<DIV>
<HR>
</DIV>
<DIV>This email message is for the sole use of the intended recipient(s) and may
contain confidential information. Any unauthorized review, use, disclosure
or distribution is prohibited. If you are not the intended recipient,
please contact the sender by reply email and destroy all copies of the original
message. </DIV>
<DIV>
<HR>
</DIV>
</body>
</html>