<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 03/25/2014 06:21 PM, Matt Arsenault
wrote:<br>
</div>
<blockquote cite="mid:53320153.9080106@amd.com" type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 03/25/2014 02:31 PM, Jingyue Wu
wrote:<br>
</div>
<blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr"><br>
<div>However, we have three concerns about this:</div>
<div>a) I doubt this optimization is valid for all
targets, because the LLVM language reference (<a
moz-do-not-send="true"
href="http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction"
target="_blank">http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction</a>)
says addrspacecast "can be a no-op cast or a complex
value modification, depending on the target and the
address space pair." <br>
</div>
</div>
</div>
</div>
</blockquote>
I think most of the simple cast optimizations would be acceptable.
The addrspacecasted pointer still needs to point to the same
memory location, so changing an access to use a different address
space would be OK. I think canonicalizing accesses to use the
original address space of a casted pointer when possible would
make sense.<br>
<br>
<blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div>b) NVPTX and R600 use different address space
numbers for the generic address space, which makes things
more complicated. </div>
<div>c) We don't have a good understanding of the R600
backend. </div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
R600 currently does not support the flat address space
instructions intended to be used for the generic address space. I
posted a patch a while ago that half added it, which I can try to
finish if it would help.<br>
<br>
I also do not understand how NVPTX uses address spaces,
particularly how it can use 0 as the generic address space.<br>
</blockquote>
<br>
We handle alloca by expanding it to a local stack reservation plus a
pointer conversion to the generic address space. So if we have IR
like the following:<br>
<tt><br>
</tt><tt>%ptr = alloca i32</tt><tt><br>
</tt><tt>store i32 0, i32* %ptr</tt><br>
<br>
This actually gets expanded to something like the following at
the MachineInstr level (in pseudo-code):<br>
<br>
<tt>%local_ptr = %SP+offset ; Stack pointer (in thread-local
[private] address space)</tt><tt><br>
</tt><tt>%ptr = convert %local_ptr to generic address</tt><tt><br>
</tt><tt>store.generic.i32 [%ptr], 0</tt><br>
<br>
With the proposed optimization, this would be optimized back to a
non-generic store:<br>
<tt><br>
</tt><tt>%local_ptr = %SP+offset</tt><tt><br>
</tt><tt>%ptr = convert %local_ptr to generic address</tt><tt><br>
</tt><tt>%ptr.0 = convert %ptr to thread-local address space</tt><tt><br>
</tt><tt>store.local.i32 [%ptr.0], 0</tt><br>
<br>
The two back-to-back conversions (local to generic, then generic
back to thread-local) now cancel out, so, assuming <tt>%ptr</tt> has no
other users, the whole conversion sequence can be eliminated. The
resulting non-generic store is likely to be more efficient than a
generic store.<br>
<br>
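As a rough illustration only, the rewrite above can be modelled in a
few lines of Python. This is a toy model, not NVPTX backend code: the
<tt>Convert</tt> and <tt>Store</tt> classes and both function names are
made up for this sketch, and it jumps straight to the end state (the
store uses the original local pointer) instead of materializing the
intermediate <tt>%ptr.0</tt> conversion.<br>
<br>
<pre>
```python
from dataclasses import dataclass

GENERIC = "generic"  # hypothetical name for the generic address space


@dataclass
class Convert:
    """dst = convert src from from_space to to_space (hypothetical model)."""
    dst: str
    src: str
    from_space: str
    to_space: str


@dataclass
class Store:
    """store.SPACE [ptr], value (hypothetical model)."""
    space: str
    ptr: str
    value: int


def fold_generic_stores(instrs):
    """Retarget generic stores whose pointer came from a known conversion."""
    origin = {}  # generic pointer name: (source pointer, source space)
    out = []
    for ins in instrs:
        if isinstance(ins, Convert) and ins.to_space == GENERIC:
            origin[ins.dst] = (ins.src, ins.from_space)
            out.append(ins)
        elif (isinstance(ins, Store) and ins.space == GENERIC
              and ins.ptr in origin):
            src, space = origin[ins.ptr]
            # Store through the original pointer in its own address space.
            out.append(Store(space, src, ins.value))
        else:
            out.append(ins)
    return out


def drop_unused_converts(instrs):
    """Remove conversions whose result has no remaining user.

    Single pass only; chains of dead conversions would need repeated passes.
    """
    used = {ins.ptr for ins in instrs if isinstance(ins, Store)}
    used |= {ins.src for ins in instrs if isinstance(ins, Convert)}
    return [ins for ins in instrs
            if not (isinstance(ins, Convert) and ins.dst not in used)]


program = [
    Convert("%ptr", "%local_ptr", "local", GENERIC),
    Store(GENERIC, "%ptr", 0),
]
optimized = drop_unused_converts(fold_generic_stores(program))
print(optimized)
```
</pre>
In this model the conversion is only dropped because the store was
its sole user; with any other user of <tt>%ptr</tt> it would have to
stay.<br>
<br>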
<blockquote cite="mid:53320153.9080106@amd.com" type="cite"> <br>
<blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div>2. How effective do we want this optimization to be? </div>
<div><br>
</div>
<div>In the short term, I want it to be able to eliminate
unnecessary non-generic-to-generic addrspacecasts the
front-end generates for the NVPTX target. For example, <br>
</div>
<div><br>
</div>
<div>%p1 = addrspacecast i32 addrspace(3)* %p0 to i32*</div>
<div>%v = load i32* %p1</div>
<div><br>
</div>
<div>=></div>
<div><br>
</div>
<div>%v = load i32 addrspace(3)* %p0</div>
<div><br>
</div>
<div>We want similar optimization for store+addrspacecast
and gep+addrspacecast as well. </div>
<div><br>
</div>
<div>In the long term, we could certainly improve this
optimization to handle more instructions and more
patterns. </div>
<span></span><br>
</div>
</div>
</div>
</blockquote>
I believe most of the cast simplifications that apply to bitcasts
of pointers also apply to addrspacecast. I have some patches
waiting that extend some of the more basic ones to understand
addrspacecast (e.g. <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html</a>),
plus a few more that I haven't posted yet. Mostly they are little
cast simplifications like your example in instcombine, but also
SROA to eliminate allocas that are addrspacecasted.<br>
<br>
-Matt<br>
</blockquote>
<br>
</body>
</html>