<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks for the great response, John! As I worked on it more, I realized that multiple stacks are in fact overkill, and just differentiating pointers is good enough.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The solution I ended up implementing, to dynamically change the load latency based on address space, is to override TargetSubtargetInfo's adjustSchedDependency() method:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><a href="https://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a80e16c673bf028bf985c58b50a8a70c5" style="font-family:Arial,Helvetica,sans-serif">https://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a80e16c673bf028bf985c58b50a8a70c5</a></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">In adjustSchedDependency, I updated the latency of the dependency edge (<a href="https://llvm.org/doxygen/classllvm_1_1SDep.html">SDep</a>) as follows:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-size:small"><font face="monospace">void RISCVSubtarget::adjustSchedDependency(SUnit *Def, SUnit *Use,<br> SDep &Dep) const {<br> // Only MachineInstr-based SUnits carry an instruction; check before getInstr().<br> if (!Def->isInstr())<br> return;<br> MachineInstr *SrcInst = Def->getInstr();<br><br> if (getCPU() == "mycpu") {<br> ArrayRef<MachineMemOperand*> memops = SrcInst->memoperands();<br> // Loads from address space 1 get a longer latency on this CPU.<br> if (SrcInst->mayLoad() &&<br> !memops.empty() && memops[0]->getAddrSpace() == 1) {<br> Dep.setLatency(20);<br> }<br> }<br>}</font><br></div><div class="gmail_default" style="font-size:small"><font 
face="monospace"><br></font></div><div class="gmail_default" style="font-size:small"><font face="arial, sans-serif">I'm wondering whether this is the intended use of </font>adjustSchedDependency. Is there a recommended way of modeling load latency based on address space?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 27, 2020 at 3:30 PM John McCall <<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 27 May 2020, at 17:13, Henrik Olsson via llvm-dev wrote:<br>
> I'm unclear on whether it works with automatic variables. Testing it in<br>
> clang gives an error message "automatic variable qualified with address<br>
> space".<br>
> However here is an LLVM discussion on the semantics of alloca with<br>
> different address spaces:<br>
> <a href="https://lists.llvm.org/pipermail/llvm-dev/2015-August/089706.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2015-August/089706.html</a><br>
> Some people seem to have gotten multiple separate stacks working, according<br>
> to my skim read.<br>
> So it might potentially be technically supported in LLVM, but not in clang.<br>
<br>
The alloca address space changes the address space of the stack, but<br>
there’s still only one stack. So Clang supports generating code with a<br>
non-zero alloca address space, but it doesn’t support changing the address<br>
space of individual local variables.<br>
<br>
Bandhav, you may find it interesting to look into some of the work done<br>
for GPU targets. I think AMDGPU has the ability to compile arbitrary<br>
C/C++ code for their GPU. Ordinary C/C++ code is unaware of address<br>
spaces, but the hardware has the traditional GPU memory model of different<br>
private/global/constant address spaces, plus a generic address space<br>
which encompasses all of them (but is much more expensive to access).<br>
By default, you treat an arbitrary C pointer as a pointer in the generic<br>
address space, but LLVM and Clang know that local and global variables are<br>
in various other address spaces. That creates a mismatch, which Clang<br>
handles by implicitly promoting pointers to the generic address space<br>
when you take the address of a local/global. In the optimizer, you can<br>
recognize accesses to promoted pointers and rewrite them to be accesses<br>
in the original address space. This is then relatively easy to combine<br>
with address-space attributes so that you can explicitly record that a<br>
particular pointer is known to be in a particular address space.<br>
<br>
Of course, for the promotion part of that to be useful to you, you need<br>
specific kinds of memory (e.g. the stack) to fall into meaningful address<br>
ranges for your cost model, or else you’ll be stuck treating almost every<br>
access conservatively. You could still use address spaces to know that<br>
specific accesses are faster, but not being able to make default<br>
assumptions about anything will be really limiting.<br>
<br>
That’s also all designed for an implementation where pointers in<br>
different address spaces are actually representationally different. It<br>
might be overkill just for better cost-modeling of a single address space<br>
with non-uniform access costs. In principle, you could get a lot of work<br>
done just by doing a quick analysis to see if an access is known to be<br>
to the stack.<br>
<br>
John.<br>
</blockquote></div>
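<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">For reference, the latency decision made in the adjustSchedDependency override above can be separated from the scheduler plumbing. The following is only a minimal sketch and not LLVM API: MemAccess, loadLatencyFor and the default latency parameter are hypothetical names introduced for illustration. Only the pairing of address space 1 with a latency of 20 comes from the code above.</div>
<pre>
```cpp
// Hypothetical description of one memory access; this is not an LLVM type.
struct MemAccess {
  bool isLoad;         // would come from MachineInstr::mayLoad()
  bool hasMemOperand;  // false when the instruction has no memory operands
  unsigned addrSpace;  // would come from MachineMemOperand::getAddrSpace()
};

// Returns the latency to put on the dependency edge for this access.
// Anything that is not a load with a known address space keeps the default.
unsigned loadLatencyFor(MemAccess access, unsigned defaultLatency) {
  if (!access.isLoad || !access.hasMemOperand)
    return defaultLatency;
  switch (access.addrSpace) {
  case 1:  // the slow address space from the override above
    return 20;
  default:
    return defaultLatency;
  }
}
```
</pre>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Under these assumptions, a load from address space 1 maps to 20 and every other access keeps whatever default is passed in. Keeping the mapping in one switch would make it straightforward to add further address spaces later.</div></div>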