<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks for the great response, John! As I worked on it more, I realized that multiple stacks are in fact overkill and that just differentiating pointers is good enough.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The solution I ended up implementing, to change the load latency dynamically based on address space, is to use TargetSubtargetInfo's adjustSchedDependency() method:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><a href="https://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a80e16c673bf028bf985c58b50a8a70c5" style="font-family:Arial,Helvetica,sans-serif">https://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a80e16c673bf028bf985c58b50a8a70c5</a></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">In adjustSchedDependency, I updated the latency of the dependency edge (<a href="https://llvm.org/doxygen/classllvm_1_1SDep.html">SDep</a>) as follows:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-size:small"><font face="monospace">void RISCVSubtarget::adjustSchedDependency(SUnit *Def, SUnit *Use,<br>&nbsp;&nbsp;&nbsp;&nbsp;SDep &amp;Dep) const {<br>&nbsp;&nbsp;// Only machine-instruction nodes carry memory operands.<br>&nbsp;&nbsp;if (!Def-&gt;isInstr())<br>&nbsp;&nbsp;&nbsp;&nbsp;return;<br>&nbsp;&nbsp;MachineInstr *SrcInst = Def-&gt;getInstr();<br><br>&nbsp;&nbsp;if (getCPU() == "mycpu") {<br>&nbsp;&nbsp;&nbsp;&nbsp;ArrayRef&lt;MachineMemOperand*&gt; memops = SrcInst-&gt;memoperands();<br>&nbsp;&nbsp;&nbsp;&nbsp;// Loads from address space 1 get a latency of 20 cycles.<br>&nbsp;&nbsp;&nbsp;&nbsp;if (SrcInst-&gt;mayLoad() &amp;&amp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;!memops.empty() &amp;&amp; memops[0]-&gt;getAddrSpace() == 1) {<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dep.setLatency(20);<br>&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;}<br>}</font><br></div><div 
class="gmail_default" style="font-size:small"><font face="monospace"><br></font></div><div class="gmail_default" style="font-size:small"><font face="arial, sans-serif">Is this the intended use of </font>adjustSchedDependency? Is there a recommended way to model load latency based on address space?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 27, 2020 at 3:30 PM John McCall &lt;<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 27 May 2020, at 17:13, Henrik Olsson via llvm-dev wrote:<br>

> I'm unclear on whether it works with automatic variables. Testing it in<br>

> clang gives an error message "automatic variable qualified with address<br>

> space".<br>

> However here is an LLVM discussion on the semantics of alloca with<br>

> different address spaces:<br>

> <a href="https://lists.llvm.org/pipermail/llvm-dev/2015-August/089706.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2015-August/089706.html</a><br>

> Some people seem to have gotten multiple separate stacks working, according<br>

> to my skim read.<br>

> So it might potentially be technically supported in LLVM, but not in clang.<br>

<br>

The alloca address space changes the address space of the stack, but<br>

there’s still only one stack.  So Clang supports generating code with a<br>

non-zero alloca address space, but it doesn’t support changing the address<br>

space of individual local variables.<br>

<br>

Bandhav, you may find it interesting to look into some of the work done<br>

for GPU targets.  I think AMDGPU has the ability to compile arbitrary<br>

C/C++ code for their GPU.  Ordinary C/C++ code is unaware of address<br>

spaces, but the hardware has the traditional GPU memory model of different<br>

private/global/constant address spaces, plus a generic address space<br>

which encompasses all of them (but is much more expensive to access).<br>

By default, you treat an arbitrary C pointer as a pointer in the generic<br>

address space, but LLVM and Clang know that local and global variables are<br>

in various other address spaces.  That creates a mismatch, which Clang<br>

handles by implicitly promoting pointers to the generic address space<br>

when you take the address of a local/global.  In the optimizer, you can<br>

recognize accesses to promoted pointers and rewrite them to be accesses<br>

in the original address space.  This is then relatively easy to combine<br>

with address-space attributes so that you can explicitly record that a<br>

particular pointer is known to be in a particular address space.<br>

<br>

Of course, for the promotion part of that to be useful to you, you need<br>

specific kinds of memory (e.g. the stack) to fall into meaningful address<br>

ranges for your cost model, or else you’ll be stuck treating almost every<br>

access conservatively.  You could still use address spaces to know that<br>

specific accesses are faster, but not being able to make default<br>

assumptions about anything will be really limiting.<br>

<br>

That’s also all designed for an implementation where pointers in<br>

different address spaces are actually representationally different.  It<br>

might be overkill just for better cost-modeling of a single address space<br>

with non-uniform access costs.  In principle, you could get a lot of work<br>

done just by doing a quick analysis to see if an access is known to be<br>

to the stack.<br>

<br>

John.<br>

</blockquote></div>