<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 23/08/2014 20:30, Matt Arsenault
wrote:<br>
</div>
<blockquote
cite="mid:CAC=NOtrzrZpviYH1hjXzXZy4EV5HfBzLb7gExoGsrK7xcd-a0w@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
- I have read the spec, and my conclusion is that a barrier is a
work-group sync point whatever the flags are. So I think that we
must have a barrier nofence() call.<br>
<br>
</blockquote>
<div>I would agree, though the spec is ambiguous. As the fallback
(else) case for flags that are not a compile-time constant, I would
make it fence all address spaces. (I remember finding that
non-constant flags were not allowed, but I have never re-found
where in the spec that is specified. It should be a frontend
warning anyway.)</div>
<div> </div>
</blockquote>
I have seen that when flags is 0, the closed driver queues a memory
fence for both local and global. So I think we should do as you say.
I will change that.<br>
<br>
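A minimal sketch of the flag handling discussed above, in plain C
rather than actual LLVM or libclc code. All names and bit values here
(FLAG_LOCAL, FLAG_GLOBAL, FENCE_*, barrier_fence_mask) are
hypothetical, and fencing both address spaces when flags is 0 is only
the behaviour I observed from the closed driver, not something the
spec clearly requires:<br>
<pre>
```c
/* Hypothetical flag bits; real OpenCL headers define their own values. */
enum { FLAG_LOCAL = 1, FLAG_GLOBAL = 2 };

/* Hypothetical address-space mask consumed by a single fence op. */
enum { FENCE_NONE = 0, FENCE_LOCAL = 1, FENCE_GLOBAL = 2 };

/* Returns the address spaces to fence for a barrier call.
 * is_constant == 0 models flags that are not known at compile time. */
int barrier_fence_mask(int flags, int is_constant)
{
    int mask = FENCE_NONE;

    /* Conservative fallback: unknown flags fence every address space. */
    if (!is_constant)
        return FENCE_LOCAL | FENCE_GLOBAL;

    if (flags & FLAG_LOCAL)
        mask |= FENCE_LOCAL;
    if (flags & FLAG_GLOBAL)
        mask |= FENCE_GLOBAL;

    /* Assumption: no recognised flag (e.g. flags == 0) still fences both. */
    if (mask == FENCE_NONE)
        mask = FENCE_LOCAL | FENCE_GLOBAL;

    return mask;
}
```
</pre>
With this shape, a single fence operation parameterised by the mask
would cover every combination, and the work-group sync point is
emitted regardless of the mask.<br>
<br>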
<blockquote
cite="mid:CAC=NOtrzrZpviYH1hjXzXZy4EV5HfBzLb7gExoGsrK7xcd-a0w@mail.gmail.com"
type="cite">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
- The localglobal() scheme used everywhere is there to mimic what
the closed driver seems to do. In their IR output we can see
that they have chosen to use different pseudo-instructions for
all the possibilities: barriers and memory fences seem to have
different intrinsics for each combination of
flags.</blockquote>
<div><br>
</div>
<div>This is because in AMDIL the same fence instruction with
different modifiers implements all of the variations of barrier
and mem_fence. LLVM is not aware of the hardware details of how
it works and does not do any real scheduling.</div>
</blockquote>
Ok, it is certainly a better way of doing this.<br>
<blockquote
cite="mid:CAC=NOtrzrZpviYH1hjXzXZy4EV5HfBzLb7gExoGsrK7xcd-a0w@mail.gmail.com"
type="cite">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> So I thought
that it might be interesting to do the same.<br>
Thanks to that, it is really easy to lower the intrinsics
correctly, and nothing would need to change if someday some
hardware had a special instruction for every combination (very
unrealistic, however).<br>
But I can change that if you want.<br>
<br>
- I have considered making a very simple implementation of
barriers with a call to mem_fence and the actual barrier
intrinsic. But the closed driver has special intrinsics so... ^^</blockquote>
<div><br>
</div>
<div>As mentioned in the LLVM thread, barrier can't be used to
implement a mem_fence </div>
</blockquote>
Yes, I know. That first implementation was definitely wrong. But
barriers queue memory fences, right? So could a barrier be
implemented as a memory fence followed by a sync point?<br>
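<br>
To make the question concrete, here is a small sketch in plain C of
the ordering I have in mind. The op names and flag bits (OP_*,
MEM_LOCAL, MEM_GLOBAL, lower_barrier, lower_mem_fence) are
hypothetical and are not real LLVM or AMDIL opcodes, and the sketch
leaves out the flags == 0 policy discussed earlier in the thread:<br>
<pre>
```c
/* Hypothetical op kinds; not actual LLVM or AMDIL opcodes. */
enum op { OP_FENCE_LOCAL, OP_FENCE_GLOBAL, OP_SYNC };

/* Hypothetical flag bits; real OpenCL headers define their own values. */
enum { MEM_LOCAL = 1, MEM_GLOBAL = 2 };

/* Writes the ops for barrier(flags) into out (room for 3 entries) and
 * returns how many were written: fences first, sync point last. */
int lower_barrier(int flags, enum op out[3])
{
    int n = 0;

    if (flags & MEM_LOCAL)
        out[n++] = OP_FENCE_LOCAL;
    if (flags & MEM_GLOBAL)
        out[n++] = OP_FENCE_GLOBAL;
    out[n++] = OP_SYNC;  /* the work-group sync point is always emitted */
    return n;
}

/* mem_fence(flags) emits only the fences and never a sync point. */
int lower_mem_fence(int flags, enum op out[2])
{
    int n = 0;

    if (flags & MEM_LOCAL)
        out[n++] = OP_FENCE_LOCAL;
    if (flags & MEM_GLOBAL)
        out[n++] = OP_FENCE_GLOBAL;
    return n;
}
```
</pre>
The point is only that the dependency goes one way: a barrier would
reuse the fence lowering and then add the sync point, while mem_fence
never goes through the barrier.<br>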
</body>
</html>