[Libclc-dev] [PATCH 1/4] Fix and improvements to barrier() for R600 targets
damien.hilloulin at epfl.ch
Sat Aug 23 10:46:41 PDT 2014
Sorry for replying late; it seems the mail server of my (French)
school has been down since yesterday. So I will use this address (Swiss
university) during the outage.
(I have read the replies to my mails from the libclc archives.)
I will try to address all the things you pointed out:
- I agree with you, my mails are not nicely formatted (I used git
format-patch and then sent the mails manually instead of using git
send-email directly. Stupid error) :( .
- @Tom: I don't understand what you meant when you wrote:
"I would really like to see a generic implementation of this which use
in reply to patch 2.
Do you mean that you would like the async copy to use a
barrier? I think so too. I think that a barrier at the beginning of the
async copy would be enough to start the copy under good conditions.
- I have read the spec, and my conclusion is that a barrier is a
work-group sync point whatever the flags are. So I think that we must
have a barrier nofence() call.
- As for the localglobal() stuff used everywhere, it is there to mimic what
the closed driver seems to do. In its IR output we can see that it
uses different pseudo-instructions for all the
possibilities: barriers and memory fences seem to have different
intrinsics depending on the flags. So I thought that
it might be interesting to do the same.
Thanks to that, it is really easy to lower the intrinsics correctly, and we
have nothing to change if someday some hardware has a special instruction
for every combination (very unrealistic, however).
But I can change that if you want.
- I have considered making a very simple implementation of barriers with
a call to mem_fence and the actual barrier intrinsic. But the closed
driver has special intrinsics so... ^^
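
To make the simple variant from the last point concrete, here is a minimal
sketch in plain host C (not OpenCL C, and not the code from the patches).
All names prefixed with sketch_ and the flag values are assumptions made up
for illustration: the two stubs stand in for a memory fence and for a
fence-free work-group sync intrinsic, and they only record the order in
which they are called.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag values for this sketch only; real values are
 * implementation-defined. */
enum { SKETCH_LOCAL_MEM_FENCE = 1 << 0, SKETCH_GLOBAL_MEM_FENCE = 1 << 1 };

/* Events recorded by the stubs so the call order can be inspected. */
enum sketch_event { EV_MEM_FENCE, EV_BARRIER_NOFENCE };

#define SKETCH_LOG_MAX 16
enum sketch_event sketch_log[SKETCH_LOG_MAX];
unsigned sketch_log_flags[SKETCH_LOG_MAX];
size_t sketch_log_len = 0;

void sketch_record(enum sketch_event ev, unsigned flags) {
    assert(sketch_log_len < SKETCH_LOG_MAX);
    sketch_log[sketch_log_len] = ev;
    sketch_log_flags[sketch_log_len] = flags;
    sketch_log_len++;
}

/* Hypothetical stand-in for a memory fence on the address spaces in flags. */
void sketch_mem_fence(unsigned flags) {
    sketch_record(EV_MEM_FENCE, flags);
}

/* Hypothetical stand-in for the pure work-group sync (no memory ordering). */
void sketch_barrier_nofence(void) {
    sketch_record(EV_BARRIER_NOFENCE, 0);
}

/* Simple barrier: fence only when flags are given, then always sync,
 * so the barrier stays a work-group sync point whatever the flags are. */
void sketch_barrier(unsigned flags) {
    if (flags != 0)
        sketch_mem_fence(flags);
    sketch_barrier_nofence();
}
```

With this shape, the flags only select which fence is emitted; the sync
itself is unconditional, which is the behaviour argued for in the point
about the spec above.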
I will look at the replies to my LLVM patches now,