[Libclc-dev] [PATCH 1/4] Fix and improvements to barrier() for R600 targets

Sat Aug 23 10:46:41 PDT 2014

Hello everyone,

Sorry for the late reply: the mail server of my (French) school seems 
to have been down since yesterday, so I will use this address (Swiss 
university) during the outage.
(I have read the replies to my mails from the libclc archives.)

I will try to address all the things you pointed out:

- I agree with you, my mails are not well formatted (I used git 
format-patch and then sent the mails manually instead of using git 
send-email directly. Stupid error) :( .

- @Tom: I don't understand what you meant when you wrote:
"I would really like to see a generic implementation of this which use 
barrier(). "

in reply to patch 2.

Do you mean that the async copy should use a barrier? I think so too: a 
barrier at the beginning of the async copy should be enough to start 
the copy in good conditions.

- I have read the spec and my conclusion is that a barrier is a 
work-group sync point, whatever the flags are. So I think that we must 
have a barrier nofence() call.

- As for the localglobal() stuff used everywhere, it mimics what the 
closed driver seems to do. In their IR output we can see that they 
have chosen to use different pseudo-instructions for all the 
possibilities: barriers and memory fences seem to have different 
intrinsics depending on the flags. So I thought that it might be 
interesting to do the same.
Thanks to that, it is really easy to lower the intrinsics correctly, 
and nothing would need to change if someday some hardware had a special 
instruction for every combination (very unrealistic, however).
But I can change that if you want.

- I have considered making a very simple implementation of barriers with 
a call to mem_fence and the actual barrier intrinsic. But the closed 
driver has special intrinsics so... ^^
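
To make the simple variant concrete, here is a rough sketch in plain C 
(not OpenCL C, and not the code from the patches). The two stub_* 
functions are hypothetical stand-ins for mem_fence and for a fence-free 
barrier intrinsic, and the flag values are assumed for illustration 
only. The stubs just record the order of the calls, so the only point 
shown is the control flow: fence when the flags ask for it, but always 
synchronize.

```c
#include <assert.h>

/* Flag values assumed for illustration; real headers define their own. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

#define TRACE_MAX 8
static char trace[TRACE_MAX];          /* 'F' = fence, 'B' = barrier */
static unsigned trace_len = 0;
static unsigned last_fence_flags = 0;

/* Append one event to the trace so the call order can be inspected. */
static void record(char event) {
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = event;
}

/* Hypothetical stand-in for the target's memory fence. */
static void stub_mem_fence(unsigned flags) {
    last_fence_flags = flags;
    record('F');
}

/* Hypothetical stand-in for a pure work-group sync intrinsic. */
static void stub_barrier_nofence(void) {
    record('B');
}

/* Sketch: fence only when flags are set, then always synchronize. */
static void sketch_barrier(unsigned flags) {
    if (flags != 0)
        stub_mem_fence(flags);
    stub_barrier_nofence();
}
```

With this shape, sketch_barrier(0) still reaches the sync point, which 
matches my reading of the spec above, and a call with both flags set 
issues a single fence followed by the sync.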

I will look at the replies to my LLVM patches now,

Damien.