[cfe-dev] [RFC] Late (OpenMP) GPU code "SPMD-zation"
Alexey Bataev via cfe-dev
cfe-dev at lists.llvm.org
Tue Jan 22 11:46:39 PST 2019
No, we don't. We need to perform different kinds of analysis for SPMD-mode
constructs and for non-SPMD constructs.
For SPMD mode, we need to globalize only the reduction/lastprivate variables.
For non-SPMD mode, we need to globalize all the private/local variables
that may escape their declaration context in the construct.
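
A minimal sketch of the two cases, not taken from the clang sources or
tests. The function names sum_spmd and count_above_generic are hypothetical,
and which variables a given clang version actually globalizes depends on its
analysis. In the first function the combined construct lets every thread run
the loop, so only the reduction variable `sum` is shared. In the second,
`hits` is declared in the sequential part of the target region and then
updated inside the nested parallel region, so it escapes its declaration
context.

```cpp
#include <cassert>

// SPMD-friendly shape (hypothetical example): a single combined construct,
// so all threads of the team execute the loop. Only the reduction variable
// `sum` has to be visible across threads.
long sum_spmd(const int *p, int n) {
  assert(n >= 0);
  long sum = 0;
#pragma omp target teams distribute parallel for reduction(+ : sum) \
    map(to : p[0 : n]) map(tofrom : sum)
  for (int i = 0; i < n; ++i)
    sum += p[i];
  return sum;
}

// Non-SPMD shape (hypothetical example): the sequential part of the target
// region is run by one thread, and the nested parallel region is run by the
// workers. The local `hits` is written by the workers, so it escapes the
// context in which it was declared.
int count_above_generic(const int *p, int n, int threshold) {
  assert(n >= 0);
  int result = 0;
#pragma omp target map(to : p[0 : n]) map(tofrom : result)
  {
    int hits = 0; // declared in the sequential part of the region
#pragma omp parallel for shared(hits)
    for (int i = 0; i < n; ++i) {
      if (p[i] > threshold) {
#pragma omp atomic
        ++hits; // updated by worker threads
      }
    }
    result = hits;
  }
  return result;
}
```

Both functions also compile and run serially when OpenMP is disabled, since
the pragmas are then ignored.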
-------------
Best regards,
Alexey Bataev
22.01.2019 14:29, Doerfert, Johannes Rudolf wrote:
> We would still know that. We can do exactly the same reasoning as we
> do now.
>
> I think the important question is, how different is the code generated
> for either mode and can we hide (most of) the differences in the runtime.
>
>
> If I understand you correctly, you say the data sharing code looks
> very different and the differences cannot be hidden, correct?
>
> It would be helpful for me to understand your point if you could give
> me a piece of OpenMP for which the data sharing in SPMD mode and "guarded"
> mode are as different as possible. I can compile it in both modes
> myself so high-level OpenMP is fine (I will disable SPMD mode manually
> in the source if necessary).
>
>
> Thanks,
>
> Johannes
>
>
>
>
> ------------------------------------------------------------------------
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Alexey
> Bataev via llvm-dev <llvm-dev at lists.llvm.org>
> *Sent:* Tuesday, January 22, 2019 13:10
> *To:* Doerfert, Johannes Rudolf
> *Cc:* Alexey Bataev; LLVM-Dev; Arpith Chacko Jacob;
> openmp-dev at lists.llvm.org; cfe-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] [RFC] Late (OpenMP) GPU code "SPMD-zation"
>
> But we need to know the execution mode, SPMD or "guarded".
>
> -------------
> Best regards,
> Alexey Bataev
> 22.01.2019 13:54, Doerfert, Johannes Rudolf wrote:
>> We could still do that in clang, couldn't we?
>>
>> ------------------------------------------------------------------------
>> *From:* Alexey Bataev <a.bataev at outlook.com>
>> <mailto:a.bataev at outlook.com>
>> *Sent:* Tuesday, January 22, 2019 12:52:42 PM
>> *To:* Doerfert, Johannes Rudolf; cfe-dev at lists.llvm.org
>> <mailto:cfe-dev at lists.llvm.org>
>> *Cc:* openmp-dev at lists.llvm.org <mailto:openmp-dev at lists.llvm.org>;
>> LLVM-Dev; Finkel, Hal J.; Alexey Bataev; Arpith Chacko Jacob
>> *Subject:* Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"
>>
>>
>> The globalization of local variables, for example. It must be
>> implemented in the compiler, not in the runtime, to get good
>> performance.
>>
>>
>> -------------
>> Best regards,
>> Alexey Bataev
>> 22.01.2019 13:43, Doerfert, Johannes Rudolf wrote:
>>> Could you elaborate on what you are referring to with respect to data
>>> sharing? What do we currently do in the clang code generation that we
>>> could not effectively implement in the runtime, potentially with the
>>> support of an LLVM pass?
>>>
>>> Thanks,
>>> Johannes
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Alexey Bataev <a.bataev at outlook.com>
>>> <mailto:a.bataev at outlook.com>
>>> *Sent:* Tuesday, January 22, 2019 12:34:01 PM
>>> *To:* Doerfert, Johannes Rudolf; cfe-dev at lists.llvm.org
>>> <mailto:cfe-dev at lists.llvm.org>
>>> *Cc:* openmp-dev at lists.llvm.org <mailto:openmp-dev at lists.llvm.org>;
>>> LLVM-Dev; Finkel, Hal J.; Alexey Bataev; Arpith Chacko Jacob
>>> *Subject:* Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"
>>>
>>>
>>>
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>> 22.01.2019 13:17, Doerfert, Johannes Rudolf wrote:
>>>> Where we are
>>>> ------------
>>>>
>>>> Currently, when we generate OpenMP target offloading code for GPUs, we
>>>> use sufficient syntactic criteria to decide between two execution modes:
>>>> 1) SPMD -- All target threads (in an OpenMP team) run all the code.
>>>> 2) "Guarded" -- The master thread (of an OpenMP team) runs the user
>>>> code. If an OpenMP distribute region is encountered, thus
>>>> if all threads (in the OpenMP team) are supposed to
>>>> execute the region, the master wakes up the idling
>>>> worker threads and points them to the correct piece of
>>>> code for distributed execution.
>>>>
>>>> For a variety of reasons we (generally) prefer the first execution mode.
>>>> However, depending on the code, that might not be valid, or we might
>>>> just not know if it is in the Clang code generation phase.
>>>>
>>>> The implementation of the "guarded" execution mode follows roughly the
>>>> state machine description in [1], though the implementation is different
>>>> (more general) nowadays.
>>>>
>>>>
>>>> What we want
>>>> ------------
>>>>
>>>> Increase the amount of code executed in SPMD mode and the use of
>>>> lightweight "guarding" schemes where appropriate.
>>>>
>>>>
>>>> How we (could) get there
>>>> ------------------------
>>>>
>>>> We propose the following two modifications in order:
>>>>
>>>> 1) Move the state machine logic into the OpenMP runtime library. That
>>>> means in SPMD mode all device threads will start the execution of
>>>> the user code, thus emerge from the runtime, while in guarded mode
>>>> only the master will escape the runtime and the other threads will
>>>> idle in their state machine code that is now just "hidden".
>>>>
>>>> Why:
>>>> - The state machine code cannot be (reasonably) optimized anyway, so
>>>> moving it into the library shouldn't hurt run time and might even
>>>> improve compile time a little bit.
>>>> - The change should also simplify the Clang code generation as we
>>>> would generate structurally the same code for both execution modes
>>>> but only the runtime library calls, or their arguments, would
>>>> differ between them.
>>>> - The reason we should not "just start in SPMD mode" and "repair"
>>>> it later is simple: this way we always have semantically correct
>>>> and executable code.
>>>> - Finally, and most importantly, there is now only little
>>>> difference (see above) between the two modes in the code
>>>> generated by clang. If we later analyze the code trying to decide
>>>> if we can use SPMD mode instead of guarded mode the analysis and
>>>> transformation becomes much simpler.
>>>
>>> The last item is wrong, unfortunately. A lot of things in the
>>> codegen depend on the execution mode, e.g. correct support for
>>> data sharing. Of course, we can try to generalize the codegen and
>>> rely completely on the runtime, but the performance is going to be
>>> very poor.
>>>
>>> We still need static analysis in the compiler. I agree that it is
>>> better to move this analysis to the backend, or at least to a point
>>> after inlining, but at the moment that is not possible. We need
>>> support for late outlining, which will allow us to implement better
>>> detection of SPMD constructs and improve performance.
>>>
>>>> 2) Implement a middle-end LLVM-IR pass that detects the guarded mode,
>>>> e.g., through the runtime library calls used, and that tries to
>>>> convert it into the SPMD mode potentially by introducing lightweight
>>>> guards in the process.
>>>>
>>>> Why:
>>>> - After the inliner, and the canonicalizations, we have a clearer
>>>> picture of the code that is actually executed in the target
>>>> region and all the side effects it contains. Thus, we can make an
>>>> educated decision on the required amount of guards that prevent
>>>> unwanted side effects from happening after a move to SPMD mode.
>>>> - At this point we can more easily introduce different schemes to
>>>> avoid side effects by threads that were not supposed to run. We
>>>> can decide if a state machine is needed, conditionals should be
>>>> employed, masked instructions are appropriate, or "dummy" local
>>>> storage can be used to hide the side effect from the outside
>>>> world.
>>>>
>>>>
>>>> None of this has been implemented yet, but we plan to start in the
>>>> immediate future. Any comments, ideas, or criticism are welcome!
>>>>
>>>>
>>>> Cheers,
>>>> Johannes
>>>>
>>>>
>>>> P.S. [2-4] provide further information on implementation and features.
>>>>
>>>> [1] https://ieeexplore.ieee.org/document/7069297
>>>> [2] https://dl.acm.org/citation.cfm?id=2833161
>>>> [3] https://dl.acm.org/citation.cfm?id=3018870
>>>> [4] https://dl.acm.org/citation.cfm?id=3148189
>>>>
>>>>