[LLVMdev] [cfe-dev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
hfinkel at anl.gov
Tue Oct 2 13:10:17 PDT 2012
On Tue, 2 Oct 2012 14:39:33 -0500
<dag at cray.com> wrote:
> Hal Finkel <hfinkel at anl.gov> writes:
> > As I've stated, whether the metadata is preserved is not really the
> > relevant metric. It is fine for a pass that does not understand
> > parallelization metadata to drop it. The important part is that
> > dropping the metadata, and moving instructions to which that
> > metadata is attached, must not cause miscompiles. For example:
> > - Instructions with unknown side effects or dependencies must not
> > be moved from outside a parallel region to inside a parallel region.
> > - Serialized subregions inside of parallel regions cannot be
> > deleted without deleting the enclosing parallel region.
> > The outstanding proposals have ways of dealing with these things. In
> > the case of my proposal, it is through cross-referencing the metadata
> > sufficiently and using function boundaries to prevent unwanted code
> > motion. In Intel's case, it is by using the barriers implied by the
> > intrinsics calls.
> These two paragraphs seem contradictory to me. How can a pass rely on
> the metadata to not do illegal code motion if the pass has dropped the
> metadata? I must be missing something important.
> The only way I can think that this would work is that the explicit
> outlining is already done so there is no way to move between
> parallel/non-parallel without going all interprocedurally bonkers. :)
Yes, this is exactly what I mean. The metadata needs to be
appropriately cross-referenced, so that if any parallelization
metadata within some parallel region is dropped, then this can be
detected, and the entire parallel region can be dropped. The code
motion would be prevented by explicit outlining. The inliner would need
to be taught not to inline functions with parallelization
metadata (when non-trivial parallelization is enabled). That, however,
seems like a small and simple change.
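
To make the cross-referencing idea concrete, here is a toy model in Python. It is only a sketch, not part of any proposal's actual design: the Function and ParallelRegion records, the metadata ids, and the "serial" fallback are all hypothetical names invented for illustration. The region records which metadata it expects on each outlined function; if any expected entry is missing, the whole region is treated as dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    # Parallelization metadata ids attached to this function. A pass that
    # does not understand them is allowed to drop them.
    metadata: set = field(default_factory=set)


@dataclass
class ParallelRegion:
    body: Function        # outlined parallel body (hypothetical record)
    subregions: list      # outlined serialized subregions
    expected: dict        # function name -> metadata id that must be present


def region_is_intact(region):
    """True only if every cross-referenced metadata id is still attached."""
    present = {f.name: f for f in [region.body, *region.subregions]}
    return all(
        name in present and md_id in present[name].metadata
        for name, md_id in region.expected.items()
    )


def lowering_for(region):
    """Parallelize only an intact region; otherwise drop the whole region.

    Falling back to serial code is an assumption of this sketch.
    """
    return "parallel" if region_is_intact(region) else "serial"


def may_inline(callee, parallelization_enabled):
    """Outlined functions stay outlined while parallelization is enabled."""
    return not (parallelization_enabled and callee.metadata)


body = Function("region.body", {"par.region"})
critical = Function("region.critical", {"par.serial"})
region = ParallelRegion(
    body, [critical],
    {"region.body": "par.region", "region.critical": "par.serial"},
)
print(lowering_for(region))      # region is still fully annotated
critical.metadata.clear()        # a metadata-unaware pass drops an entry
print(lowering_for(region))      # the whole region is now treated as dropped
```

The point of checking from the region's side is that no individual pass has to preserve anything: losing one piece of metadata, or deleting a subregion function outright, is enough to invalidate the region as a whole.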
In Intel's proposal, code motion is prevented because the
parallelization intrinsics serve as explicit scheduling barriers. We'd
need, I suppose, to enhance various passes to understand when they could
override the barrier and move code regardless (for optimization).
> This is the kind of thing that worries me about these proposals.
Leadership Computing Facility
Argonne National Laboratory