[PATCH] D79670: [AMDGPU] Order pos exports before param exports
Carl Ritson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 11 05:52:14 PDT 2020
critson marked 2 inline comments as done.
critson added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp:38
+
+ // Move position exports before other exports while preserving
+ // the order within different export types (pos or other).
----------------
foad wrote:
> Can you say why it's beneficial to do position exports first?
I will add comment.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp:41
+ unsigned InsertionPoint = 0;
+ for (unsigned Idx = 0, End = Chain.size(); Idx < End; ++Idx) {
+ SUnit *SU = Chain[Idx];
----------------
foad wrote:
> It's a shame that this sorting is O(n^2) but I guess it's not a problem because the average chain length will be 2?
>
> You could probably do this sorting in a cute way with std::partition_copy if you felt inclined: https://en.cppreference.com/w/cpp/algorithm/partition_copy
This sort is O(n), it passes through the list only once moving elements to the top as it goes.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp:82-85
+ // Pass through DAG gathering a list of exports and removing barrier edges
+ // creating dependencies on exports. Freeing exports of successor edges
+ // allows more scheduling freedom, and nothing should be order dependent
+ // on exports. Edges will be added later to order the exports.
----------------
foad wrote:
> Why are the barrier edges there in the first place? Either the exports can be reordered, so the barrier edges should not be there; or they can't be, so we shouldn't ignore the barrier edges!
At a high level, what is a barrier dependency on an export?
They get introduced because of intrinsics which access memory, etc.
Consider the following:
call void @llvm.amdgcn.exp.f32(i32 32, i32 15, float 1.0, float 1.0, float 1.0, float 1.0, i1 false, i1 false)
call void @llvm.amdgcn.exp.f32(i32 33, i32 15, float 1.0, float 1.0, float 1.0, float 0.5, i1 false, i1 false)
%load = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> undef, i32 %idx, i32 0, i32 0)
call void @llvm.amdgcn.exp.f32(i32 12, i32 15, float 0.0, float 0.0, float 0.0, float %load, i1 true, i1 false)
The load forces ordering between the exports either side of it. I do not think there is a hardware motivation for this?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79670/new/
https://reviews.llvm.org/D79670
More information about the llvm-commits
mailing list