[llvm] r298368 - [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 21 07:49:57 PDT 2017
On 03/21/2017 09:19 AM, Pykhtin, Valery wrote:
> Hi Hal,
> Thank you for pointing this out. I thought a reference to the review would be enough. I'm copy/pasting the overview from the review here, as it is up to date.
Thanks! This makes it much easier to find things when searching e-mail,
git logs, etc.
Could you elaborate on what "iterative" means here? You talk about using
lightweight schedules so that you can rank and compare multiple
schedules. Is this being done in the "current iteration schedule vs.
next iteration schedule" or in some more general sense?
Also, do the current strategies iterate at all? I'm trying to get a
better feel for how the iterative process will actually work (i.e. why
does iteration change the answer).
> An iterative approach to finding the best schedule is essential for the GCN architecture. This change combines a number of ideas for iterative scheduling and presents the infrastructure.
> Lightweight scheduling
> The default schedulers schedule directly on MIR, reordering instructions and updating IR data such as LiveIntervals. This is relatively heavy; instead, a scheduling strategy can return an array of MachineInstr pointers (or an equivalent, as SIScheduler does) that defines a particular schedule. Such a lightweight schedule can be scored against other variants and materialized only once. There are two types of lightweight schedules:
> 1. An array of pointers to DAG SUnits - the form strategies are expected to return. The benefit is that a scoring function can use the DAG SUnits. It doesn't include debug values.
> 2. An array of pointers to MachineInstr - a so-called 'detached' schedule, in the sense that it no longer depends on DAG state, and it includes debug values. This is useful when some variants need to be stored for later selection.
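The two schedule forms above can be sketched roughly as follows. This is a toy model, not the actual LLVM code: `SUnit` and `MachineInstr` here are simplified stand-ins for the real classes, and `scoreSchedule` is a hypothetical placeholder metric.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for llvm::SUnit and llvm::MachineInstr.
struct SUnit { int NodeNum; };
struct MachineInstr { std::string Name; };

// Type 1: a lightweight schedule over DAG nodes, as a strategy would
// return it. Scoring can inspect the SUnits; debug values are absent.
using SUnitSchedule = std::vector<const SUnit *>;

// Type 2: a 'detached' schedule of raw instructions, independent of DAG
// state, suitable for storing candidate variants for later selection.
using DetachedSchedule = std::vector<MachineInstr *>;

// A scoring function ranks lightweight schedules without touching MIR.
// Here the score is just the schedule length - a placeholder only.
int scoreSchedule(const SUnitSchedule &S) { return (int)S.size(); }
```

The point is that many candidate schedules can be built and compared cheaply, and only the winner is ever applied to the MIR.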
> Scheduling with different strategies requires each strategy to preserve the DAG state so that other strategies can reuse the same DAG. This can be achieved either by saving the touched DAG data or, better, by not touching the DAG at all and instead annotating DAG SUnits with the information relevant to a particular strategy: SUnit has a NodeNum field, which allows easy annotation without using maps. The minreg strategy implements the latter approach.
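The NodeNum-indexed annotation idea can be illustrated with a minimal sketch. Again a toy model: `SUnit` is a simplified stand-in, and `NumSuccsLeft` is a hypothetical per-node counter chosen for illustration.

```cpp
#include <cassert>
#include <vector>

struct SUnit { unsigned NodeNum; };  // simplified stand-in for llvm::SUnit

// Per-strategy data lives in a side array indexed by NodeNum, so the
// shared DAG nodes are never modified (and no map lookups are needed);
// another strategy can then reuse the DAG untouched.
struct MinRegAnnotations {
  std::vector<int> NumSuccsLeft;  // hypothetical per-node counter
  explicit MinRegAnnotations(unsigned NumSUnits)
      : NumSuccsLeft(NumSUnits, 0) {}
  int &get(const SUnit &SU) { return NumSuccsLeft[SU.NodeNum]; }
};
```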
> Lightweight schedules cannot be tracked with the LLVM RP trackers; for this purpose GCNUpwardRPTracker was introduced. As the name states, it can only go upward, instruction by instruction. The order of instructions is defined by the tracker's caller, so it can be used both for tracking lightweight schedules and for IR sequences. Upward tracking is easier to implement because it only needs the region's live-out set to operate, except for one case: finding the used live mask for a tuple-register use. Even though LiveIntervals is not yet updated for a given instruction of a lightweight schedule, it can still be used, because the live mask for a use would not change for any schedule, as all defs should dominate the use. The subregister definitions can be reordered, but the overall mask remains the same.
> TODO: save the live-out sets for every region when recording and reuse them for subsequent RP tracking, since live-outs don't depend on the schedule.
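The upward-tracking idea described above can be sketched on a toy register model with no subregisters (which are the one hard case mentioned). Everything here - `Inst`, `maxPressureUpward` - is a hypothetical simplification, not the GCNUpwardRPTracker API.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: plain register ids, no subregister lane masks.
struct Inst {
  std::vector<int> Defs, Uses;
};

// Walk the schedule bottom-up starting from the region's live-out set:
// seen upward, a def ends a register's liveness and a use begins it.
// Returns the maximum number of simultaneously live registers.
int maxPressureUpward(const std::vector<Inst> &Sched, std::set<int> Live) {
  int Max = (int)Live.size();
  for (auto I = Sched.rbegin(); I != Sched.rend(); ++I) {
    for (int D : I->Defs) Live.erase(D);   // def: register dies upward
    for (int U : I->Uses) Live.insert(U);  // use: register becomes live
    Max = std::max(Max, (int)Live.size());
  }
  return Max;
}
```

Note that only the live-out set is needed to start; this is why the TODO above suggests caching per-region live-outs across scheduling attempts.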
> A structure to track register pressure was also introduced. It contains the number of SGPRs/VGPRs used, weights for large SGPRs/VGPRs, and a compare function: the pressure giving the maximum occupancy wins; otherwise the pressure with the lowest large-register weight wins.
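The comparison rule can be sketched as follows. The occupancy formula below is invented purely for illustration (the real computation depends on the subtarget's register budgets), and the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct Pressure {
  int SGPRs = 0, VGPRs = 0;
  int LargeRegWeight = 0;  // weight of large (tuple) register usage
};

// Invented occupancy model: waves limited by a 256-VGPR budget,
// capped at 10 waves. For illustration only.
int occupancy(const Pressure &P) {
  int ByVGPR = P.VGPRs ? 256 / P.VGPRs : 10;
  return std::min(ByVGPR, 10);
}

// "Better" pressure: higher occupancy wins; on a tie, the pressure
// with the lower large-register weight wins.
bool betterThan(const Pressure &A, const Pressure &B) {
  if (occupancy(A) != occupancy(B))
    return occupancy(A) > occupancy(B);
  return A.LargeRegWeight < B.LargeRegWeight;
}
```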
> Minimal register scheduler (example)
> This is an experimental, simple scheduler whose main purpose is to explore ways to use as few registers as possible for a region (it doesn't care about performance at all). It doesn't always return the minimal usage, but it works relatively well on large regions with unrolled loops. It is also used in the tryMaximizeOccupancy scheduling pass.
> Legacy Max occupancy scheduler
> Included as an example; it mimics the current behavior. It doesn't use lightweight schedules, but it shows how legacy and lightweight schedulers can be intermixed. The main difference is that it first collects all the regions to schedule and sorts them by register pressure. This way it starts with the fattest region, knowing the best achievable occupancy beforehand. It also includes the tryMaximizeOccupancy pass, which tries to minimize register usage with the minreg strategy for the most register-consuming regions.
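The collect-then-sort step can be sketched as a one-liner on a toy region type. `Region` and `sortFattestFirst` are hypothetical names; the real scheduler tracks instruction ranges and much more per region.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Region { int MaxPressure; };  // toy region: just its peak pressure

// Collect all regions first, then visit the highest-pressure ("fattest")
// region first, so the best achievable occupancy is known up front.
void sortFattestFirst(std::vector<Region> &Regions) {
  std::sort(Regions.begin(), Regions.end(),
            [](const Region &A, const Region &B) {
              return A.MaxPressure > B.MaxPressure;
            });
}
```

Scheduling the fattest region first means its achieved occupancy bounds what the remaining, lighter regions need to target.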
> None of these schedulers are turned on by default in this change.
> Legacy Max occupancy scheduler fully passes lit tests.
> Minreg runs lit tests without asserts.
> No performance impact so far.
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Tuesday, March 21, 2017 4:43 PM
> To: Pykhtin, Valery; llvm-commits at lists.llvm.org
> Subject: Re: [llvm] r298368 - [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
> On 03/21/2017 08:15 AM, Valery Pykhtin via llvm-commits wrote:
>> Author: vpykhtin
>> Date: Tue Mar 21 08:15:46 2017
>> New Revision: 298368
>> URL: http://llvm.org/viewvc/llvm-project?rev=298368&view=rev
>> [AMDGPU] Iterative scheduling infrastructure + minimal registry
> Hi Valery,
> In the future, please make your commit messages more explanatory.
> "Iterative scheduling infrastructure + minimal registry scheduler" tells me next to nothing about what this is or how it works. It is obviously an extensive addition/change. The review had a good summary, and that should have appeared here (updated to reflect any changes made as a result of the code review).
> I'd appreciate it if you would reply to this thread with the updated summary.
>> Differential revision: https://reviews.llvm.org/D31046
>> Modified: llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.h
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory