[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

Fri Jun 8 07:11:18 PDT 2018

Hi,

I am extending LLVM for HPC applications.
As part of that work, I am trying to make MachinePipeliner available on
AArch64 with the Scalable Vector Extension (SVE).

MachinePipeliner is currently used only by the Hexagon target.
Since its implementation is very portable, I think it can be made to
work on many CPUs by adding only a little code (see Code [2]).

The current MachinePipeliner is written on the premise that
DFAPacketizer is used for resource management.
However, I would like to use MachinePipeliner without DFAPacketizer,
for the reasons described below (*).
Only a small part of the MachinePipeliner implementation depends on
DFAPacketizer or instruction itineraries.
Therefore, I think that one of the following implementations is
possible:

(a) creating a path in MachinePipeliner that does not use DFAPacketizer
(b) making MachinePipeliner inheritable so that anyone can write code
    that does not use DFAPacketizer

Since an implementation can use instruction itineraries without using
DFAPacketizer, I don't think that TargetSchedModel::hasInstrItineraries
can be used to select the execution path.
Personally, I think that option (b) is better.
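
To make option (b) a little more concrete, here is a minimal sketch of
the kind of split it implies. Everything in it is hypothetical:
ResourceModel, CountingResourceModel and placeInFirstFreeSlot are names
invented for this illustration and are not LLVM classes or functions.
It assumes each opcode consumes one unit of a single resource kind,
which is much simpler than a real scheduling model.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical interface (not an LLVM class): the one question the
// scheduler core asks about resources while placing an instruction.
class ResourceModel {
public:
  virtual ~ResourceModel() = default;
  // True if Opcode can still be issued in modulo slot Slot.
  virtual bool canReserve(unsigned Opcode, unsigned Slot) const = 0;
  // Records that Opcode occupies its resource in Slot.
  virtual void reserve(unsigned Opcode, unsigned Slot) = 0;
};

// DFA-free variant: each opcode consumes one unit of one resource kind,
// and every kind offers a fixed number of units per cycle.
class CountingResourceModel : public ResourceModel {
  std::map<unsigned, unsigned> KindOfOpcode; // opcode -> resource kind
  std::vector<unsigned> UnitsPerKind;        // kind -> units per cycle
  std::vector<std::vector<unsigned>> Used;   // [slot][kind] -> units taken

public:
  CountingResourceModel(unsigned II, std::vector<unsigned> Units,
                        std::map<unsigned, unsigned> Kinds)
      : KindOfOpcode(std::move(Kinds)), UnitsPerKind(std::move(Units)),
        Used(II, std::vector<unsigned>(UnitsPerKind.size(), 0)) {}

  bool canReserve(unsigned Opcode, unsigned Slot) const override {
    assert(Slot < Used.size() && "slot must be below the initiation interval");
    auto It = KindOfOpcode.find(Opcode);
    if (It == KindOfOpcode.end())
      return true; // assumption: unknown opcodes need no resource
    return Used[Slot][It->second] < UnitsPerKind[It->second];
  }

  void reserve(unsigned Opcode, unsigned Slot) override {
    assert(canReserve(Opcode, Slot));
    auto It = KindOfOpcode.find(Opcode);
    if (It != KindOfOpcode.end())
      ++Used[Slot][It->second];
  }
};

// The scheduler core only sees the base class. Returns the first modulo
// slot in [0, II) where Opcode fits, or -1 if every slot is full.
int placeInFirstFreeSlot(ResourceModel &RM, unsigned Opcode, unsigned II) {
  for (unsigned Slot = 0; Slot < II; ++Slot) {
    if (RM.canReserve(Opcode, Slot)) {
      RM.reserve(Opcode, Slot);
      return static_cast<int>(Slot);
    }
  }
  return -1;
}
```

With an initiation interval of 2 and a single unit of one resource kind,
two instructions of the same opcode land in slots 0 and 1, and a third
one is rejected. A DFAPacketizer-based model could sit behind the same
interface, so the choice would not need to be keyed on
hasInstrItineraries.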

Also, when predicated instructions such as those in SVE are available,
it may be possible to generate prologue and epilogue code using
predicated execution, as shown in reference [1].
In that case, if we choose option (b) and
SwingSchedulerDAG::generatePipelinedLoop can be overridden, I think
that this extension can be added easily.
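
As a rough illustration of what predicated prologue and epilogue
generation means, the sketch below models a kernel-only schedule in the
spirit of reference [1]: the kernel runs N + S - 1 times for N source
iterations and S stages, and each stage is switched on or off by a
predicate. It is only a model of which stage instances execute. The
names StageExec and runKernelOnly are made up for this example, and
nothing here shows how the stage predicates would actually be
materialized with SVE.

```cpp
#include <cassert>
#include <vector>

// One executed stage instance: stage Stage of source iteration Iteration.
struct StageExec {
  unsigned Iteration;
  unsigned Stage;
};

// Hypothetical model of kernel-only execution. In kernel pass K, stage S
// works on source iteration K - S. Its predicate is false while the
// pipeline is filling (K < S) and while it is draining
// (K - S >= NumIterations), which replaces explicit prologue and
// epilogue code.
std::vector<StageExec> runKernelOnly(unsigned NumIterations,
                                     unsigned NumStages) {
  assert(NumStages > 0 && "a pipelined loop has at least one stage");
  std::vector<StageExec> Trace;
  if (NumIterations == 0)
    return Trace;
  const unsigned Passes = NumIterations + NumStages - 1;
  for (unsigned K = 0; K < Passes; ++K) {
    for (unsigned S = 0; S < NumStages; ++S) {
      const bool StagePredicate = K >= S && K - S < NumIterations;
      if (StagePredicate)
        Trace.push_back({K - S, S});
    }
  }
  return Trace;
}
```

For 3 iterations and 2 stages the kernel runs 4 times. The first pass
executes only stage 0 (the fill phase), the last pass executes only
stage 1 (the drain phase), and every iteration still passes through
each stage exactly once.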

Comments or suggestions are welcome.

Thank you very much.

Best regards,
--
--------------------------------------
Masaki Arai

========================================

(*) Currently, many CPU scheduling models are defined without using
instruction itineraries.
They therefore take form 1 or 2 from the following comment in
TargetSchedule.td:

// The SchedMachineModel is defined by subtargets for three categories of data:
// 1. Basic properties for coarse grained instruction cost model.
// 2. Scheduler Read/Write resources for simple per-opcode cost model.
// 3. Instruction itineraries for detailed reservation tables.

If MachinePipeliner also works without instruction itineraries, we will
be able to run its execution tests on various machines, even on
machines where we do not otherwise use it.

Instruction itineraries essentially express the following
correspondence:

  opcode ==> {FU1, FU2, ...}

and DFAPacketizer uses a DFA driven by opcodes.
To schedule predicated instructions such as those in SVE precisely, we
need to take into account that the following two instructions use
pipeline resources mutually exclusively in the same cycle:

  MI1 if P ==> {FU1, FU2, ...}
  MI2 if Q ==> {FU1, FU2, ...}

where predicates P and Q satisfy P == not Q.
However, I don't think that the current DFAPacketizer can represent
this situation.
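
The point can be sketched with a resource slot that is keyed on the
predicate as well as on the functional unit. This is a hypothetical
illustration only: Predicate, mutuallyExclusive and PredicatedSlot are
invented names, and the sketch assumes that "P == not Q" can be
recognized syntactically as the same predicate register with opposite
polarity, which a real implementation would have to establish by
analysis.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate description: a predicate register and whether
// the instruction executes when it is false (Negated) or true.
struct Predicate {
  unsigned Reg;
  bool Negated;
};

// Assumption: two predicates are known to be complementary only when
// they test the same register with opposite polarity.
bool mutuallyExclusive(const Predicate &A, const Predicate &B) {
  return A.Reg == B.Reg && A.Negated != B.Negated;
}

// One cycle of a reservation table that records the guarding predicate
// next to each occupied functional unit.
class PredicatedSlot {
  struct Entry {
    unsigned FU;
    Predicate Pred;
  };
  std::vector<Entry> Entries;

public:
  // FU is free for an instruction guarded by P if every instruction
  // already holding FU in this cycle is mutually exclusive with P.
  bool canReserve(unsigned FU, const Predicate &P) const {
    for (const Entry &E : Entries)
      if (E.FU == FU && !mutuallyExclusive(E.Pred, P))
        return false;
    return true;
  }

  // Reserves FU if possible and reports whether it succeeded.
  bool reserve(unsigned FU, const Predicate &P) {
    if (!canReserve(FU, P))
      return false;
    Entries.push_back({FU, P});
    return true;
  }
};
```

Here an instruction guarded by P and one guarded by not P can both take
the same functional unit in the same cycle, while a third instruction
guarded by an unrelated predicate is refused. An opcode-only mapping has
no place to store the predicate, so it cannot make this distinction.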

References:

[1] Code Generation Schemas for Modulo Scheduled DO-loops and WHILE-loops
http://www.hpl.hp.com/techreports/92/HPL-92-47.pdf?jumpid=reg_R1002_USEN

Code:

  The sample patch for origin/release_60 [2], which doesn't use
  DFAPacketizer, can generate executable files from sample-code.c for
  both AArch64 and x86_64.

  [AArch64]% clang -O2 -mcpu=thunderx2t99 -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c
  [x86_64] % clang -O2 -march=sandybridge -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c

[2] https://reviews.llvm.org/D47943
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample-code.c
Type: application/octet-stream
Size: 468 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180608/10496bc3/attachment-0001.obj>