[PATCH] D35319: LSE Atomics reorg - Part I

Thu Jul 13 12:31:10 PDT 2017

steleman added a comment.

In https://reviews.llvm.org/D35319#807963, @t.p.northover wrote:

> This diff covers lots of different areas:

> 2. The instruction definitions: horrible on the surface, but a massive bug might justify them. It's completely unclear why they're necessary (especially as this patch contains no tests).

Now on to why the instruction definitions were changed:

I implemented the memory ordering semantics for all the LSE Atomics with Intrinsics. As in:

{..}include/llvm/Intrinsics/IntrinsicsAArch64.td:

// Atomic LD<OP> Intrinsics.
def int_aarch64_ldadd_32 :
  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_anyptr_ty]>;
def int_aarch64_ldadd_64 :
  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_anyptr_ty]>;

def int_aarch64_ldadda_32 :
  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_anyptr_ty]>;
def int_aarch64_ldadda_64 :
  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_anyptr_ty]>;

def int_aarch64_ldaddl_32 :
  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_anyptr_ty]>;
def int_aarch64_ldaddl_64 :
  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_anyptr_ty]>;

def int_aarch64_ldaddal_32 :
  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_anyptr_ty]>;
def int_aarch64_ldaddal_64 :
  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_anyptr_ty]>;

[ etc etc etc ]
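
To make the naming scheme in the listing explicit, here is a minimal standalone C++ sketch that rebuilds the eight ldadd intrinsic names from an operation, an ordering suffix, and a register width. The helpers lseIntrinsicName and lseIntrinsicFamily are hypothetical and exist only for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of the patch: spells out the naming scheme
// "int_aarch64_ld" + <op> + <ordering suffix> + "_" + <width>.
std::string lseIntrinsicName(const std::string &op, const std::string &order,
                             unsigned bits) {
  assert((bits == 32 || bits == 64) && "only 32- and 64-bit variants are listed");
  return "int_aarch64_ld" + op + order + "_" + std::to_string(bits);
}

// Enumerates every (ordering, width) combination for one operation, in the
// same order as the TableGen listing: "", "a", "l", "al", each at 32 and 64.
std::vector<std::string> lseIntrinsicFamily(const std::string &op) {
  static const char *const kOrders[] = {"", "a", "l", "al"};
  static const unsigned kWidths[] = {32, 64};
  std::vector<std::string> names;
  for (const char *order : kOrders)
    for (unsigned bits : kWidths)
      names.push_back(lseIntrinsicName(op, order, bits));
  return names;
}
```

Calling lseIntrinsicFamily("add") yields one name per intrinsic shown above, which is the set that each instruction definition has to choose from.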

The instruction definitions themselves (in AArch64InstrInfo.td) are changed to:

let AddedComplexity = 5, Predicates = [HasLSE] in {
  def LDADDB    : BaseLDOP<0b00, 0, 0, 0b000, "add", "", "b",
                           int_aarch64_ldadd_32, GPR32>;
  def LDADDH    : BaseLDOP<0b01, 0, 0, 0b000, "add", "", "h",
                           int_aarch64_ldadd_32, GPR32>;
  def LDADDW    : BaseLDOP<0b10, 0, 0, 0b000, "add", "", "",
                           int_aarch64_ldadd_32, GPR32>;
  def LDADDX    : BaseLDOP<0b11, 0, 0, 0b000, "add", "", "",
                           int_aarch64_ldadd_64, GPR64>;

[ etc etc etc ]

And in AArch64InstrFormats.td:

class BaseLDOP<bits<2> sz, bits<1> acq, bits<1> rel, bits<3> opc,
               string op, string order, string size,
               Intrinsic OpNode, RegisterClass RC>
  : BaseLDOPEncoding<(outs RC:$Rt),
                     (ins RC:$Rs, GPR64sp:$Rn),
                     "ld" # op # order # size,
                     "\t$Rs, $Rt, [$Rn]", "",
                     []>,
                     Sched<[WriteAtomic, WriteLD, WriteST]> {
  let Sz = sz;
  let Acq = acq;
  let Rel = rel;
  let Opc = opc;
}
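
As an illustration of how the class parameters fit together, the following standalone C++ sketch mirrors the "ld" # op # order # size concatenation. LdOpFields and ldOpMnemonic are hypothetical names. The sketch also assumes that the order string follows the acq and rel bits ("a" for acquire, "l" for release, "al" for both), which matches the architectural mnemonics but is passed as a separate string parameter in the TableGen class.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the fields BaseLDOP receives; not LLVM code.
struct LdOpFields {
  unsigned sz; // 2-bit size field: 0b00 byte, 0b01 halfword, 0b10 word, 0b11 doubleword
  bool acq;    // acquire bit
  bool rel;    // release bit
};

// Mirrors the TableGen concatenation "ld" # op # order # size.
// Assumption: the order suffix is derived from the acq/rel bits.
std::string ldOpMnemonic(const std::string &op, const LdOpFields &f) {
  assert(f.sz <= 3 && "sz is a 2-bit field");
  std::string order;
  if (f.acq)
    order += "a";
  if (f.rel)
    order += "l";
  // Only the byte and halfword forms carry a size suffix in the listing above.
  std::string size = f.sz == 0 ? "b" : f.sz == 1 ? "h" : "";
  return "ld" + op + order + size;
}
```

For example, ldOpMnemonic("add", LdOpFields{0, false, false}) gives "ldaddb", the mnemonic of the LDADDB definition shown earlier.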

In AArch64ISelLowering.cpp, each instruction lowering function determines the correct memory ordering model from the Intrinsic opcode and the AtomicOrdering provided by the AtomicSDNode.
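
A minimal sketch of the ordering decision, written as standalone C++ rather than against the LLVM API. The Ordering enum is a local stand-in for llvm::AtomicOrdering, and lseOrderSuffix is a hypothetical helper. Mapping sequentially consistent operations to the acquire-release form is an assumption of this sketch, not something the patch text states.

```cpp
#include <cassert>
#include <string>

// Local stand-in for llvm::AtomicOrdering so the sketch compiles on its own.
enum class Ordering {
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Returns the LSE mnemonic suffix that corresponds to an ordering.
// Assumption: sequentially consistent operations use the "al" form.
std::string lseOrderSuffix(Ordering o) {
  switch (o) {
  case Ordering::Monotonic:
    return "";
  case Ordering::Acquire:
    return "a";
  case Ordering::Release:
    return "l";
  case Ordering::AcquireRelease:
  case Ordering::SequentiallyConsistent:
    return "al";
  }
  assert(false && "unhandled ordering");
  return "";
}
```

The returned suffix is the same one that distinguishes int_aarch64_ldadd_32 from int_aarch64_ldadda_32, int_aarch64_ldaddl_32 and int_aarch64_ldaddal_32.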

Some ISD NodeTypes can be lowered by the same function; in that case the lowering function acts as a pure pass-through to a specific LSE opcode. Others need special treatment (for example, ISD::ATOMIC_LOAD_SUB becomes ISD::ATOMIC_LOAD_ADD on the negated operand).
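
The SUB-to-ADD rewrite relies on two's-complement wraparound: subtracting a value is the same as adding its negation. Here is a standalone C++ sketch of that identity using std::atomic instead of LLVM's SelectionDAG; fetchSubViaAdd and demoFetchSubViaAdd are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Emulates an atomic fetch-and-subtract using only fetch-and-add, the way a
// target with an atomic add but no atomic subtract would.
// Unsigned arithmetic wraps modulo 2^32, so adding (0 - value) subtracts value.
uint32_t fetchSubViaAdd(std::atomic<uint32_t> &mem, uint32_t value,
                        std::memory_order order = std::memory_order_seq_cst) {
  return mem.fetch_add(0u - value, order);
}

// Small usage check: the old value is returned and memory holds old - value.
void demoFetchSubViaAdd() {
  std::atomic<uint32_t> counter{10};
  uint32_t old = fetchSubViaAdd(counter, 3);
  assert(old == 10);
  assert(counter.load() == 7);
}
```

The memory ordering is passed through unchanged, so the rewrite affects only the operand, not the ordering variant that gets selected.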

In AArch64ISelDAGToDAG.cpp, each instruction selection function selects the correct instruction, with the correct memory ordering and register size, based on the corresponding Intrinsic opcode. Just as in the instruction lowering case, several different instructions can be handled by the same instruction selection function.

This is why the instruction definitions were expanded to explicitly describe every single instruction, as opposed to using the original multiclass design: a different Intrinsic needs to be passed to the instruction definition depending on register size and memory ordering. I do not think it is possible to accomplish this particular design with a multiclass.

Repository:
  rL LLVM

https://reviews.llvm.org/D35319