[LLVMdev] proposal: add macro expansion of for-loop to TableGen

Wed Aug 24 01:30:09 PDT 2011

Hi folks,

TableGen provides a sufficiently rich syntax for expressing a target's
instruction set. Nevertheless, when I wrote the PTX backend, I
observed that some redundancy in TableGen files could be eliminated
through macro expansion of for-loops.

The semantics of a for-loop is to expand the loop body once per
iteration, so it is equivalent to manually unrolling the loop (see
example #1).

I believe the for-loop is useful not only to the PTX backend but also
to other backends. Generally speaking, a for-loop can be used anywhere
you see a "table filling" pattern, that is, wherever you write
repeated, nearly identical lines that differ in only a few places
(see the examples below).

An illustrative (not complete) BNF of the for-loop is as follows:
----------------------------------------
FOR_LOOP := for INDICES in BLOCK

INDICES := INDEX, INDICES | INDEX

INDEX := VARIABLE = RANGE

RANGE := [comma separated values, ...] | function(args, ...)

BLOCK := { STATEMENTS } | STATEMENT;

STATEMENTS := STATEMENT; STATEMENTS | STATEMENT;
----------------------------------------

Notes:
* In statements, you may write #var or #func(args, ...) to expand a
macro or macro function.
* Macro functions are limited to a small set of functions, such as
sequence(), lower(), and upper().
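
To make the substitution rule in the notes above concrete, here is a
rough Python sketch of how #var and #func(arg) could be replaced inside
a statement. It only illustrates the intended semantics and is not a
proposed implementation: the names substitute, MACRO_FUNCS, and
bindings are made up for this sketch, and it assumes a macro function
takes a single loop variable as its argument.

```python
import re

# Hypothetical table of text-producing macro functions. The names come from
# the proposal; the Python implementations are assumptions for illustration.
# sequence() is left out because it produces a range, not replacement text.
MACRO_FUNCS = {
    "lower": str.lower,
    "upper": str.upper,
}

# Matches "#name(arg)" first, then falls back to a plain "#name".
_MACRO = re.compile(r"#(\w+)\((\w+)\)|#(\w+)")


def substitute(text, bindings):
    """Replace every #var and #func(var) in text using the loop bindings."""
    def repl(match):
        func, arg, var = match.group(1), match.group(2), match.group(3)
        if func is not None:
            # "#func(var)": look up the variable, then apply the function.
            return MACRO_FUNCS[func](str(bindings[arg]))
        # "#var": plain substitution of the variable's current value.
        return str(bindings[var])
    return _MACRO.sub(repl, text)
```

For instance, substitute("PTX::#upper(space)", {"space": "global"})
gives "PTX::GLOBAL". An unknown variable or function name raises a
KeyError in this sketch; how TableGen should report such errors is left
open.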

====Example #1====

When defining register files, we repeat a nearly identical definition
for every register. In ARMRegisterInfo.td:
----------------------------------------
def R0  : ARMReg< 0, "r0">,  DwarfRegNum<[0]>;
def R1  : ARMReg< 1, "r1">,  DwarfRegNum<[1]>;
def R2  : ARMReg< 2, "r2">,  DwarfRegNum<[2]>;
def R3  : ARMReg< 3, "r3">,  DwarfRegNum<[3]>;
def R4  : ARMReg< 4, "r4">,  DwarfRegNum<[4]>;
def R5  : ARMReg< 5, "r5">,  DwarfRegNum<[5]>;
def R6  : ARMReg< 6, "r6">,  DwarfRegNum<[6]>;
def R7  : ARMReg< 7, "r7">,  DwarfRegNum<[7]>;
----------------------------------------

I think it could be cleaner and shorter if we could write:
----------------------------------------
for i = sequence(0, 7) in
 def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;
----------------------------------------
The for-loop is expanded into 8 defs, and #i is substituted with 0 through 7 in turn.
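
As a rough illustration of this unrolling, the Python sketch below
generates the eight defs from the one-line loop body. It assumes that
sequence(0, 7) is inclusive at both ends (which the "8 defs" above
implies); the helper names sequence and unroll are only for this
sketch, and the alignment padding of the hand-written version is not
reproduced.

```python
def sequence(first, last):
    """Inclusive integer range: sequence(0, 7) covers 0..7 (assumed)."""
    return list(range(first, last + 1))


# Loop body copied from the proposed for-loop above.
BODY = 'def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;'


def unroll(body, var, values):
    """Return one copy of body per value, with every #var replaced."""
    return [body.replace("#" + var, str(value)) for value in values]


defs = unroll(BODY, "i", sequence(0, 7))
print("\n".join(defs))
```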

As a matter of fact, we have already added "(sequence ...)" to
TableGen, which makes defining register classes easier. I think adding
for-loop expansion to TableGen would make defining registers easier as
well.

====Example #2====

As you can see below, the "def load_*" definitions are almost
identical. The only difference is the memory space name (global,
constant, etc.). I believe a for-loop can make this much more readable.

(defining memory space patterns in PTXInstrInfo.td for each memory space)
----------------------------------------
def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{
 const Value *Src;
 const PointerType *PT;
 if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
     (PT = dyn_cast<PointerType>(Src->getType())))
   return PT->getAddressSpace() == PTX::GLOBAL;
 return false;
}]>;
def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{
 const Value *Src;
 const PointerType *PT;
 if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
     (PT = dyn_cast<PointerType>(Src->getType())))
   return PT->getAddressSpace() == PTX::CONSTANT;
 return false;
}]>;
def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{
 const Value *Src;
 const PointerType *PT;
 if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
     (PT = dyn_cast<PointerType>(Src->getType())))
   return PT->getAddressSpace() == PTX::LOCAL;
 return false;
}]>;
def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{
 const Value *Src;
 const PointerType *PT;
 if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
     (PT = dyn_cast<PointerType>(Src->getType())))
   return PT->getAddressSpace() == PTX::PARAMETER;
 return false;
}]>;
def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{
 const Value *Src;
 const PointerType *PT;
 if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
     (PT = dyn_cast<PointerType>(Src->getType())))
   return PT->getAddressSpace() == PTX::SHARED;
 return false;
}]>;
----------------------------------------

It would be much cleaner if we could write:
----------------------------------------
for space = [global, constant, local, parameter, shared] in {
 def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{
   const Value *Src;
   const PointerType *PT;
   if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
       (PT = dyn_cast<PointerType>(Src->getType())))
     return PT->getAddressSpace() == PTX::#upper(space);
   return false;
 }]>;
}
----------------------------------------

====Example #3====

The PTX backend makes extensive use of multiclass to "overload" an
instruction that supports different types of operands. Each "def" of
the multiclass is almost identical.

In this case, although using a for-loop adds a little cognitive
cost to understand the macro expansion, it removes a lot of
redundancy, so I think the result is actually more readable.

(excerpt of PTXInstrInfo.td)
----------------------------------------
multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
  def rrr32 : InstPTX<(outs RegF32:$d),
                      (ins RegF32:$a, RegF32:$b, RegF32:$c),
                      !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
                      [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
                                                          RegF32:$b),
                                                 RegF32:$c))]>;
  def rri32 : InstPTX<(outs RegF32:$d),
                      (ins RegF32:$a, RegF32:$b, f32imm:$c),
                      !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
                      [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
                                                          RegF32:$b),
                                                 fpimm:$c))]>;
  def rrr64 : InstPTX<(outs RegF64:$d),
                      (ins RegF64:$a, RegF64:$b, RegF64:$c),
                      !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
                      [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
                                                          RegF64:$b),
                                                 RegF64:$c))]>;
  def rri64 : InstPTX<(outs RegF64:$d),
                      (ins RegF64:$a, RegF64:$b, f64imm:$c),
                      !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
                      [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
                                                          RegF64:$b),
                                                 fpimm:$c))]>;
}
----------------------------------------

(Equivalent TableGen code with a for-loop)
----------------------------------------
multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
  for nbit         = [32, 32, 64, 64],
      op_suffix    = [r, i, r, i],
      op_type      = [RegF32, f32imm, RegF64, f64imm],
      op_node_type = [RegF32, fpimm, RegF64, fpimm] in {
    def rr#op_suffix#nbit
      : InstPTX<(outs RegF#nbit:$d),
                (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c),
                !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"),
                [(set RegF#nbit:$d,
                   (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b),
                            #op_node_type:$c))]>;
  }
}
----------------------------------------
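
Note that the BNF above does not say how multiple indices are iterated.
For this loop to be equivalent to the four hand-written defs, the four
lists have to advance in lockstep (one value from each list per
iteration) rather than as a cross product. The Python sketch below
shows that reading; the lockstep rule and the helper name
lockstep_bindings are assumptions made for illustration.

```python
# Index lists copied from the for-loop above.
indices = {
    "nbit":         [32, 32, 64, 64],
    "op_suffix":    ["r", "i", "r", "i"],
    "op_type":      ["RegF32", "f32imm", "RegF64", "f64imm"],
    "op_node_type": ["RegF32", "fpimm", "RegF64", "fpimm"],
}


def lockstep_bindings(indices):
    """Yield one {name: value} dict per iteration, zipping all index lists."""
    lengths = {len(values) for values in indices.values()}
    if len(lengths) != 1:
        raise ValueError("all index lists must have the same length")
    names = list(indices)
    for row in zip(*indices.values()):
        yield dict(zip(names, row))


# Names of the defs produced by "def rr#op_suffix#nbit".
def_names = ["rr{op_suffix}{nbit}".format(**b) for b in lockstep_bindings(indices)]
print(def_names)
```

Under this reading the loop yields exactly rrr32, rri32, rrr64, and
rri64, matching the original multiclass. A cross product would instead
produce 256 iterations, so lists of unequal length are rejected here.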

====

Any comments are welcome. Unless people think this does not make sense
at all, I will write up more details (such as a complete BNF) and send
them to the mailing list.

Regards,
Che-Liang