[LLVMdev] Enhancing TableGen

Sat Oct 8 06:08:02 PDT 2011

On Fri, Oct 7, 2011 at 11:05 PM, David A. Greene <greened at obbligato.org> wrote:
> Che-Liang Chiou <clchiou at gmail.com> writes:
>
>> My purpose is to eliminate copy-paste style of programming in td files
>> as much as possible, but only to a point that the new language
>> constructs do not create too much overhead/readability-downgrade.
>
> Yes!
>
>> In other words, I am targeting those low-hanging fruit of copy-paste
>> programming in td files that can be eliminated by a simple for-loop
>> syntax. The repetitive patterns I observed in the PTX backend (and
>> probably some other backends) are:
>> * Members of a multiclass, such as those in PTXInstrInfo.td.
>> * Consecutive similar defs, such as register declarations.
>
> Yep.
>
>> [Why for-loop?]
>> * All I need is a simple iteration language construct. I believe there
>> is no simpler construct (in terms of readability) than a for-loop,
>> but I am happy to be convinced otherwise.
>> * It is sufficient to eliminate the most common copy-paste programming
>> that I observed.
>> * It is simple enough to understand and maintain, at least I believe so.
>
> I mostly agree with these.  One other thing I've found useful is the
> ability to abstract out the type information.  For example, in x86 land,
> an SSE/AVX add is always:
>
> (set (type regclass:reg), (type (add (type regclass:reg), (type regclass:reg))))
>
> Similarly, a sub is:
>
> (set (type regclass:reg), (type (sub (type regclass:reg), (type regclass:reg))))
>
> In fact most binary operations are:
>
> (set (type regclass:reg), (type (op (type regclass:reg), (type regclass:reg))))
>
> So why write hundreds of patterns to express this?  Using the for-loop
> syntax:
>
> // WARNING: Pseudo-code, many details elided for presentation purposes.
>
> multiclass binop<opcode> : sse_binop<opcode>, avx_binop<opcode>;
>
> multiclass sse_binop<opcode> {
>  for type = [f32, f64, v4f32, v2f64]
>      regclass = [FP32, FP64, VR128, VR128]
>      suffix = [ss, sd, ps, pd] {
>
>    def !toupper(suffix)#rr : Instr<
>      [(set (type regclass:$dst), (type (opcode (type regclass:$src1),
>                                                (type regclass:$src2))))]>;
>    def !toupper(suffix)#rm : Instr<
>      [(set (type regclass:$dst), (type (opcode (type regclass:$src1),
>                                                (type addr:$src2))))]>;
>  }
> }
>
> multiclass avx_binop<opcode> {
>  for type = [f32, f64, v4f32, v2f64, v8f32, v4f64]
>      regclass = [FP32, FP64, VR128, VR128, VR256, VR256]
>      prefix = [x, x, x, x, y, y]
>      suffix = [ss, sd, ps, pd, ps, pd] {
>
>    def V#prefix#NAME#!toupper(suffix)#rr : Instr<
>      [(set (type regclass:$dst), (type (opcode (type regclass:$src1),
>                                                (type regclass:$src2))))]>;
>    def V#prefix#NAME#!toupper(suffix)#rm : Instr<
>      [(set (type regclass:$dst), (type (opcode (type regclass:$src1),
>                                                (type addr:$src2))))]>;
>  }
> }
>
> def ADD : binop<add>;
> def SUB : binop<sub>;
> def MUL : binop<mul>;
> def DIV : binop<div>;
> [...]
>
> Here I am treating "#" as an infix !strconcat.  This makes things much
> easier to read than both !strconcat() and a double-# notation, IMHO.
>
> Now each binary pattern is only specified twice and even that
> duplication can be eliminated with a little more abstraction.  Perhaps
> that's not worth it, however.  I can live with this level of
> duplication.
>
> I would also like to replace #NAME# with a "real" Record field that is
> automatically added to def-type Records.  This removes a lot of the
> hackiness of #NAME#.
>

I think the infix '#' and the pseudo-code you wrote are cleaner than
my design. Nice!

And I really wish that we could write a for-loop inside a multiclass
definition (as in your pseudo-code example). I hope Jakob would agree.
To me, it does not move any information outside the multiclass
definition; it only condenses that information. So I believe it adds
no extra lookup effort when we read td files.
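
To make the lockstep iteration in the pseudo-code concrete, here is a
minimal sketch in Python (not TableGen) of how I read its semantics: the
k-th pass of the loop binds the k-th element of every list. The function
name expand_lockstep and the tuple layout of a "record" are assumptions
made for illustration only; they are not part of TableGen or of Dave's
proposal.

```python
def expand_lockstep(name, **iterators):
    """Model a lockstep for-loop: pass k binds element k of every list.

    Hypothetical helper for illustration; not a TableGen API.
    """
    lengths = {len(values) for values in iterators.values()}
    if len(lengths) != 1:
        raise ValueError("iterator lists must all have the same length")
    records = []
    for row in zip(*iterators.values()):
        env = dict(zip(iterators, row))
        for form in ("rr", "rm"):
            # Record name = instance name + upper-cased suffix + operand form.
            records.append((name + env["suffix"].upper() + form,
                            env["type"], env["regclass"]))
    return records


sse_add = expand_lockstep(
    "ADD",
    type=["f32", "f64", "v4f32", "v2f64"],
    regclass=["FP32", "FP64", "VR128", "VR128"],
    suffix=["ss", "sd", "ps", "pd"],
)
for record in sse_add:
    print(record)
```

The length check is a design choice of this sketch: rejecting lists of
different lengths keeps the loop easy to reason about, at the cost of
some repetition in the lists themselves.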

>> [Why preprocessor?]
>>
>> TGParser.cpp in its current form parses and emits the results in
>> one pass. That means it would emit the for-loop body even before we
>> are done parsing the entire for-loop.
>
> It doesn't have to.  I'm working on a for loop that is not preprocessor
> based.  It uses the same technique as the multiclass: remember the
> innermost for loop seen and instantiate everything after the entire for loop
> body is parsed.
>

Agreed! It would be better if we could live without a preprocessor.
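
As I understand the technique, the parser keeps a stack of open loops,
appends each def to the innermost one, and only instantiates once the
outermost loop is closed. Below is a rough Python model of that idea. It
is not TGParser.cpp code: the names ForLoop, Parser, begin_for, add_def
and end_for are all hypothetical, and a def is reduced to a format
string for brevity.

```python
class ForLoop:
    """A loop being parsed: iterator name, its values, and the recorded body."""

    def __init__(self, var, values):
        self.var = var
        self.values = list(values)
        self.body = []  # format-string defs and nested ForLoop objects


def instantiate(item, env):
    """Expand a recorded item under the iterator bindings in env."""
    if isinstance(item, ForLoop):
        out = []
        for value in item.values:
            inner = dict(env)
            inner[item.var] = value
            for child in item.body:
                out.extend(instantiate(child, inner))
        return out
    return [item.format(**env)]


class Parser:
    """Hypothetical one-pass parser that defers emission of loop bodies."""

    def __init__(self):
        self.stack = []    # open loops, innermost last
        self.records = []  # emitted record names

    def begin_for(self, var, values):
        self.stack.append(ForLoop(var, values))

    def add_def(self, template):
        if self.stack:
            # Inside a loop: remember the def, emit nothing yet.
            self.stack[-1].body.append(template)
        else:
            self.records.extend(instantiate(template, {}))

    def end_for(self):
        loop = self.stack.pop()
        if self.stack:
            self.stack[-1].body.append(loop)  # nested: defer to the outer loop
        else:
            self.records.extend(instantiate(loop, {}))


parser = Parser()
parser.begin_for("prefix", ["x", "y"])
parser.begin_for("suffix", ["PS", "PD"])
parser.add_def("V{prefix}ADD{suffix}rr")
parser.end_for()
parser.end_for()
print(parser.records)
```

Nothing reaches parser.records until the outermost end_for, which is the
property that lets a one-pass parser handle the loop without a separate
preprocessing phase.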

>> So I believe a non-preprocessor approach would require 2 passes. The
>> first pass parses the input and generates a simple syntax tree, and
>> the second pass evaluates the syntax tree and emits output records (in
>> fact, this is how I implemented the current preprocessor). And I
>> believe that changing TGParser.cpp to accommodate 2 passes is quite a
>> lot of work, and so I chose a preprocessor.
>
> In the long run, a two-pass parser would probably be more maintainable
> but I agree it's a big job.  That's why I went with a compromise of
> treating for like a sort of multiclass, at least in the instantiation
> mechanics.  I'm still working on it so I haven't got all the details
> together yet.
>
>> But if you think we should really rewrite TGParser.cpp to parse and
>> evaluate for-loops correctly, I would be glad if we could do away with
>> the preprocessor.
>
> I think we can do it without completely rewriting the parser.
>
>> [Why NOT while-loop?]
>> * A while-loop requires evaluating a loop-condition expression; this
>> is complexity that I would like to avoid.
>
> I agree.  It's not needed.  Simple iteration over a list is enough.
>
>> [Why NO if-else?]
>> * It requires at least evaluating a Boolean expression, too.
>> * If a piece of td code is complicated enough that we need an if-else
>> to eliminate its duplication, I think it is worth the duplication.
>
> Perhaps.  I'm sort of on the fence on this.  I don't like the !if stuff
> I introduced for readability reasons.  An imperative-style if might make
> things easier but I think I could live with the duplication.
>
>> [Why NO abstractions (like `define foo(a, b, c)`)?]
>> * Abstractions are probably worthwhile, but I am not sure yet. I think
>> we could wait until it is clear that we really need abstractions.
>
> I'm not sure what you mean by this abstraction.  Can you elaborate?
>

It is like the C preprocessor's #define foo(). You may define a
function-like macro (creating an abstraction over macros). I guess we will
not like this idea since it adds extra lookup effort when we read
a td file (we have to look up a macro function's definition as well as
its instantiations).

>> [string vs token]
>>
>> The preprocessor (in its current form) has tokens by default, and it
>> only converts a series of tokens and white spaces into a string if
>> explicitly required (by a special escape #"#, see example below).
>
> On further thought, I think this is correct.  My plan with the
> parser-integrated for is to allow the user to declare the type of the
> iterator.  It solves a number of problems, most importantly TableGen's
> declaration before use requirement.
>

Sounds good to me.

>> ----------------------------------------
>> #for i = sequence(0, 127)
>> def P#i# : PTXReg<#"#p#i##"#>;
>> #end
>> ----------------------------------------
>> * Anything between #"# are quoted, including white spaces and
>> non-tokens. E.g., #"#hello  world#"# --> "hello  world"
>
> As mentioned above, I've been thinking about making "#" an infix
> equivalent to !strconcat().  This would require the parser to implicitly
> cast values to string if necessary but that's not hard to do.
>

I like the infix '#', too. It is more readable than the alternatives
we have so far.
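
For reference, here is how I picture the infix '#' behaving, as a small
Python sketch rather than TableGen. It assumes that each operand is
either a bound variable or a literal token, that values are implicitly
cast to strings, and that an empty operand (from a leading or trailing
'#') pastes an empty string. The function name eval_paste and the
lookup rule are illustrative assumptions, not a description of any
existing parser.

```python
def eval_paste(expr, env):
    """Evaluate an infix-'#' paste expression against variable bindings.

    Hypothetical model: operands found in env are substituted and cast to
    str; any other operand (including an empty one) is kept literally.
    """
    operands = expr.split("#")
    return "".join(str(env.get(operand, operand)) for operand in operands)


env = {"i": 5, "NAME": "ADD", "prefix": "x", "suffix": "PS"}
print(eval_paste("P#i", env))                      # integer cast to string
print(eval_paste("V#prefix#NAME#suffix#rr", env))  # several operands
print(eval_paste("NAME#", env))                    # trailing '#' adds ""
```

One weakness of this sketch is that a literal token that happens to
share a name with a variable would be substituted; a real design would
need declared iterator types or scoping to avoid that.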

>> * Macro variable needs a '#' character at both front and back. This
>> looks like the multiclass's #NAME# substitution, and so I think is
>> more consistent than prepending a single '#' at front.
>
> To handle existing #NAME# constructs, I was planning to define a trailing
> # as a !strconcat(string, "").  This keeps consistency and provides more
> flexibility to the # operator.
>
>> What do you think? Which one is more readable to you? !case<> or #"# or ... ?
>
> I like an infix #.  It's consistent with at least some other languages.
>
>> [Can the for-loop proposal be a preprocessing phase?]
>>
>> I guess the example Dave gave (see below) cannot be handled by an (even
>> extended) preprocessor. I am not keen on implementing the for-loop in a
>> preprocessor. I chose a preprocessor because I think it would cause
>> the least impact to the codebase and, to be honest, I didn't address
>> the pattern that Dave gave in his example in my design. I was trying
>> to avoid variable-length lists because I think that is too complicated
>> for users. But I could be wrong.
>
> I think it could be if not done carefully.  I am already getting
> feedback that some of my proposals are too complicated.  I am really
> taking that to heart and trying to find a good balance among
> maintainability, clarity and redundancy.  I think things can be better
> than they are now and better than what I've proposed so far.  No one
> gets the right answer in a vacuum.  :)
>
>                              -Dave
>

Regards,
Che-Liang