[LLVMdev] Enhancing TableGen

Thu Oct 6 22:00:29 PDT 2011

My purpose is to eliminate the copy-paste style of programming in td
files as much as possible, but only to the point where the new language
constructs do not add too much overhead or hurt readability.

In other words, I am targeting the low-hanging fruit: copy-paste
programming in td files that can be eliminated by a simple for-loop
syntax. The repetitive patterns I observed in the PTX backend (and
probably some other backends) are:
* Members of a multiclass, such as those in PTXInstrInfo.td.
* Consecutive similar defs, such as register declarations.

[Why for-loop?]
* All I need is a simple iteration language construct. I believe there
is no simpler construct (in terms of readability) than a for-loop,
but I am happy to be convinced otherwise.
* It is sufficient to eliminate the most common copy-paste programming
that I observed.
* It is simple enough to understand and maintain, at least I believe so.

[Why preprocessor?]

I admit that a preprocessor is probably not the best solution, and we
can implement a for-loop without one. The only reason I chose a
preprocessor is that (I believe) it requires the fewest changes to
TGParser.cpp.

TGParser.cpp in its current form parses and emits the results in
one pass. That means it would emit the for-loop body even before we
are done parsing the entire for-loop.

So I believe a non-preprocessor approach would require two passes. The
first pass parses the input and generates a simple syntax tree, and
the second pass evaluates the syntax tree and emits output records (in
fact, this is how I implemented the current preprocessor). I believe
that changing TGParser.cpp to accommodate two passes is quite a lot of
work, and so I chose a preprocessor.
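
To make the two-pass idea concrete, here is a minimal sketch in Python.
It is not the actual preprocessor. The function names parse and
evaluate, the node layout, and the assumption that sequence(lo, hi)
includes both bounds are all hypothetical, and the #"# quoting escape
is left out to keep it short.

```python
import re

# Hypothetical two-pass expander for the '#for ... #end' syntax.
# Pass 1 builds a small syntax tree; pass 2 walks it and emits lines.
FOR_RE = re.compile(r"#for\s+(\w+)\s*=\s*sequence\((\d+),\s*(\d+)\)\s*$")


def parse(lines):
    """Pass 1: build a tree of ('text', line) and
    ('for', var, lo, hi, body) nodes. Nothing is emitted here."""
    root = []
    stack = [root]  # stack[-1] is the body of the innermost open loop
    for line in lines:
        stripped = line.strip()
        match = FOR_RE.match(stripped)
        if match:
            body = []
            node = ("for", match.group(1),
                    int(match.group(2)), int(match.group(3)), body)
            stack[-1].append(node)
            stack.append(body)
        elif stripped == "#end":
            if len(stack) == 1:
                raise SyntaxError("#end without matching #for")
            stack.pop()
        else:
            stack[-1].append(("text", line))
    if len(stack) != 1:
        raise SyntaxError("#for without matching #end")
    return root


def evaluate(nodes, env=None):
    """Pass 2: walk the tree and return the expanded output lines."""
    env = env or {}
    out = []
    for node in nodes:
        if node[0] == "text":
            line = node[1]
            for name, value in env.items():
                line = line.replace("#" + name + "#", str(value))
            out.append(line)
        else:
            _, var, lo, hi, body = node
            # Assumption: sequence(lo, hi) includes both end points.
            for value in range(lo, hi + 1):
                out.extend(evaluate(body, {**env, var: value}))
    return out
```

The point of the split is that evaluate only runs once the whole loop
body is known, which is exactly what a one-pass parser that emits
records as it goes cannot offer without restructuring.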

But if you think we should really rewrite TGParser.cpp to parse and
evaluate for-loops correctly, I would be glad to do away with the
preprocessor.

[Why NOT while-loop?]
* A while-loop requires evaluating a loop-condition expression; this
is complexity that I would like to avoid.

[Why NO if-else?]
* It requires at least evaluating a Boolean expression, too.
* If a piece of td code is complicated enough that we need an if-else
to eliminate its duplication, I think the duplication is worth keeping.

[Why NO abstractions (like `define foo(a, b, c)`)?]
* Abstractions are probably worthwhile, but I am not sure yet. I think
we could wait until it is clear that we really need them.

Hi Dave and Jakob,

Thanks for the comments. I will try my best to respond to everything
you wrote. If I missed anything, as there really are a lot of comments,
please let me know.

[string vs token]

The preprocessor (in its current form) works on tokens by default, and
it only converts a series of tokens and white space into a string when
explicitly requested (by a special escape #"#, see example below).
----------------------------------------
#for i = sequence(0, 127)
def P#i# : PTXReg<#"#p#i##"#>;
#end
----------------------------------------
* Anything between a pair of #"# is quoted, including white space and
non-tokens. E.g., #"#hello  world#"# --> "hello  world"
* A macro variable needs a '#' character at both front and back. This
looks like the multiclass's #NAME# substitution, and so I think it is
more consistent than prepending a single '#' at the front.
* So my current idea is very similar to Dave's, except that I replace
strings with tokens (i.e., both iterators and the results of the paste
"operator" are tokens).
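
The substitution rules above can be sketched in a few lines of Python.
This is only an illustration of the rules as described, not the real
implementation. The name expand_line and the choice to leave unknown
names such as #NAME# untouched are assumptions made for the example.

```python
import re

QUOTE = '#"#'                      # the quoting escape described above
VAR_RE = re.compile(r"#(\w+)#")    # a macro variable: '#' name '#'


def expand_line(line, env):
    """Expand one line: substitute #var# from env, then turn each
    #"# ... #"# pair into an ordinary double-quoted string."""
    # Split on the escape first so a quote marker is never mistaken
    # for the start or end of a variable reference.
    parts = line.split(QUOTE)
    if len(parts) % 2 == 0:
        raise ValueError('unbalanced #"# escape')

    def substitute(segment):
        # Names missing from env are left as written (an assumption),
        # so something like #NAME# would pass through unchanged.
        return VAR_RE.sub(
            lambda m: str(env.get(m.group(1), m.group(0))), segment)

    # White space inside a quoted segment is preserved verbatim.
    return '"'.join(substitute(part) for part in parts)
```

For instance, expand_line('def P#i# : PTXReg<#"#p#i##"#>;', {"i": 5})
gives def P5 : PTXReg<"p5">; under these rules.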

What do you think? Which one is more readable to you? !case<> or #"# or ... ?

[Can the for-loop proposal be a preprocessing phase?]

I guess the example Dave gave (see below) cannot be handled in a
preprocessor, even an extended one. I am not keen on implementing the
for-loop in a preprocessor. I chose a preprocessor because I think it
would have the least impact on the codebase and, to be honest, my
design did not address the pattern in Dave's example. I was trying
to avoid variable-length lists because I think they are too complicated
for users. But I could be wrong.
----------------------------------------
multiclass blah<list<int> Values> {
 for v = Values {
   def DEF#v : base_class<v>;
 }
}
----------------------------------------

Dropping the preprocessor seems to have another syntactic benefit: we
could remove the extra '#' characters. To be honest, those '#' are not
very nice looking, and Dave's example looks cleaner than my excess-'#'
style.

Hi Dave,

I am not sure what you want to play around with, but you are not
disrupting anything so far.

Regards,
Che-Liang

On Fri, Oct 7, 2011 at 5:37 AM, Jakob Stoklund Olesen <jolesen at apple.com> wrote:
>
> On Oct 6, 2011, at 12:42 PM, David A. Greene wrote:
>
>> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>>
>>> On Oct 6, 2011, at 7:59 AM, David A. Greene wrote:
>>>
>>>> For example, I want to be able to do this:
>>>>
>>>> defm MOVH :
>>>> vs1x_fps_binary_vv_node_rmonly<
>>>>   0x16, "movh", undef, 0,
>>>>          // rr
>>>>          [(undef)],
>>>>          // rm
>>>>          [(set DSTREGCLASS:$dst,
>>>>                (DSTTYPE (movlhps SRCREGCLASS:$src1,
>>>>                                (DSTTYPE (bitconvert
>>>>                                            (v2f64 (scalar_to_vector
>>>>                                                      (loadf64 addr:$src2))))))))],
>>>>          // rr Pat
>>>>          [],
>>>>          // rm Pat
>>>>          [[(DSTTYPE (movlhps SRCREGCLASS:$src1, (load addr:$src2))),
>>>>            (MNEMONIC SRCREGCLASS:$src1, addr:$src2)],
>>>>           [(INTDSTTYPE (movlhps SRCREGCLASS:$src1, (load addr:$src2))),
>>>>            (MNEMONIC SRCREGCLASS:$src1, addr:$src2)]]>;
>>>
>>> This kind of thing is very hard to read and understand.
>>
>> What's hard about it?  I'm not trying to be agitational here.  I'm truly
>> wondering what I can do to make this more understandable.
>
> If you didn't write these patterns yourself, or if you wrote them six months ago, it is nearly impossible to figure out where a specific pattern came from, or where a specific instruction is defined.
>
> It is hard enough mentally executing the current multiclasses.  Injecting patterns into multi defms like this makes it much harder still.
>
>>  I am certainly happy to make things more readable and welcome
>> lots of feedback in that area.  But the ability to quickly and easily
>> extend the ISA for new vector lengths is critical to us.
>
> This is where our priorities differ.
>
> Readability and maintainability are key.
>
> After all, we need to fix isel and codegen bugs more often than Intel and AMD add ISA extensions.
>
> /jakob