[llvm-dev] Email list just for front end developers?

Thu May 11 19:48:13 PDT 2017

So in writing this, I want to emphasize, in advance, that I'm grateful
to the LLVM team for having created LLVM, I do not feel I am in any
way entitled to anyone else's help, etc. The purpose of my original
post was just to suggest that there be a way for people who are
struggling with some parts of the APIs to find each other and answer
each other's questions. If any of the following sounds like I'm
demanding anything, that's not my intent. This may be longer than it
should be, too.

On Thu, 11 May 2017 15:51:14 -0700 Sean Silva <chisophugis at gmail.com>
wrote:
> I'm still unclear about what sorts of questions/discussion this
> audience is interested in. Can you provide a few examples of
> questions that might be asked, or links to discussions/blog posts
> on the net that exemplify the type of thing that you would want to
> happen in the "frontend authors using LLVM" medium you are thinking
> of?
> 
> For example, looking at the post you linked (
> http://lists.llvm.org/pipermail/llvm-dev/2017-March/111526.html the
> author of which is in this thread I think, hi!), there's not much
> LLVM-specific about it. In order to generate LLVM IR, it is
> necessary to understand its semantics. The semantics are mostly
> C-like, and so the answer to questions like those in the post:
> 
> - How to assign a text to a string variable?
> - How to access for example the 4th element in a heap-allocated
> array?
> - How to create local variables which are not part of the global
> scope?
> - How to create complex structures like classes and objects?
> 
> is basically "how would you do it in C"?

I think you're missing the import of his questions. This isn't "how
you would do it in C". He's asking effectively how to use the
IRBuilder APIs to do these things, not what in principle you want the
machine to do or what you might write into a file of text form IR if
you were bypassing the IRBuilder.

The "how do I construct a string constant in the API" one cost me
hours to figure out until someone on IRC took pity on me, which was
unusual, and steered me to the right call (CreateGlobalString).

Now I know what you're thinking. "Why didn't you just look for
something like CreateGlobalString to begin with?" And the answer is,
because I had no idea what I was looking for, what form it would take,
or what it was called, and googling at random wasn't helping. I knew
what sort of IR output I wanted but I didn't know how to generate it
in the API.

And yes, a naive newcomer can literally spend hours trying to figure
out that just one call they wanted was CreateGlobalString. Maybe I
missed the part of the extant documentation or the tutorial that
pointed there, but that's kind of the point, isn't it? If it was
there, I didn't find it, and I needed to ask for help.

It would have been nice to have a place where I could have
dropped the question and no one would have thought twice about just
answering.

And thus, my desire for a place for mutual aid for people who are
figuring out front end stuff or who have already figured it out and
are willing to help others.

> For example, to assign text to a "string variable", you first have
> to have a struct that represents your string type, then you do some
> C-level code to manipulate that data structure.

That's not what he's asking, or at least I don't think he is. He's
asking about how to generate the right IR. He probably was looking for
CreateGlobalString and how to put it together with other stuff.  His
other messages made that seem clear to someone else who had been
struggling recently with the same issues.

> For some of the specifics of LLVM IR's SSA form (like phi nodes)
> working through the Kaleidoscope tutorial should give a feel for
> how to do it.

A feel isn't enough to write code that compiles and does the right
stuff. It gives you a tantalizing hint of what you want, but you need
more than that. The tutorial makes it all look really easy, but it
skips even a lot of details about what it is doing, and then you start
to ask questions that go slightly past the edge of the tutorial and
the answers _really_ aren't obvious.

Here's just one of a million examples: in a real compiler you don't
want to just parse a number into a machine int or double or what have
you, you want to use arbitrary precision stuff that's machine
independent, and of course, LLVM has that because it needs it.

Unluckily, even once you know it is there, it takes a while to figure
out what you need to use it, and there's very little documentation at
all. There was a point where I wanted to just punt and let the APFloat
API parse a string from my front end's scanner (after all, why
duplicate that code and do it less well?), only there's no documentation on
what sorts of float formats it parses. If someone could have answered
that, or if there had been a paragraph or two in the doxygen output,
it would have saved me lots of time reading code. As it was, the best
someone had to suggest to me was read the tests, so I went off and
slogged through. (And yah, it turns out that it handles a variety of
standard C-like formats including hex floats, which was something I
wanted to figure out but couldn't just read in the doxygen docs
because it isn't there.)
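
For anyone unsure what "C-like formats including hex floats" looks like,
here is a minimal sketch in plain C. It uses the C library's strtod, NOT
APFloat, so it only illustrates the literal syntax; which of these forms
APFloat accepts still has to be checked against its tests. The function
name parse_c_float is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a complete floating-point literal with the C library's strtod.
   Returns 1 and stores the value in *out if the whole string was
   consumed, 0 otherwise. This is standard C (C99 and later), not
   LLVM's APFloat. */
int parse_c_float(const char *text, double *out) {
    char *end = NULL;
    double value;

    assert(text != NULL && out != NULL);
    value = strtod(text, &end);
    if (end == text || *end != '\0')
        return 0;   /* nothing parsed, or trailing junk */
    *out = value;
    return 1;
}
```

With this, "1.5e3" parses as 1500.0 and the hex float "0x1.8p1" parses
as 3.0 (hex mantissa 1.8 is 1.5 decimal, times 2 to the power 1).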

So yah, you get a feel for what you want from the tutorial, but a feel
isn't enough to write a compiler for something non-trivial. The
programmer's guide helps a bit more, but you're quickly off the road
and feeling stuff out.

Note again that I'm not in any way suggesting that anyone should feel
obligated to help others. No one is! I have no claim on anyone else's
time, and again, thank all of you for creating LLVM in the first
place. All I was asking for was just a place for those who are willing
to help each other over the hump to contact each other. Puzzling
things out by reading the test cases for a piece of compiler
middle-end machinery isn't the most fun way to learn an API, and it is
slow, too.

> However, from your other posts, you say something like:
> "People working on front ends typically don't really know everything
> about the innards of LLVM, and thanks to LLVM's very nicely designed
> architecture, we don't need to, we mostly need to know how to
> generate IR and hook up to the rest of the system."
> 
> So it sounds like you're more interested in API usage like
> IRBuilder, setting up codegen, setting up MC, configuring pass
> pipelines, etc.? I.e. you have a clear idea about the "big picture"
> and what you are trying to do, but want to know how that is done in
> LLVM (e.g. "how do I create a phi node?"). This is a very different
> kind of question from the ones above.

The questions that guy was asking seem to be part and parcel of that,
though I could be wrong.

There's a big IRBuilder API, the C bindings for all this stuff are
poorly documented, the C++ bindings are only slightly better
documented, and dealing with the important bits is not trivial. The
language reference superficially looks quite nice, but then you
realize the only way to actually figure out what IR you need for many
constructs is to write C code and get Clang to dump out the IR that
results. (It took me a while to learn that trick, even though, yes, it
should have been obvious, just like finding CreateGlobalString should
have been obvious only I didn't know I was looking for it.)
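
A minimal sketch of that trick, assuming a file named example.c (the
file and function names are placeholders). The comments describe roughly
what to look for in the output; the exact IR varies with the Clang
version and target.

```c
/* example.c: a few constructs whose IR we want to see.
   Assumed invocation (file names are placeholders):
       clang -S -emit-llvm -O0 example.c -o example.ll */
#include <assert.h>

/* The literal occupies 6 bytes (5 characters plus the trailing NUL),
   so expect a constant global of type [6 x i8] in the IR. */
static_assert(sizeof("hello") == 6, "literal includes the trailing NUL");

/* A string constant: the function returns a pointer to that global. */
const char *greeting(void) {
    return "hello";
}

/* Indexing through a pointer: expect a getelementptr, then a load. */
int fourth(const int *a) {
    return a[3];
}

/* A local variable: at -O0, expect an alloca plus stores and loads. */
int add_one(int x) {
    int local = x + 1;
    return local;
}
```

Reading example.ll side by side with the source shows which IR each
construct maps to. It does not show which IRBuilder calls produce it.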

However, even once having seen what Clang generates, you still need to
figure out what to put into IRBuilder to get the same results. Sure,
that's probably somewhere in the innards of Clang, but try finding
it. It's not trivial, and without much documentation it takes a while
to figure out. I really would have liked to have just asked someone
about a bunch of stuff instead of deciphering things.

Also take his second question about accessing parts of
arrays. getelementptr is sufficiently weird that there actually _is_ a
decent bit of documentation on it, but it isn't obvious that you're
looking for that unless someone tells you. (Luckily I stumbled on it.)
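
As a rough illustration in C terms (a sketch, not LLVM API code; all the
names below are made up): getelementptr only computes an address, the
way pointer arithmetic does, and reading the value is a separate load.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; int y; };

/* Address of element i. This is roughly what a getelementptr with a
   single index computes; no memory is read here. */
int *element_addr(int *base, size_t i) {
    assert(base != NULL);
    return base + i;
}

/* Address of field y in element i of an array of structs. Roughly a
   getelementptr with two indices: i, then field number 1. */
int *field_addr(struct point *pts, size_t i) {
    assert(pts != NULL);
    return &pts[i].y;
}

/* "The 4th element": compute the address first, then load from it. */
int fourth_element(int *base) {
    return *element_addr(base, 3);
}
```

The two-step shape (address computation, then a separate load or store)
is the part that tends to surprise people coming from C's a[3].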

I get that this all seems really easy if you know it already. To
someone who is coming at this completely naively, without much
understanding of where to dig in, there's a very, very steep learning
curve and little documentation once you're done with the tutorial. A
lot has to be found in blog postings, bits of sample code you happen
to find, reading the implementation, etc.

And I note that even though I'm still at it, I don't yet have my front
end doing very much. There's still a lot to learn. And yah, maybe I'm
not the best at this. Maybe if I was smarter I'd just pick it up from
reading Clang code or something, but I'm slow and middle aged and yet
I have a job to get done anyway. So I keep beating my head against
stuff until I figure it out. But I still wish there were people to ask
questions.

> I'd really like to hear your feedback. I think that historically,
> Clang has been the dominant frontend and so there hasn't been much
> impetus for providing certain kinds of documentation, but as you
> mention upthread, there is now a quite large tail of other
> frontends that are developed outside of the LLVM community, so that
> might have to change.

If I had a magic wand, I'd have everything well documented, but that
seems like it isn't going to happen. As a more practical second best,
all I really want is to be in contact with other people who work on
front ends who are sympathetic and remember learning the same thing.

Perry
-- 
Perry E. Metzger		perry at piermont.com