[LLVMdev] LLVM supports Unicode?

Joachim Durchholz jo at durchholz.org
Sun Aug 28 11:55:25 PDT 2011


On 28.08.2011 20:02, geovanisouza92 at gmail.com wrote:
> Hi, Jo!
>
> I'm trying to create a new programming language, and I want it to have
> Unicode support (support for correctly reading and manipulating the
> source code and string literals).
>
> But, in addition, my programming language supports string interpolation,
> and inside these interpolations there can be tiny snippets of code, such
> as expressions or variable names.

As Reid said, this probably isn't the right list to ask questions about 
the runtime system.
Still, it's marginally relevant, and I happen to have done a bit with 
Unicode lately, so here goes:

In that case, you have a multitude of design and implementation choices. 
You won't be able to properly explore these until you have done some 
more reading.

I'd suggest reading the Unicode standard, available for free at 
http://unicode.org. You'll have to read the material there more than 
once, I fear; at least I had to before I was able to roughly determine 
which parts of the standard were relevant for what I wanted to do.
For starters, you'll want to know about the various encodings (UTF-8 and 
UTF-16 are the most relevant ones), and about surrogate pairs (the way 
UTF-16 represents code points above U+FFFF as two 16-bit code units). 
With that in mind, you can start thinking about writing (or using) a 
library.
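A minimal sketch of what the encodings and surrogate pairs mean in practice, written in Python for brevity. It shows how many bytes (UTF-8) or 16-bit code units (UTF-16) a few characters take, and computes a surrogate pair by hand using the formula from the Unicode standard. The helper names `utf16_surrogates` and `combine_surrogates` are made up for illustration.

```python
from __future__ import annotations


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    v = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits go into the high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits go into the low (trail) surrogate
    return high, low


def combine_surrogates(high: int, low: int) -> int:
    """Inverse of utf16_surrogates: rebuild the code point from a valid pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    s = "a\u00e9\u20ac\U0001F600"   # 'a', 'e' with acute, euro sign, grinning face emoji
    for ch in s:
        print(f"U+{ord(ch):04X}: UTF-8 {len(ch.encode('utf-8'))} byte(s), "
              f"UTF-16 {len(ch.encode('utf-16-le')) // 2} code unit(s)")
    high, low = utf16_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")   # D83D DE00
```

The point for a language implementer: neither "number of bytes" nor "number of UTF-16 code units" equals "number of characters", so the lexer and any string-length or indexing operations have to pick one unit deliberately.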

For practical usage, I have been sticking with the ICU library.
(Be warned that you still need to know a good deal about Unicode before 
you can properly determine which ICU options actually do what you want.)
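One example of such a decision is normalization: the same visible text can be spelled with different code point sequences. The sketch below assumes a (hypothetical) language policy that identifiers compare equal after NFC normalization. It uses Python's standard `unicodedata` module rather than ICU; ICU offers the same normalization forms through its `Normalizer2` class. `same_identifier` is a made-up helper.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after NFC normalization (a hypothetical policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"     # accented e as one code point, U+00E9
decomposed = "cafe\u0301"     # plain 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                 # False: different code point sequences
print(len(precomposed), len(decomposed))         # 4 5
print(same_identifier(precomposed, decomposed))  # True
```

Whether to normalize at all, and whether to use NFC or a compatibility form such as NFKC, is exactly the kind of choice you need to understand before configuring a library.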

Hope this helps, and good luck!
Regards,
Jo


