[LLVMdev] Transcoding UTF-8 to ASCII?

Chris Lattner sabre at nondot.org
Fri Jan 16 22:12:00 PST 2004


> There are two approaches here:
>      1. just cast unsigned char to signed char and _hope_ it doesn't
>         screw up LLVM
>      2. encode the extended characters into ASCII somewhat like what
>         browsers do with URLs (e.g. space=%20).
>
> Would approach #1 work with LLVM?

Yes, we never 'interpret' the characters in an identifier for anything
that would cause us to consider the 'sign bit' special.

> Are there any character bit patterns forbidden in LLVM identifiers?

No.  I believe everything should work, though you might get into trouble
if you use the '"' character (which can be easily fixed in the lexer and
asmwriter as needed).
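
For what it's worth, here is a rough sketch of the '%XX' escaping from
approach #2, in case it ends up being needed as a fallback.  The helper
name and signature are made up purely for illustration, they aren't
anything in LLVM:

    #include <cstdio>
    #include <string>

    // Hypothetical helper (not an LLVM API): rewrite any byte outside a
    // conservative ASCII subset as a %XX escape, URL style.
    static std::string encodeName(const std::string &In) {
      std::string Out;
      for (std::string::size_type i = 0; i != In.size(); ++i) {
        unsigned char C = (unsigned char)In[i];
        bool Safe = (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
                    (C >= '0' && C <= '9') || C == '_' || C == '.';
        if (Safe) {
          Out += (char)C;
        } else {
          char Buf[4];
          std::sprintf(Buf, "%%%02X", (unsigned)C);  // e.g. ' ' becomes "%20"
          Out += Buf;
        }
      }
      return Out;
    }

Decoding would just be the inverse (scan for '%' and read two hex
digits), but given the answers above it sounds unnecessary.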

> This would be the simplest to implement but I'm unsure of the
> consequences.  If there are, I'll be forced into approach #2.

I believe this should work.  LLVM identifiers should be completely
unconstrained.
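
In other words, approach #1 amounts to a plain byte copy, something like
this (again, the function is illustrative only, not an LLVM interface):

    #include <cstddef>
    #include <string>

    // Illustrative only: the UTF-8 bytes go into the name untouched; the
    // cast exists just to satisfy the signed/unsigned char mismatch.
    static std::string makeName(const unsigned char *Bytes, std::size_t Len) {
      return std::string((const char *)Bytes, Len);
    }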

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/



