[LLVMdev] Transcoding UTF-8 to ASCII?
Chris Lattner
sabre at nondot.org
Fri Jan 16 22:12:00 PST 2004
> There are two approaches here:
> 1. just cast unsigned char to signed char and _hope_ it doesn't
> screw up LLVM
> 2. encode the extended characters into ASCII somewhat like what
> browsers do with URLs (e.g. space=%20).
>
> Would approach #1 work with LLVM?
Yes, we never 'interpret' the characters in an identifier for anything
that would cause us to consider the 'sign bit' special.
> Are there any character bit patterns forbidden in LLVM identifiers?
No. I believe everything should work, though you might get into trouble
if you use the '"' character (which can be easily fixed in the lexer and
asmwriter as needed).
> This would be the simplest to implement but I'm unsure of the
> consequences. If there is, I'll be forced into approach #2.
I believe this should work. LLVM identifiers should be completely
unconstrained.
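
For reference, here is a rough sketch of approach #2 in C++ (untested,
and EncodeIdentifier is just an illustrative name, not an existing LLVM
helper), escaping every byte outside printable ASCII plus '%' and '"':

  #include <cstdio>
  #include <string>

  // Map an arbitrary UTF-8 byte string onto a purely-ASCII name by
  // %XX-escaping every byte that is not printable ASCII, plus '%'
  // itself and '"' (mentioned above as potentially troublesome).
  std::string EncodeIdentifier(const std::string &Name) {
    std::string Result;
    for (std::string::size_type i = 0; i != Name.size(); ++i) {
      unsigned char C = Name[i];
      if (C >= 0x20 && C < 0x7F && C != '%' && C != '"') {
        Result += (char)C;                  // Safe ASCII: pass through.
      } else {
        char Buf[8];
        std::sprintf(Buf, "%%%02X", C);     // Escape as %XX.
        Result += Buf;
      }
    }
    return Result;
  }

The escaping is reversible, so the original UTF-8 name can be recovered
later if you ever need to display it.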
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/