[LLVMdev] Transcoding UTF-8 to ASCII?
reid at x10sys.com
Fri Jan 16 19:20:02 PST 2004
Soliciting your help on the following question ...
XPL uses the UTF-8 encoding for its identifiers. As such it supports
Unicode and many non-ASCII characters.
LLVM uses std::string for identifiers, which is based on char, a type
that is signed on many platforms and whose non-negative range covers
only 7-bit ASCII.
Although the code units in both schemes are the same size (one byte),
they are interpreted differently: UTF-8 uses all 8 bits of an unsigned
byte, while ASCII needs only 7 bits and so fits in a signed char. I need
to support UTF-8 in XPL and am wondering about transcoding my
identifiers for use by LLVM.
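To make the signed/unsigned concern concrete, here is a minimal sketch
of moving UTF-8 bytes into and out of a std::string. The helper names
bytesToString and stringToBytes are hypothetical, not part of LLVM or
XPL. On the usual two's-complement platforms the bit patterns survive
the round trip unchanged. This says nothing about whether LLVM itself
accepts such bytes in identifiers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: store raw UTF-8 bytes in a std::string.
// Each unsigned byte is converted to char; on two's-complement
// platforms the bit pattern is preserved even when char is signed.
std::string bytesToString(const std::vector<unsigned char>& bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char>(b));
    return out;
}

// Hypothetical helper: read the bytes back as unsigned values.
// Converting char to unsigned char is well defined (modulo 256).
std::vector<unsigned char> stringToBytes(const std::string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size());
    for (char c : s)
        out.push_back(static_cast<unsigned char>(c));
    return out;
}
```

For example, the two bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9)
occupy two chars in the string and come back as the same two values.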
There are two approaches here:
1. just cast unsigned char to signed char and _hope_ it doesn't
screw up LLVM
2. encode the extended characters into ASCII somewhat like what
browsers do with URLs (e.g. space=%20).
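Approach 2 could be sketched roughly as follows. This is only an
illustration under assumptions: the function names encodeIdentifier and
decodeIdentifier are hypothetical, '%' is assumed to be an acceptable
escape character in the target identifiers, and the set of characters
passed through unescaped (ASCII letters, digits, underscore) is a
choice made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: escape every byte that is not an ASCII letter,
// digit, or underscore as %XX (two uppercase hex digits), similar to
// URL encoding. '%' itself is escaped, so decoding is unambiguous.
std::string encodeIdentifier(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : utf8) {
        unsigned char b = static_cast<unsigned char>(c);
        bool plain = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
                     (b >= '0' && b <= '9') || b == '_';
        if (plain) {
            out.push_back(c);
        } else {
            out.push_back('%');
            out.push_back(hex[b >> 4]);
            out.push_back(hex[b & 0x0F]);
        }
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical inverse: turn each %XX back into the original byte.
// Malformed escapes are copied through unchanged.
std::string decodeIdentifier(const std::string& encoded) {
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            int hi = hexValue(encoded[i + 1]);
            int lo = hexValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>((hi << 4) | lo));
                i += 2;
                continue;
            }
        }
        out.push_back(encoded[i]);
    }
    return out;
}
```

With this scheme every encoded identifier is pure 7-bit ASCII, at the
cost of tripling the length of each escaped byte.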
Would approach #1 work with LLVM? Are there any character bit patterns
forbidden in LLVM identifiers? Approach #1 would be the simplest to
implement, but I'm unsure of the consequences. If there are, I'll be
forced into