[LLVMdev] Transcoding UTF-8 to ASCII?

Reid Spencer reid at x10sys.com
Fri Jan 16 19:20:02 PST 2004


Soliciting your help on the following question ...

XPL uses the UTF-8 encoding for its identifiers. As such it supports
Unicode and many non-ASCII characters. 

LLVM uses std::string for identifiers, which is based on a signed
character type and so only supports 7-bit ASCII.

Although the characters in both schemes are the same size, the bit
interpretation differs (UTF-8 treats bytes as unsigned, while ASCII is
7-bit and uses a signed char). I need to support UTF-8 in XPL and am
wondering about transcoding my identifiers for use by LLVM.

There are two approaches here:
     1. just cast unsigned char to signed char and _hope_ it doesn't
        screw up LLVM
     2. encode the extended characters into ASCII, somewhat like what
        browsers do with URLs (e.g. space=%20); see the sketch after
        this list.
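For concreteness, here is a minimal sketch of what I have in mind for
approach #2: percent-encode any byte outside the 7-bit ASCII range, URL
style. The function name, the choice of '%' as the escape character, and
the idea of escaping '%' itself (so decoding stays unambiguous) are just
my own assumptions, not anything XPL or LLVM mandates.

    #include <string>
    #include <cstdio>

    // Turn an arbitrary UTF-8 byte string into a 7-bit ASCII string by
    // percent-encoding every non-ASCII byte (and '%' itself).
    std::string escapeIdentifier(const std::string &utf8) {
      std::string result;
      result.reserve(utf8.size());
      for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80 && c != '%') {
          result += static_cast<char>(c);   // plain ASCII passes through
        } else {
          char buf[4];
          std::sprintf(buf, "%%%02X",       // e.g. byte 0xC3 -> "%C3"
                       static_cast<unsigned>(c));
          result += buf;
        }
      }
      return result;
    }

The obvious cost is that escaped identifiers get longer and less
readable in the generated LLVM assembly, which is why I'd prefer
approach #1 if it is safe.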

Would approach #1 work with LLVM? Are there any character bit patterns
forbidden in LLVM identifiers? This would be the simplest to implement,
but I'm unsure of the consequences. If there are forbidden patterns,
I'll be forced into approach #2.

Thanks,

Reid.
