<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 8/23/2018 3:27 PM, Marcus Johnson
      via cfe-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAFWGNQXXL0o+3BxsUuKJ_WD8YbybUBOhEjE9D80aq_OsBSMVdA@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>
            <div style="font-size:large;display:inline"
              class="gmail_default">Thanks for the link to that thread
              Tim.</div>
          </blockquote>
          <div><br>
          </div>
          <div style="font-size:large" class="gmail_default">Eli: "<span
              style="font-size:small">I don't follow; can't you just
              convert the format string from UTF-16/UTF-32 to UTF-8
              before checking it?  (Granted, that's not particularly
              efficient, but it's rare enough that it probably doesn't
              matter.)</span>"</div>
          <div><br>
          </div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
            <div style="font-size:large;display:inline"
              class="gmail_default"> and I realized a bit after posting
              this that converting the format strings from UTF-16/wchar,
              to UTF-8 would probably be the best way to achieve this
              Eli.</div>
          </blockquote>
          <div><br>
          </div>
          <div style="font-size:large" class="gmail_default">I'm just
            not sure how I'd handle the type matching, do you know when
            that happens in comparison to when the string/character
            literals would be converted? would that get in the way, or
            get messed up?</div>
        </div>
      </div>
    </blockquote>
    <br>
    In the Clang AST, a string literal is represented as an array of
    integers of the appropriate width; the lexer converts from UTF-8 to
    UTF-16 or UTF-32 at the same time it resolves escapes.  (This is
    necessary to compute the length of the string, which is part of the
    string literal's type.)<br>
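    <br>
    As a small illustration of the length being part of the type, here
    is a standalone sketch in plain C++11 (not Clang-internal code;
    countof is a hypothetical helper introduced only for this example).
    It shows that the same character yields a different array length
    depending on the literal's encoding prefix:<br>

```cpp
#include <cstddef>

// Number of elements in a string literal's array type, including the
// terminating null. countof is a hypothetical helper for this sketch.
template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) { return N; }

// U+00E9 is one UTF-16 unit and one UTF-32 unit, but two UTF-8 bytes.
static_assert(countof(u"\u00e9") == 2, "1 UTF-16 unit + null");
static_assert(countof(U"\u00e9") == 2, "1 UTF-32 unit + null");
static_assert(countof(u8"\u00e9") == 3, "2 UTF-8 bytes + null");

// U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair.
static_assert(countof(u"\U0001F600") == 3, "2 UTF-16 units + null");
static_assert(countof(U"\U0001F600") == 2, "1 UTF-32 unit + null");
static_assert(countof(u8"\U0001F600") == 5, "4 UTF-8 bytes + null");
```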
    <br>
    You can check the width of the code units in a string using
    StringLiteral::getCharByteWidth().  It's 1, 2 or 4, depending on
    whether the literal is stored as UTF-8, UTF-16, or UTF-32 (for a
    wide literal, that follows the target's wchar_t).  You can read
    individual code units from that array using
    StringLiteral::getCodeUnit().  Or you can grab the whole array using
    StringLiteral::getBytes() (note that the return type here is a bit
    misleading: it's a StringRef over the raw bytes, even when each
    code unit is wider than one byte).<br>
    <br>
    Actually, you might not want to use a real UTF-16 to UTF-8
    conversion; it may be better to translate every non-ASCII code unit
    to a single placeholder byte like 0xFF.  Not that it really affects
    the parsing, since the format directives themselves are all ASCII,
    but it keeps one byte per code unit, which probably makes
    translating back to a source location along the lines of
    StringLiteral::getLocationOfByte() easier.<br>
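    <br>
    A minimal sketch of that flattening step, written as standalone C++
    rather than against the Clang headers.  flattenToAscii is a
    hypothetical name, and the input vector stands in for the values you
    would presumably collect by calling StringLiteral::getCodeUnit() for
    each index below StringLiteral::getLength():<br>

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Map each code unit to exactly one byte: ASCII is kept as-is, and
// anything else (including each half of a UTF-16 surrogate pair)
// becomes the placeholder 0xFF.
// flattenToAscii is a hypothetical helper, not a Clang API.
std::string flattenToAscii(const std::vector<std::uint32_t> &CodeUnits) {
  std::string Out;
  Out.reserve(CodeUnits.size());
  for (std::uint32_t CU : CodeUnits)
    Out.push_back(CU < 0x80 ? static_cast<char>(CU) : '\xFF');
  return Out;
}
```

    Because the output has the same length as the input, the offset of a
    '%' in the flattened string is also the index of the matching code
    unit in the original literal.<br>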
    <br>
    -Eli<br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
  </body>
</html>