Sorry for the silence...<div><br></div><div>I think this is a perfectly fine direction.</div><div><br></div><div>I would suggest a few things:</div><div><br></div><div>1) This should be in 'include/llvm/Support/UTFUtils.h' or some such.</div>
<div><br></div><div>2) I would personally like a bit more type safety involved. Why not output a wchar_t array? I might suggest the following:</div><div><br></div><div>2a) Make this a template on a type T which only has specializations for char16_t and char32_t (or LLVM-provided analogs to those types in C++98)</div>
<div>2b) Make a wchar_t variant that delegates to the appropriately sized one.</div><div>2c) Make the output buffer strongly typed, potentially SmallVectorImpl<T>&</div><div><br></div><div>3) Mail the updated patch to llvm-commits as well to make sure folks are OK with it going into the common LLVM libraries.</div>
<div><br></div><div>But I dunno if all of that is worthwhile... The unsigned size selector just seems brittle and gross to me.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jul 2, 2012 at 4:51 PM, Nico Weber <span dir="ltr"><<a href="mailto:thakis@chromium.org" target="_blank">thakis@chromium.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Final ping. If I don't get a "please land" from a code owner, I'll let<br>
this patch die. (Not a big deal either way.)<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, Jun 29, 2012 at 9:56 AM, Nico Weber <<a href="mailto:thakis@chromium.org">thakis@chromium.org</a>> wrote:<br>
> All done (except for the static_cast, since that line also converts<br>
> signedness).<br>
><br>
> Any comments on the new file name / location from anyone?<br>
><br>
><br>
> On Wed, Jun 27, 2012 at 7:06 AM, Hans Wennborg <<a href="mailto:hans@chromium.org">hans@chromium.org</a>> wrote:<br>
>><br>
>> On Mon, Jun 25, 2012 at 4:54 AM, Nico Weber <<a href="mailto:thakis@chromium.org">thakis@chromium.org</a>> wrote:<br>
>> > Hi,<br>
>> ><br>
>> > the attached patch moves the duplicated code in CGExpr.cpp and<br>
>> > LiteralSupport.cpp into a new file Basic/ConvertUTFWrapper.h/cpp.<br>
>> > Please let me know what you think.<br>
>> ><br>
>> > Nico<br>
>><br>
>> Looks reasonable as far as I can tell. Just some nits:<br>
>><br>
>> > +#ifndef LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H<br>
>> > +#define LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H<br>
>> > +<br>
>> > +#include "llvm/ADT/StringRef.h"<br>
>> > +<br>
>> > +namespace clang {<br>
>> > +<br>
>> > +/// Convert an UTF8 StringRef to UTF8, UTF16, or UTF32 depending on<br>
>> > +/// WideCharWidth. On success, ResultPtr will point one after the end<br>
>> > of the<br>
>> > +/// copied string.<br>
>> > +/// \returns true on success.<br>
>> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef Source,<br>
>> > + char*& ResultPtr);<br>
>><br>
>> Should the *& be on ResultPtr here? (and in the .cpp file)<br>
>> Should the comment say that it's copying into ResultPtr?<br>
>><br>
>> > Index: lib/Basic/ConvertUTFWrapper.cpp<br>
>> > ===================================================================<br>
>> > --- lib/Basic/ConvertUTFWrapper.cpp (revision 0)<br>
>> > +++ lib/Basic/ConvertUTFWrapper.cpp (revision 0)<br>
>> > @@ -0,0 +1,55 @@<br>
>> > +//===-- ConvertUTFWrapper.cpp - Wrap ConvertUTF.h with clang data types<br>
>> > -----===<br>
>> > +//<br>
>> > +// The LLVM Compiler Infrastructure<br>
>> > +//<br>
>> > +// This file is distributed under the University of Illinois Open<br>
>> > Source<br>
>> > +// License. See LICENSE.TXT for details.<br>
>> > +//<br>
>> ><br>
>> > +//===----------------------------------------------------------------------===//<br>
>> > +<br>
>> > +#include "clang/Basic/ConvertUTFWrapper.h"<br>
>> > +#include "clang/Basic/ConvertUTF.h"<br>
>> > +#include "clang/Basic/LLVM.h"<br>
>> > +<br>
>> > +namespace clang {<br>
>> > +<br>
>> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef Source,<br>
>> > + char*& ResultPtr) {<br>
>> > + assert(WideCharWidth==1 || WideCharWidth==2 || WideCharWidth==4);<br>
>> > + ConversionResult result = conversionOK;<br>
>> > + // Copy the character span over.<br>
>> > + if (WideCharWidth == 1) {<br>
>> > + if (!isLegalUTF8String(reinterpret_cast<const<br>
>> > UTF8*>(&*Source.begin()),<br>
>> > + reinterpret_cast<const<br>
>> > UTF8*>(&*Source.end())))<br>
>><br>
>> The &* isn't really necessary for Source.begin() / end(). Do we<br>
>> normally do this for StringRefs? (I see it wasn't there in<br>
>> lib/Lex/LiteralSupport.cpp)<br>
>><br>
>> > + result = sourceIllegal;<br>
>> > + memcpy(ResultPtr, Source.data(), Source.size());<br>
>> > + ResultPtr += Source.size();<br>
>> > + } else if (WideCharWidth == 2) {<br>
>> > + UTF8 const *sourceStart = (UTF8 const *)Source.data();<br>
>><br>
>> This is equivalent to "const UTF8 *sourceStart", right? At least I<br>
>> find that easier to read. And maybe static_cast it?<br>
>><br>
>> > + // FIXME: Make the type of the result buffer correct instead of<br>
>> > + // using reinterpret_cast.<br>
>> > + UTF16 *targetStart = reinterpret_cast<UTF16*>(ResultPtr);<br>
>> > + ConversionFlags flags = strictConversion;<br>
>> > + result = ConvertUTF8toUTF16(<br>
>> > + &sourceStart,sourceStart + Source.size(),<br>
>><br>
>> Nit: space between &targetStart and targetStart?<br>
>><br>
>> > + &targetStart,targetStart + 2*Source.size(),flags);<br>
>><br>
>> And some more spaces here too..<br>
>><br>
>> > + if (result==conversionOK)<br>
>><br>
>> And here :)<br>
>><br>
>> > + ResultPtr = reinterpret_cast<char*>(targetStart);<br>
>> > + } else if (WideCharWidth == 4) {<br>
>> > + UTF8 const *sourceStart = (UTF8 const *)Source.data();<br>
>><br>
>> Ditto about "const UTF8* sourceStart" etc.<br>
>><br>
>> > + // FIXME: Make the type of the result buffer correct instead of<br>
>> > + // using reinterpret_cast.<br>
>> > + UTF32 *targetStart = reinterpret_cast<UTF32*>(ResultPtr);<br>
>> > + ConversionFlags flags = strictConversion;<br>
>> > + result = ConvertUTF8toUTF32(<br>
>><br>
>> > + &sourceStart,sourceStart + Source.size(),<br>
>> > + &targetStart,targetStart + 4*Source.size(),flags);<br>
>> > + if (result==conversionOK)<br>
>><br>
>> Ditto about spaces.<br>
>><br>
>> Thanks,<br>
>> Hans<br>
><br>
><br>
<br>
</div></div><div class="HOEnZb"><div class="h5">_______________________________________________<br>
cfe-commits mailing list<br>
<a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>
</div></div></blockquote></div><br></div>