[cfe-commits] [patch] Fix fixme about duplicated utf conversion code in codegen and lex

Mon Jul 2 16:51:34 PDT 2012

Final ping. If I don't get a "please land" from a code owner, I'll let
this patch die. (Not a big deal either way.)

On Fri, Jun 29, 2012 at 9:56 AM, Nico Weber <thakis at chromium.org> wrote:
> All done (except for the static_cast, since that line also converts
> signedness).
>
> Any comments on the new file name / location from anyone?
>
>
> On Wed, Jun 27, 2012 at 7:06 AM, Hans Wennborg <hans at chromium.org> wrote:
>>
>> On Mon, Jun 25, 2012 at 4:54 AM, Nico Weber <thakis at chromium.org> wrote:
>> > Hi,
>> >
>> > the attached patch moves the duplicated code in CGExpr.cpp and
>> > LiteralSupport.cpp into a new file Basic/ConvertUTFWrapper.h/cpp.
>> > Please let me know what you think.
>> >
>> > Nico
>>
>> Looks reasonable as far as I can tell. Just some nits:
>>
>> > +#ifndef LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H
>> > +#define LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H
>> > +
>> > +#include "llvm/ADT/StringRef.h"
>> > +
>> > +namespace clang {
>> > +
>> > +/// Convert an UTF8 StringRef to UTF8, UTF16, or UTF32 depending on
>> > +/// WideCharWidth. On success, ResultPtr will point one after the end
>> > of the
>> > +/// copied string.
>> > +/// \returns true on success.
>> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef Source,
>> > +                       char*& ResultPtr);
>>
>> Should the *& be on ResultPtr here? (and in the .cpp file)
>> Should the comment say that it's copying into ResultPtr?
>>
>> > Index: lib/Basic/ConvertUTFWrapper.cpp
>> > ===================================================================
>> > --- lib/Basic/ConvertUTFWrapper.cpp   (revision 0)
>> > +++ lib/Basic/ConvertUTFWrapper.cpp   (revision 0)
>> > @@ -0,0 +1,55 @@
>> > +//===-- ConvertUTFWrapper.cpp - Wrap ConvertUTF.h with clang data types
>> > -----===
>> > +//
>> > +//                     The LLVM Compiler Infrastructure
>> > +//
>> > +// This file is distributed under the University of Illinois Open
>> > Source
>> > +// License. See LICENSE.TXT for details.
>> > +//
>> >
>> > +//===----------------------------------------------------------------------===//
>> > +
>> > +#include "clang/Basic/ConvertUTFWrapper.h"
>> > +#include "clang/Basic/ConvertUTF.h"
>> > +#include "clang/Basic/LLVM.h"
>> > +
>> > +namespace clang {
>> > +
>> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef Source,
>> > +                       char*& ResultPtr) {
>> > +  assert(WideCharWidth==1 || WideCharWidth==2 || WideCharWidth==4);
>> > +  ConversionResult result = conversionOK;
>> > +  // Copy the character span over.
>> > +  if (WideCharWidth == 1) {
>> > +    if (!isLegalUTF8String(reinterpret_cast<const
>> > UTF8*>(&*Source.begin()),
>> > +                           reinterpret_cast<const
>> > UTF8*>(&*Source.end())))
>>
>> The &* isn't really necessary for Source.begin() / end(). Do we
>> normally do this for StringRefs? (I see it wasn't there in
>> lib/Lex/LiteralSupport.cpp)
>>
>> > +      result = sourceIllegal;
>> > +    memcpy(ResultPtr, Source.data(), Source.size());
>> > +    ResultPtr += Source.size();
>> > +  } else if (WideCharWidth == 2) {
>> > +    UTF8 const *sourceStart = (UTF8 const *)Source.data();
>>
>> This is equivalent to "const UTF8 *sourceStart", right? At least I
>> find that easier to read. And maybe static_cast it?
>>
>> > +    // FIXME: Make the type of the result buffer correct instead of
>> > +    // using reinterpret_cast.
>> > +    UTF16 *targetStart = reinterpret_cast<UTF16*>(ResultPtr);
>> > +    ConversionFlags flags = strictConversion;
>> > +    result = ConvertUTF8toUTF16(
>> > +      &sourceStart,sourceStart + Source.size(),
>>
>> Nit: space between &targetStart and targetStart?
>>
>> > +        &targetStart,targetStart + 2*Source.size(),flags);
>>
>> And some more spaces here too..
>>
>> > +    if (result==conversionOK)
>>
>> And here :)
>>
>> > +      ResultPtr = reinterpret_cast<char*>(targetStart);
>> > +  } else if (WideCharWidth == 4) {
>> > +    UTF8 const *sourceStart = (UTF8 const *)Source.data();
>>
>> Ditto about "const UTF8* sourceStart" etc.
>>
>> > +    // FIXME: Make the type of the result buffer correct instead of
>> > +    // using reinterpret_cast.
>> > +    UTF32 *targetStart = reinterpret_cast<UTF32*>(ResultPtr);
>> > +    ConversionFlags flags = strictConversion;
>> > +    result = ConvertUTF8toUTF32(
>>
>> > +        &sourceStart,sourceStart + Source.size(),
>> > +        &targetStart,targetStart + 4*Source.size(),flags);
>> > +    if (result==conversionOK)
>>
>> Ditto about spaces.
>>
>> Thanks,
>> Hans
>
>