[PATCH] Add new ubegin/uend API to StringRef/MDString

Thu Nov 13 15:54:52 PST 2014

On Thu, Nov 13, 2014 at 1:59 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2014-Nov-13, at 13:54, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Nov 13, 2014 at 1:40 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> >
> > > On 2014-Nov-13, at 13:36, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Nov 13, 2014 at 1:33 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > >
> > > > On 2014-Nov-13, at 13:31, David Blaikie <dblaikie at gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On Thu, Nov 13, 2014 at 12:59 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > > >
> > > > > On 2014-Nov-13, at 10:35, David Blaikie <dblaikie at gmail.com>
> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Nov 13, 2014 at 4:39 AM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > > > >
> > > > > > On 2014 Nov 12, at 16:55, Keno Fischer <
> kfischer at college.harvard.edu> wrote:
> > > > > >
> > > > > > Hi dexonsmith,
> > > > > >
> > > > > > In some instances (such as in BitCodeWriter - see
> http://reviews.llvm.org/D6184),
> > > > > > it is useful to get an unsiged version of the data pointer
> instead. This adds an
> > > > > > appropriate API to be used in BitCodeWriter in a follow up
> commit.
> > > > > >
> > > > > > http://reviews.llvm.org/D6241
> > > > > >
> > > > > > Files:
> > > > > >  include/llvm/ADT/StringRef.h
> > > > > >  include/llvm/IR/Metadata.h
> > > > > > <D6241.16125.patch>
> > > > >
> > > > > This LGTM.  I have minor nitpicks below that you can take or leave.
> > > > >
> > > > > @David, does this API make sense to you given Keno's use case [1]?
> > > > >
> > > > > While it matches that use case quite well (needing iterator
> begin/end) there are many other cases where we play the same game - getting
> uint8_t rather than just unsigned char:
> > > > >
> > > > > lib/Support/DataExtractor.cpp:  return getU<uint8_t>(offset_ptr,
> this, IsLittleEndian, Data.data());
> > > > > lib/Support/MD5.cpp:  const uint8_t *Ptr = Data.data();
> > > > > lib/Support/MD5.cpp:  ArrayRef<uint8_t> SVal((const uint8_t
> *)Str.data(), Str.size());
> > > > > lib/IR/Constants.cpp:    const uint8_t *Data =
> reinterpret_cast<const uint8_t *>(Str.data());
> > > > > lib/ProfileData/CoverageMappingReader.cpp:  Result =
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > > > lib/ProfileData/CoverageMappingReader.cpp:
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > > > lib/ProfileData/CoverageMappingReader.cpp:
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > > > lib/ExecutionEngine/RuntimeDyld/RuntimeDyldChecker.cpp:
> reinterpret_cast<const uint8_t *>(SectionMem.data()),
> > > > > lib/ExecutionEngine/Interpreter/ExternalFunctions.cpp:  uint8_t
> *ArgDataPtr = ArgData.data();
> > > > > lib/DebugInfo/DWARFFormValue.cpp:      Value.data =
> reinterpret_cast<const uint8_t *>(str.data());
> > > > > include/llvm/Support/FileOutputBuffer.h:    return
> (uint8_t*)Region->data();
> > > > > include/llvm/Support/FileOutputBuffer.h:    return
> (uint8_t*)Region->data() + Region->size();
> > > > > include/llvm/Object/ELF.h:    return reinterpret_cast<const
> uint8_t *>(Buf.data());
> > > > > include/llvm/MC/YAML.h:      : Data(reinterpret_cast<const uint8_t
> *>(Data.data()), Data.size()),
> > > > > tools/llvm-objdump/llvm-objdump.cpp:    ArrayRef<uint8_t>
> Bytes(reinterpret_cast<const uint8_t *>(BytesStr.data()),
> > > > > tools/llvm-objdump/MachODump.cpp:    ArrayRef<uint8_t>
> Bytes(reinterpret_cast<const uint8_t *>(BytesStr.data()),
> > > > > tools/llvm-mc/Disassembler.cpp:  ArrayRef<uint8_t>
> Data(Bytes.first.data(), Bytes.first.size());
> > > > > tools/llvm-readobj/StreamWriter.h:    auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > > > tools/llvm-readobj/StreamWriter.h:    auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > > > tools/llvm-readobj/StreamWriter.h:    auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > > > tools/llvm-readobj/StreamWriter.h:    auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > > >
> > > > > In several of these cases we're interested in getting an
> ArrayRef<uint8_t> from the StringRef - so maybe that's a better primitive
> operation (it's annoying when interoperating with containers, I'll grant
> you - since you need to call it twice, or stash a local variable - but one
> day... we'll have range algorithms for this sort of thing - in the mean
> time we could have utility functions like those in this patch, but I think
> the primary operation should probably be "give me a raw (uint8_t) byte
> array for this StringRef")
> > > > >
> > > >
> > > > Sure.  Do you have a name in mind?  `uref()`?
> > > >
> > > >
> > > > If we're going with a C++-esque name: "ArrayRef<uint8_t> bytes()" ?
> > > >
> > >
> > > `bytes()` sounds like `char` to me.  `ubytes()`?
> > >
> > > Really? It seems like most places that want bytes deal with unsigned
> chars, and the existence of this method on a StringRef would imply that
> it's not just giving identical begin/end as the StringRef's own begin/end,
> I would imagine...
> > >
> > > But if you particularly prefer ubytes, sure.
> > >
> > > Then if someone needs a `signed char` they can add `sbytes()`.
> > >
> > > sbytes is just a range over char which StringRef already is,
> >
> > `char`, `unsigned char`, and `signed char` are three distinct types.
> > If someone depends on have signed chars, they need to cast to
> > `signed char`.
> >
> > It's implementation-defined whether `char` is signed, but it's a
> > distinct type from the other two in either case.
> >
> > Thanks for the refresher/reminder - I was a bit vague/incorrect there.
> >
> >
> > > I'm not sure we'd ever need this function (I think it'd be reasonable
> to have some conversion from StringRef to ArrayRef<char> if we ever need it
> - though I suspect we won't).
> >
> > I would want to call the `ArrayRef<char>` version `bytes()`.
> >
> > And here.
> >
> > s/char/signed char/ up there if it helps.
> >
> > I'll try to resummarize: I don't think anyone ever really wants signed
> char in this context, either they don't care (and StringRef/ArrayRef<char>
> is sufficient - and /maybe/ one day we'll want to ensure an easy conversion
> between the two, but I don't think I've seen a use case yet & have some
> doubts about one actually coming into existence) or they want unsigned,
> which is why I think "bytes()" is fairly unambiguous - the only time you
> want anything other than the range of char is when you want a range of
> (unsigned) bytes.
>
> Okay, that's convincing enough for me.
>
> @Keno, if you add an `ArrayRef<uint8_t> bytes() const` accessor and
> define `ubegin()` and `uend()` in terms of it,

For consistency with other range-y accessors, this might be
bytes_begin/bytes_end - or the caller can just use bytes().begin(),
bytes().end().

But I don't object to ubegin/uend either, if that's what you guys prefer.

> I think you're good
> to commit.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141113/ef0c4583/attachment.html>