[PATCH] Add new ubegin/uend API to StringRef/MDString
David Blaikie
dblaikie at gmail.com
Thu Nov 13 13:36:58 PST 2014
On Thu, Nov 13, 2014 at 1:33 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> > On 2014-Nov-13, at 13:31, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Nov 13, 2014 at 12:59 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> >
> > > On 2014-Nov-13, at 10:35, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Nov 13, 2014 at 4:39 AM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > >
> > > > On 2014 Nov 12, at 16:55, Keno Fischer <kfischer at college.harvard.edu>
> wrote:
> > > >
> > > > Hi dexonsmith,
> > > >
> > > > In some instances (such as in BitCodeWriter - see
> http://reviews.llvm.org/D6184),
> > > > it is useful to get an unsiged version of the data pointer instead.
> This adds an
> > > > appropriate API to be used in BitCodeWriter in a follow up commit.
> > > >
> > > > http://reviews.llvm.org/D6241
> > > >
> > > > Files:
> > > > include/llvm/ADT/StringRef.h
> > > > include/llvm/IR/Metadata.h
> > > > <D6241.16125.patch>
> > >
> > > This LGTM. I have minor nitpicks below that you can take or leave.
> > >
> > > @David, does this API make sense to you given Keno's use case [1]?
> > >
> > > While it matches that use case quite well (needing iterator begin/end)
> there are many other cases where we play the same game - getting uint8_t
> rather than just unsigned char:
> > >
> > > lib/Support/DataExtractor.cpp: return getU<uint8_t>(offset_ptr, this,
> IsLittleEndian, Data.data());
> > > lib/Support/MD5.cpp: const uint8_t *Ptr = Data.data();
> > > lib/Support/MD5.cpp: ArrayRef<uint8_t> SVal((const uint8_t
> *)Str.data(), Str.size());
> > > lib/IR/Constants.cpp: const uint8_t *Data = reinterpret_cast<const
> uint8_t *>(Str.data());
> > > lib/ProfileData/CoverageMappingReader.cpp: Result =
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > lib/ProfileData/CoverageMappingReader.cpp:
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > lib/ProfileData/CoverageMappingReader.cpp:
> decodeULEB128(reinterpret_cast<const uint8_t *>(Data.data()), &N);
> > > lib/ExecutionEngine/RuntimeDyld/RuntimeDyldChecker.cpp:
> reinterpret_cast<const uint8_t *>(SectionMem.data()),
> > > lib/ExecutionEngine/Interpreter/ExternalFunctions.cpp: uint8_t
> *ArgDataPtr = ArgData.data();
> > > lib/DebugInfo/DWARFFormValue.cpp: Value.data =
> reinterpret_cast<const uint8_t *>(str.data());
> > > include/llvm/Support/FileOutputBuffer.h: return
> (uint8_t*)Region->data();
> > > include/llvm/Support/FileOutputBuffer.h: return
> (uint8_t*)Region->data() + Region->size();
> > > include/llvm/Object/ELF.h: return reinterpret_cast<const uint8_t
> *>(Buf.data());
> > > include/llvm/MC/YAML.h: : Data(reinterpret_cast<const uint8_t
> *>(Data.data()), Data.size()),
> > > tools/llvm-objdump/llvm-objdump.cpp: ArrayRef<uint8_t>
> Bytes(reinterpret_cast<const uint8_t *>(BytesStr.data()),
> > > tools/llvm-objdump/MachODump.cpp: ArrayRef<uint8_t>
> Bytes(reinterpret_cast<const uint8_t *>(BytesStr.data()),
> > > tools/llvm-mc/Disassembler.cpp: ArrayRef<uint8_t>
> Data(Bytes.first.data(), Bytes.first.size());
> > > tools/llvm-readobj/StreamWriter.h: auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > tools/llvm-readobj/StreamWriter.h: auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > tools/llvm-readobj/StreamWriter.h: auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > > tools/llvm-readobj/StreamWriter.h: auto V =
> makeArrayRef(reinterpret_cast<const uint8_t*>(Value.data()),
> > >
> > > In several of these cases we're interested in getting an
> ArrayRef<uint8_t> from the StringRef - so maybe that's a better primitive
> operation (it's annoying when interoperating with containers, I'll grant
> you - since you need to call it twice, or stash a local variable - but one
> day... we'll have range algorithms for this sort of thing - in the mean
> time we could have utility functions like those in this patch, but I think
> the primary operation should probably be "give me a raw (uint8_t) byte
> array for this StringRef")
> > >
> >
> > Sure. Do you have a name in mind? `uref()`?
> >
> >
> > If we're going with a C++-esque name: "ArrayRef<uint8_t> bytes()" ?
> >
>
> `bytes()` sounds like `char` to me. `ubytes()`?
>
Really? It seems like most places that want bytes deal with unsigned chars,
and the existence of this method on a StringRef would imply that it's not
just giving identical begin/end as the StringRef's own begin/end, I would
imagine...
But if you particularly prefer ubytes, sure.
> Then if someone needs a `signed char` they can add `sbytes()`.
>
sbytes is just a range over char which StringRef already is, I'm not sure
we'd ever need this function (I think it'd be reasonable to have some
conversion from StringRef to ArrayRef<char> if we ever need it - though I
suspect we won't).
- David
>
> >
> > > Also see Aaron's recent -Wcast-qual fixing patches that touched
> several cases like this:
> > > http://llvm.org/viewvc/llvm-project?view=revision&revision=221782
> > > http://llvm.org/viewvc/llvm-project?view=revision&revision=221781
> > >
> > >
> > > [1]:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141103/243658.html
> > >
> > > > Index: include/llvm/ADT/StringRef.h
> > > > ===================================================================
> > > > --- include/llvm/ADT/StringRef.h
> > > > +++ include/llvm/ADT/StringRef.h
> > > > @@ -40,6 +40,7 @@
> > > > class StringRef {
> > > > public:
> > > > typedef const char *iterator;
> > > > + typedef unsigned const char *uiterator;
> > >
> > > This spelling is strange to me; I'd expect:
> > >
> > > typedef const unsigned char *uiterator;
> > >
> > > > typedef const char *const_iterator;
> > > > static const size_t npos = ~size_t(0);
> > > > typedef size_t size_type;
> > > > @@ -88,8 +89,12 @@
> > > > /// @{
> > > >
> > > > iterator begin() const { return Data; }
> > > > + uiterator ubegin() const { return
> reinterpret_cast<uiterator>(Data); }
> > >
> > > I have a slight preference for:
> > >
> > > uiterator ubegin() const { return
> reinterpret_cast<uiterator>(begin()); }
> > >
> > > But mainly just to match my change to `uend()` below.
> > >
> > > >
> > > > iterator end() const { return Data + Length; }
> > > > + uiterator uend() const {
> > > > + return reinterpret_cast<uiterator>(Data) + Length;
> > > > + }
> > >
> > > I think this should be:
> > >
> > > uiterator uend() const { return
> reinterpret_cast<uiterator>(end()); }
> > >
> > > >
> > > > /// @}
> > > > /// @name String Operations
> > > > Index: include/llvm/IR/Metadata.h
> > > > ===================================================================
> > > > --- include/llvm/IR/Metadata.h
> > > > +++ include/llvm/IR/Metadata.h
> > > > @@ -55,12 +55,15 @@
> > > > unsigned getLength() const { return (unsigned)getName().size(); }
> > > >
> > > > typedef StringRef::iterator iterator;
> > > > + typedef StringRef::uiterator uiterator;
> > > >
> > > > /// \brief Pointer to the first byte of the string.
> > > > iterator begin() const { return getName().begin(); }
> > > > + uiterator ubegin() const { return getName().ubegin(); }
> > > >
> > > > /// \brief Pointer to one byte past the end of the string.
> > > > iterator end() const { return getName().end(); }
> > > > + uiterator uend() const { return getName().uend(); }
> > > >
> > > > /// \brief Methods for support type inquiry through isa, cast,
> and dyn_cast.
> > > > static bool classof(const Value *V) {
> > > >
> > >
> > >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141113/7e530666/attachment.html>
More information about the llvm-commits
mailing list