[PATCH] Make sure BitcodeWriter works with Unicode characters

Duncan P. N. Exon Smith dexonsmith at apple.com
Fri Dec 12 16:19:23 PST 2014


You might look at what actually gets stored in bitcode with `bc-analyzer`
(is there a way to inspect data as well as structure?).

I just had a thought that this kind of bug will only cause a crash if you're
using an abbreviation.  I think a lot of strings are printed with an
abbreviation *only if* they fit the char6 encoding.

So perhaps (outside of `MDString`), the bug doesn't cause a crash, but
causes us to serialize 64-bits of data instead of just the 8-bit uint8_t.

> On 2014 Dec 12, at 15:57, Keno Fischer <kfischer at college.harvard.edu> wrote:
> 
> Ok, thanks for taking a look. I'll have a look again and see if I can expose something.
> 
> On Fri, Dec 12, 2014 at 6:56 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> Oversight.  TBH, I didn't look at your patch again; I just saw the PR,
> remembered you'd fixed the problem before, and looked up the thread in
> order to credit you (and to find the right paint colour).
> 
> Just now, I tried to expose a few other problems with testcases, and they
> all seem to be invalid (`llvm-as` doesn't accept strange characters in
> metadata names) or already work (names of globals, structs, and locals).
> 
> Go ahead and commit (or file a PR) for anything you can make a testcase
> for though!
> 
> > On 2014 Dec 12, at 15:08, Keno Fischer <kfischer at college.harvard.edu> wrote:
> >
> > Hi Duncan,
> > looking at your commit, were the other three instances that were in my patch of this pattern not affected, or was that an oversight?
> >
> > Thanks,
> > Keno
> >
> > On Thu, Dec 11, 2014 at 6:53 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> > No problem!
> >
> > > On 2014 Dec 11, at 15:52, Keno Fischer <kfischer at college.harvard.edu> wrote:
> > >
> > > Sorry about that. I haven't found the time to get back to this (super busy this last month). I was planning to look at this again next week, but thanks for committing it :)!
> > >
> > > On Thu, Dec 11, 2014 at 6:38 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> > >
> > > > On 2014 Nov 9, at 13:17, Keno Fischer <kfischer at college.harvard.edu> wrote:
> > > >
> > > > Previously when a metadata string contained unicode characters,
> > > > it would be incorrectly placed in the Record array because chars
> > > > are signed by default and hence characters with the high bit set
> > > > would get sign extended, but the bitcode writer was attempting
> > > > to write the lowest 8 bit of the now sign-extended value. This
> > > > caused an assertion failure later on. The fix is just to cast
> > > > the pointer to uint8_t* first to prevent sign extension.
> > > > This came up in the context for metadata strings, but I did a
> > > > quick pass and changed the other instances of this pattern in
> > > > the file as well.
> > > >
> > > > http://reviews.llvm.org/D6184
> > > >
> > > > Files:
> > > >  lib/Bitcode/Writer/BitcodeWriter.cpp
> > > >  test/Bitcode/unicode.ll
> > > > <D6184.15959.patch>_______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >
> > > This review seemed to get stalled.  I just committed a fix based on Keno's
> > > patch in r224077 (which I think matches what David and my bike-shedding
> > > landed on), fixing PR21882.
> > >
> >
> >
> 
> 





More information about the llvm-commits mailing list