[PATCH] D38461: [MC] - Don't crash when non-english characters are used.

George Rimar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 2 08:07:13 PDT 2017


grimar created this revision.

I found that llvm-mc crashes on non-English characters, even when they appear
only in comments, because it still tokenizes the comment text.

The problem comes from functions like isdigit() and isalnum(), which take an
int argument that must be either representable as unsigned char or equal to EOF.
However, MCParser stores the input buffer pointer as char*, and char is signed on many platforms,
so a byte >= 0x80 (for example, part of a UTF-8 sequence) can reach one of these
functions as a negative value, which triggers an assert.
A test case demonstrating the problem is included.
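A small standalone C++ sketch of the failure mode, independent of LLVM. Both
helper names (naiveCtypeArg, safeCtypeArg) are hypothetical and do not exist in
AsmLexer.cpp; they only show which int value ends up being handed to the
<cctype> functions when a byte such as 0xC3 (the first byte of UTF-8 "é") is
read through a char pointer.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: the value a lexer passes to isdigit()/isalnum() if it
// forwards the current buffer byte unchanged. Where plain char is signed
// (e.g. typical x86 ABIs), bytes >= 0x80 come out negative, which is outside
// the range those functions accept (unsigned char values or EOF).
int naiveCtypeArg(const char *p) { return *p; }

// Hypothetical helper: the same byte converted through unsigned char first.
// The result is always in [0, UCHAR_MAX], so it is a valid argument.
int safeCtypeArg(const char *p) {
  int v = static_cast<unsigned char>(*p);
  assert(v >= 0 && v <= UCHAR_MAX);
  return v;
}
```

For the byte 0xC3, safeCtypeArg returns 195 on every platform, while
naiveCtypeArg returns a negative value wherever char is signed.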

To fix the issue, I cast the value to unsigned. This seems consistent with
the rest of the LLVM code, which does the same in several places.
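A minimal sketch of the general pattern, assuming a lexer that walks a
NUL-terminated buffer. The wrappers isDigitByte, isIdentByte and countDigits
are hypothetical illustrations, not the code from the patch. They convert
through unsigned char specifically, because that is the range the <cctype>
functions accept; a negative char cast straight to unsigned int would still
come back negative when converted to the int parameter.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical helper: forwards a buffer byte to std::isdigit after
// converting it to unsigned char, so bytes >= 0x80 arrive as 128..255
// instead of negative values.
bool isDigitByte(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: same idea for identifier characters
// ([A-Za-z0-9_] in the default "C" locale).
bool isIdentByte(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Walks a NUL-terminated buffer the way a lexer scans its input and counts
// decimal digits. Non-ASCII bytes (e.g. UTF-8 inside a comment) are simply
// classified as "not a digit" instead of being passed as negative values.
std::size_t countDigits(const char *p) {
  assert(p != nullptr);
  std::size_t n = 0;
  for (; *p != '\0'; ++p)
    if (isDigitByte(*p))
      ++n;
  return n;
}
```

In the default "C" locale only ASCII bytes are classified as digits or
letters, so the two bytes of "é" are skipped and scanning continues normally.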


https://reviews.llvm.org/D38461

Files:
  lib/MC/MCParser/AsmLexer.cpp
  test/MC/AsmParser/non-english-characters.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D38461.117350.patch
Type: text/x-patch
Size: 3951 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171002/516eb1f5/attachment.bin>

