[llvm-commits] RFC: new tool: llvm-strings

Dmitri Gribenko gribozavr at gmail.com
Tue Nov 13 09:45:13 PST 2012


On Tue, Nov 13, 2012 at 7:37 PM, Marshall Clow <mclow.lists at gmail.com> wrote:
> On Nov 13, 2012, at 8:50 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>>>> +bool isPrintable( char c ) {
>>>> +  return isalnum(c) ||
>>>> +         ispunct(c) ||
>>>> +         (isspace(c) && (!iscntrl(c) || c == '\t')) ||
>>>> +         (isascii(c) && isprint(c));
>>>> +//  Easy to replace this with a table at some point
>>>> +  }
>>>>
>>>> isalpha() & friends are locale-dependent.
>>>
>>> Yes.
>>> Is that a problem?
>>> [ Seriously. I can see arguments either way. ]
>>
>> As a user, I would not expect toolchain output to depend on locale.
>
> Really? I just checked FreeBSD's strings, as well as the GNU one.
> Both use isprint(), etc to check to see if something is printable.

I definitely didn't expect that :)
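
For comparison, a locale-independent check could look roughly like the
sketch below.  It is not taken from the patch; the name isPrintableASCII
is made up here, and the accepted set (tab plus 0x20..0x7E) is an
assumption about what the quoted isPrintable() accepts in the "C" locale.

```cpp
// Hypothetical locale-independent variant of isPrintable(); not from the patch.
// Accepts tab and the printable ASCII range 0x20..0x7E, nothing else, so the
// result cannot change with setlocale().  Bytes >= 0x80 are rejected whether
// plain char is signed (they compare below 0x20) or unsigned (above 0x7E).
constexpr bool isPrintableASCII(char c) {
  return c == '\t' || (c >= 0x20 && c <= 0x7E);
}
```

Unlike the <cctype> functions, this never passes a negative char value to
a function that expects an unsigned char or EOF.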

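OK, I agree that locale-dependent checks might be useful for locales
that use 8-bit encodings.  But they make no sense for UTF-8 locales,
where one character spans several bytes and a per-byte isprint() cannot
recognize it: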

$ cat /tmp/zzz.c
const char *str1 = "йцукен";
const char *str2 = "qwerty";
$ gcc -c zzz.c
$ strings zzz.o
qwerty
$ strings --version
GNU strings (GNU Binutils for Debian) 2.22

So a UTF-8 mode would be useful regardless of the decision we make about isprint().
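
As a rough illustration of what such a mode would need first: before any
printability decision, the scanner has to recognize a whole multi-byte
sequence rather than test single bytes.  The sketch below covers only
that recognition step.  The helper name utf8SequenceLength is
hypothetical, nothing here comes from the patch, and which code points
should count as printable is left open.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the proposed patch.  Returns the length in
// bytes (1 to 4) of the well-formed UTF-8 sequence at the start of [p, p + n),
// or 0 if the bytes there are truncated or malformed.
constexpr std::size_t utf8SequenceLength(const char *p, std::size_t n) {
  if (n == 0)
    return 0;
  const unsigned char b0 = static_cast<unsigned char>(p[0]);
  if (b0 < 0x80)
    return 1; // plain ASCII byte

  std::size_t len = 0;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    len = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    len = 3;
    if (b0 == 0xE0) lo = 0xA0; // reject overlong encodings
    if (b0 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    len = 4;
    if (b0 == 0xF0) lo = 0x90; // reject overlong encodings
    if (b0 == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
  } else {
    return 0; // stray continuation byte or invalid lead byte
  }

  if (n < len)
    return 0; // sequence is cut off by the end of the buffer

  const unsigned char b1 = static_cast<unsigned char>(p[1]);
  if (b1 < lo || b1 > hi)
    return 0;
  for (std::size_t i = 2; i < len; ++i) {
    const unsigned char b = static_cast<unsigned char>(p[i]);
    if (b < 0x80 || b > 0xBF)
      return 0; // not a continuation byte
  }
  return len;
}
```

The first character of str1 above, й (U+0439), is encoded as the two
bytes D0 B9, so utf8SequenceLength("\xD0\xB9", 2) yields 2, while a
check that looks at each byte in isolation sees two non-ASCII bytes.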

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/



