[PATCH] D38545: [MC] - llvm-mc hangs on non-english characters.

Rafael Avila de Espindola via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 4 11:25:16 PDT 2017


LGTM too

Cheers,
Rafael

George Rimar via Phabricator <reviews at reviews.llvm.org> writes:

> grimar created this revision.
>
> This fixes PR33255.
>
> Currently llvm-mc hangs in an infinite loop when it parses a file that
> contains ".section .с", where the section name is a non-English character.
>
> In this patch I also moved the contents of `non-english-characters.s` into the
> test/MC/AsmParser/Inputs folder, so that `non-english-characters.s` becomes a
> single test case for all invalid inputs containing non-English symbols. That is
> convenient because llvm-mc otherwise tries to parse and tokenize the whole test
> file, including the tool invocations, which makes it harder to isolate the issue.
>
>
> https://reviews.llvm.org/D38545
>
> Files:
>   lib/MC/MCParser/ELFAsmParser.cpp
>   test/MC/AsmParser/Inputs/non-english-characters-comments.s
>   test/MC/AsmParser/Inputs/non-english-characters-section-name.s
>   test/MC/AsmParser/non-english-characters.s
>
>
> Index: test/MC/AsmParser/non-english-characters.s
> ===================================================================
> --- test/MC/AsmParser/non-english-characters.s
> +++ test/MC/AsmParser/non-english-characters.s
> @@ -1,14 +1,9 @@
> -# RUN: llvm-mc -triple i386-linux-gnu -filetype=obj -o %t %s
> +# RUN: llvm-mc -triple i386-linux-gnu -filetype=obj -o %t \
> +# RUN:   %S/Inputs/non-english-characters-comments.s
>  # RUN: llvm-readobj %t | FileCheck %s
>  # CHECK: Format: ELF32-i386
>  
> -# 0bム
> -# 0xム
> -# .ム4
> -# .Xム
> -# .1ム
> -# .1eム
> -# 0x.ム
> -# 0x0pム
> -.intel_syntax
> -# 1ム
> +# RUN: not llvm-mc -triple i386-linux-gnu -filetype=obj -o %t \
> +# RUN:   %S/Inputs/non-english-characters-section-name.s 2>&1 | \
> +# RUN:     FileCheck %s --check-prefix=ERR
> +# ERR: invalid character in input
> Index: test/MC/AsmParser/Inputs/non-english-characters-section-name.s
> ===================================================================
> --- test/MC/AsmParser/Inputs/non-english-characters-section-name.s
> +++ test/MC/AsmParser/Inputs/non-english-characters-section-name.s
> @@ -0,0 +1 @@
> +.section .ñ
> Index: test/MC/AsmParser/Inputs/non-english-characters-comments.s
> ===================================================================
> --- test/MC/AsmParser/Inputs/non-english-characters-comments.s
> +++ test/MC/AsmParser/Inputs/non-english-characters-comments.s
> @@ -0,0 +1,10 @@
> +# 0bム
> +# 0xム
> +# .ム4
> +# .Xム
> +# .1ム
> +# .1eム
> +# 0x.ム
> +# 0x0pム
> +.intel_syntax
> +# 1ム
> Index: lib/MC/MCParser/ELFAsmParser.cpp
> ===================================================================
> --- lib/MC/MCParser/ELFAsmParser.cpp
> +++ lib/MC/MCParser/ELFAsmParser.cpp
> @@ -247,7 +247,7 @@
>      return false;
>    }
>  
> -  while (true) {
> +  while (!getParser().hasPendingError()) {
>      SMLoc PrevLoc = getLexer().getLoc();
>      if (getLexer().is(AsmToken::Comma) ||
>        getLexer().is(AsmToken::EndOfStatement))

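For readers following the one-line change to the loop condition, here is a minimal, self-contained sketch of why checking for a pending error ends the loop. It is a toy model, not LLVM code: `ToyParser`, `scanName`, and the step limit are hypothetical names invented for illustration, and it assumes the simplest failure mode, where lexing an invalid byte records an error without advancing the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a lexer/parser pair; NOT LLVM's real classes.
struct ToyParser {
  std::string Input;
  std::size_t Pos = 0;
  bool PendingError = false;

  explicit ToyParser(std::string S) : Input(std::move(S)) {}

  bool atEnd() const { return Pos >= Input.size(); }
  bool hasPendingError() const { return PendingError; }

  // Consume one byte. Assumption for this sketch: a byte >= 0x80 is an
  // "invalid character" that records an error and does not advance.
  void lex() {
    if (atEnd())
      return;
    if (static_cast<unsigned char>(Input[Pos]) >= 0x80) {
      PendingError = true;
      return;
    }
    ++Pos;
  }
};

struct Result {
  std::size_t Steps; // loop iterations executed
  bool HitLimit;     // true if the loop had to be cut off artificially
};

// Scan to the end of the input. With CheckPendingError == false this models
// the old `while (true)` loop; with true it models the guarded loop.
// Limit exists only so the unguarded case terminates in this demo.
Result scanName(ToyParser &P, bool CheckPendingError,
                std::size_t Limit = 1000) {
  std::size_t Steps = 0;
  while (!CheckPendingError || !P.hasPendingError()) {
    if (P.atEnd())
      break;
    if (Steps == Limit)
      return {Steps, true};
    P.lex();
    ++Steps;
  }
  return {Steps, false};
}
```

In this model the unguarded loop makes no progress once the invalid byte is reached and stops only because of the artificial limit, while the guarded loop exits on the iteration after the error is recorded.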
