[LLVMdev] Weird msan problem

Evgeniy Stepanov eugeni.stepanov at gmail.com
Sun Feb 2 23:52:09 PST 2014


The code for ccall looks right. Sounds like you have a very small
range of instructions where an uninitialized value appear. You could
try debugging at asm level. Shadow for b should be passed at offset 0
in __msan_param_tls.

MSan could propagate shadow through arithmetic and even some logic
operations (like select). It could be that b is clean on function
entry, but then something uninitialized gets mixed in.

Also, what is this @entry stuff? It looks like bitvector_any1 has 6
arguments, but your ccall snippet only sets up tls for 3.


On Sun, Feb 2, 2014 at 10:39 PM, Keno Fischer
<kfischer at college.harvard.edu> wrote:
> I don't think we're doing anything unusual. Here's the relevant snippet from
> the generated IR for ccall (after instrumentation):
>
> ccall:                                            ; preds = %pass6, %113
>   %114 = phi i8* [ %110, %113 ], [ %108, %pass6 ]
>   %115 = inttoptr i64 %17 to i64*
>   %116 = bitcast i8* %69 to i32*, !dbg !7
>   %117 = bitcast i8* %114 to i32 (i32*, i64, i64)*, !dbg !7
>   store i64 %_msret13, i64* getelementptr inbounds ([1000 x i64]*
> @__msan_param_tls, i64 0, i64 0), align 8, !dbg !7
>   store i32 %70, i32* getelementptr inbounds ([1000 x i32]*
> @__msan_param_origin_tls, i64 0, i64 0), align 4, !dbg !7
>   store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]*
> @__msan_param_tls to i64), i64 8) to i64*), align 8, !dbg !7
>   store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]*
> @__msan_param_origin_tls to i64), i64 8) to i32*), align 4, !dbg !7
>   store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]*
> @__msan_param_tls to i64), i64 16) to i64*), align 8, !dbg !7
>   store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]*
> @__msan_param_origin_tls to i64), i64 16) to i32*), align 4, !dbg !7
>   store i32 0, i32* bitcast ([8 x i64]* @__msan_retval_tls to i32*), align
> 8, !dbg !7
>   %118 = call i32 %117(i32* %116, i64 %87, i64 %107), !dbg !7, !julia_type
> !9
>
>
>
> On Sun, Feb 2, 2014 at 6:18 AM, Evgeniy Stepanov <eugeni.stepanov at gmail.com>
> wrote:
>>
>> How is ccall() implemented? If it manually sets up a stack frame, then
>> it also needs to store argument shadow values in paramtls.
>>
>> I don't think there is an overflow, unless you have a _lot_ of
>> arguments in a function call.
>>
>> On Sun, Feb 2, 2014 at 9:26 AM, Keno Fischer
>> <kfischer at college.harvard.edu> wrote:
>> > Also, I was looking at the instrumented LLVM code and I noticed that the
>> > LLVM global variable is generated with size 1000, while
>> > kMsanParamTlsSizeInWords == 100, could there be some sort of overflow
>> > happening?
>> >
>> >
>> > On Sat, Feb 1, 2014 at 6:44 PM, Keno Fischer
>> > <kfischer at college.harvard.edu>
>> > wrote:
>> >>
>> >> I have verified that both TLS implementations indeed find the same area
>> >> of
>> >> memory. Anything else I could look for?
>> >>
>> >>
>> >> On Tue, Jan 28, 2014 at 4:28 PM, Keno Fischer
>> >> <kfischer at college.harvard.edu> wrote:
>> >>>
>> >>> Yes, both JIT code and the native runtime are instrumented. I am under
>> >>> the impressions that the the C library should guarantee that from the
>> >>> way
>> >>> the relocations are implemented as long as both native and JITed code
>> >>> are on
>> >>> the same thread (but I will verify this and report back).
>> >>>
>> >>>
>> >>> On Tue, Jan 28, 2014 at 2:41 AM, Evgeniy Stepanov
>> >>> <eugeni.stepanov at gmail.com> wrote:
>> >>>>
>> >>>> I assume there are transitions between JITted code and native helper
>> >>>> functions. How are you handling them? Are native functions
>> >>>> MSan-instrumented?
>> >>>> MSan is passing shadow across function calls in TLS slots. Does your
>> >>>> TLS implementation guarantee that accesses to __msan_param_tls from
>> >>>> JITted and from native code map to the same memory?
>> >>>>
>> >>>>
>> >>>> On Mon, Jan 27, 2014 at 11:36 PM, Evgeniy Stepanov
>> >>>> <eugeni.stepanov at gmail.com> wrote:
>> >>>> > This is really cool. I've not heard of anyone using MSan with MSJIT
>> >>>> > before.
>> >>>> >
>> >>>> >
>> >>>> > On Mon, Jan 27, 2014 at 7:44 PM, Keno Fischer
>> >>>> > <kfischer at college.harvard.edu> wrote:
>> >>>> >> Hello everybody,
>> >>>> >>
>> >>>> >> I've run into some strange behavior with memory sanitizer that I
>> >>>> >> can't
>> >>>> >> explain and hope somebody with more knowledge of the
>> >>>> >> implementation
>> >>>> >> would be
>> >>>> >> able to help me out or at least point me into the right direction.
>> >>>> >>
>> >>>> >> For background, I'm using memory sanitizer to check Julia
>> >>>> >> (julialang.org),
>> >>>> >> which uses (or at least will once I track down a few bugs) MCJIT
>> >>>> >> for
>> >>>> >> the
>> >>>> >> code compilation. So far I have rebuilt the runtime and all
>> >>>> >> dependencies
>> >>>> >> (including LLVM, libcxx, etc.) with memory sanitizer enabled and
>> >>>> >> added the
>> >>>> >> instrumentation pass in the appropriate place in the julia code
>> >>>> >> generator.
>> >>>> >>
>> >>>> >> I'm now going through the usual bootstrap which basically loads
>> >>>> >> the
>> >>>> >> standard
>> >>>> >> library and compiles it, does inference, etc. This works fine for
>> >>>> >> several
>> >>>> >> hours (this is usually much faster - by which I mean several
>> >>>> >> hundred
>> >>>> >> time -
>> >>>> >> I suspect the issue is with MCJIT having to process a ton more
>> >>>> >> relocations
>> >>>> >> and code and being inefficient at it, but I can't prove that).
>> >>>> >> That's
>> >>>> >> not
>> >>>> >> the issue however. Eventually, I get
>> >>>> >>
>> >>>> >> ==17150== WARNING: MemorySanitizer: use-of-uninitialized-value
>> >>>> >>     #0 0x7f417cea3189 in bitvector_any1
>> >>>> >> /home/kfischer/julia-san/src/support/bitvector.c:177
>> >>>> >> [ snip ]
>> >>>> >>
>> >>>> >>   Uninitialized value was created by a heap allocation
>> >>>> >>     #0 0x7f41815de543 in __interceptor_malloc
>> >>>> >>
>> >>>> >>
>> >>>> >> /home/kfischer/julia-san/deps/llvm-svn/projects/compiler-rt/lib/msan/msan_interceptors.cc:854
>> >>>> >>     #1 0x7f417cc7d7f1 in alloc_big
>> >>>> >> /home/kfischer/julia-san/src/gc.c:355
>> >>>> >> [snip]
>> >>>> >>
>> >>>> >> Now, by going through it in the debugger, I see
>> >>>> >>
>> >>>> >> (gdb) f 3
>> >>>> >> #3  0x00007f417cea318a in bitvector_any1 (b=0x60c000607240,
>> >>>> >> b at entry=<optimized out>, offs=0, offs at entry=<optimized out>,
>> >>>> >> nbits=256,
>> >>>> >> nbits at entry=<optimized out>)
>> >>>> >>     at bitvector.c:177
>> >>>> >> 177         if ((b[0] & mask) != 0) return 1;
>> >>>> >> (gdb) p __msan_print_shadow(&b,8)
>> >>>> >> ff ff ff ff ff ff ff ff
>> >>>> >>  o: 3f0010a6  o: 80007666
>> >>>> >>
>> >>>> >> which seems to indicate that the local variable b has
>> >>>> >> uninitialized
>> >>>> >> data.
>> >>>> >> I'm having a hard time believing that though, since if I look at
>> >>>> >> the
>> >>>> >> functions before it, the place where it's coming from is
>> >>>> >> initialized:
>> >>>> >>
>> >>>> >> #4  0x00007f41755208a8 in julia_isempty248 ()
>> >>>> >> #5  0x00007f417c163e3d in jl_apply (f=0x606000984d60,
>> >>>> >> f at entry=<optimized
>> >>>> >> out>, args=0x7fff9132da20, args at entry=<optimized out>, nargs=1,
>> >>>> >>     nargs at entry=<optimized out>) at ./julia.h:1043
>> >>>> >>
>> >>>> >> (here's the code of that julia function for reference)
>> >>>> >>
>> >>>> >> isempty(s::IntSet) =
>> >>>> >>     !s.fill1s && ccall(:bitvector_any1, Uint32, (Ptr{Uint32},
>> >>>> >> Uint64,
>> >>>> >> Uint64), s.bits, 0, s.limit)==0
>> >>>> >>
>> >>>> >> Looking at where that value is coming from:
>> >>>> >>
>> >>>> >> (gdb) f 5
>> >>>> >> #5  0x00007f417c163e3d in jl_apply (f=0x606000984d60,
>> >>>> >> f at entry=<optimized
>> >>>> >> out>, args=0x7fff9132da20, args at entry=<optimized out>, nargs=1,
>> >>>> >>     nargs at entry=<optimized out>) at ./julia.h:1043
>> >>>> >> 1043        return f->fptr((jl_value_t*)f, args, nargs);
>> >>>> >> (gdb) p ((jl_array_t*)((void**)args[0])[1])->data
>> >>>> >> $43 = (void *) 0x60c000607240
>> >>>> >> (gdb) p
>> >>>> >> __msan_print_shadow(((jl_array_t*)((void**)args[0])[1]),0x30)
>> >>>> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> >>>> >> 00
>> >>>> >> 00 00
>> >>>> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> >>>> >> 00
>> >>>> >>  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o:
>> >>>> >> d800496  o:
>> >>>> >> d800496  o: d800496  o: d800496  o: d800496  o: d800496  o:
>> >>>> >> d800496
>> >>>> >>
>> >>>> >> There are no uninitialized values to be seen anywhere and the `b`
>> >>>> >> value
>> >>>> >> isn't touched before that line, so I'm a little stumped.
>> >>>> >>
>> >>>> >> One note I should make is that I did have to implement TLS support
>> >>>> >> myself in
>> >>>> >> MCJIT for this to work (I'll upstream the patch soon), so I may
>> >>>> >> have
>> >>>> >> made a
>> >>>> >> mistake, but I haven't found anything wrong yet. If nothing looks
>> >>>> >> unusual,
>> >>>> >> I'd also appreciate pointers on what to look for in the TLS
>> >>>> >> variables.
>> >>>> >>
>> >>>> >> Thank you for your help,
>> >>>> >> Keno
>> >>>> >>
>> >>>> >>
>> >>>> >> _______________________________________________
>> >>>> >> LLVM Developers mailing list
>> >>>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>> >>
>> >>>
>> >>>
>> >>
>> >
>
>



More information about the llvm-dev mailing list