[LLVMdev] Weird msan problem

Keno Fischer kfischer at college.harvard.edu
Wed Feb 5 12:21:25 PST 2014


Looks like when you materialize the stores, you should check the size of
the store and emit an appropriate number of stores to the origin shadow
(or just a memset intrinsic?).
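As a minimal sketch of this suggestion, the C model below assumes MSan keeps one 32-bit origin id per aligned 4 bytes of application memory. The `shadow`/`origin` arrays and the `store_buggy`/`store_fixed` helpers are hypothetical stand-ins, not MSan's real shadow mapping or API. Writing only one origin for an 8-byte store leaves the second origin slot stale. That would produce the kind of mixed origins (`o: 40000e23  o: 80007614`) seen for `ptr` in the quoted gdb session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-byte region: one shadow byte per application
 * byte (0xff = uninitialized) and one 32-bit origin id per 4-byte granule,
 * which is the origin granularity assumed here. */
enum { APP_BYTES = 16, ORIGIN_GRANULE = 4 };

static uint8_t  shadow[APP_BYTES];
static uint32_t origin[APP_BYTES / ORIGIN_GRANULE];

/* Buggy materialization: updates the shadow for every byte of the store,
 * but writes only the origin slot of the first granule. */
static void store_buggy(size_t addr, size_t size, uint8_t sh, uint32_t o) {
    assert(size > 0 && addr + size <= APP_BYTES);
    memset(&shadow[addr], sh, size);
    origin[addr / ORIGIN_GRANULE] = o;
}

/* Fixed materialization: writes one origin per granule the store covers.
 * A byte-wise memset cannot do this unless all four bytes of the origin id
 * are equal, so the 32-bit slots are filled in a loop. */
static void store_fixed(size_t addr, size_t size, uint8_t sh, uint32_t o) {
    assert(size > 0 && addr + size <= APP_BYTES);
    memset(&shadow[addr], sh, size);
    for (size_t g = addr / ORIGIN_GRANULE;
         g <= (addr + size - 1) / ORIGIN_GRANULE; g++)
        origin[g] = o;
}
```

With `origin[1]` left over from an earlier store, `store_buggy(0, 8, 0xff, 0x40000e23)` poisons all 8 shadow bytes but leaves `origin[1]` unchanged, while `store_fixed` updates both slots.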


On Wed, Feb 5, 2014 at 2:13 PM, Keno Fischer
<kfischer at college.harvard.edu> wrote:

> The @entry stuff is just a gdb artifact. I've been tracking this back a
> little further, and it seems there's at least an origin propagation problem:
>
> (gdb) list
> 281             goto value_to_pointer_error;
> 282         }
> 283         else {
> 284             if (jl_is_cpointer_type(jvt) && jl_tparam0(jvt) == jt) {
> 285                 void *ptr = jl_unbox_voidpointer(v);
> 286                 assert(__msan_test_shadow(&ptr,sizeof(void**)) == -1);
> 287                 return (void*)ptr;
> 288             }
> 289         }
> 290
> (gdb) p __msan_print_shadow(v,16)
> 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff
>  o: 40000e23  o: 40000e23  o: 40000e23  o: 40000e23
> $21 = void
> (gdb) p __msan_print_shadow(&ptr,8)
> ff ff ff ff ff ff ff ff
>  o: 40000e23  o: 80007614
> $22 = void
>
> Notice the origin of the lower bits is off. Any idea as to why?
>
>
> On Mon, Feb 3, 2014 at 2:52 AM, Evgeniy Stepanov
> <eugeni.stepanov at gmail.com> wrote:
>
>> The code for ccall looks right. Sounds like you have a very small
>> range of instructions where an uninitialized value appears. You could
>> try debugging at asm level. Shadow for b should be passed at offset 0
>> in __msan_param_tls.
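The argument layout referred to here can be sketched as follows. The sketch assumes each argument's shadow goes at the next 8-byte-aligned offset of `__msan_param_tls`, with its origin at the same byte offset of `__msan_param_origin_tls`. The helper `param_tls_offsets` is hypothetical. For the three 8-byte arguments of the `bitvector_any1` `ccall`, it gives offsets 0, 8 and 16, the same ones used in the instrumented IR quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment of each argument's slot in __msan_param_tls. */
enum { PARAM_TLS_ALIGN = 8 };

static size_t align_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

/* Hypothetical helper: given argument sizes in bytes, writes the byte
 * offset of each argument's shadow (and origin) slot into offsets[] and
 * returns the total number of bytes used. */
static size_t param_tls_offsets(const size_t *sizes, size_t n, size_t *offsets) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] > 0);
        offsets[i] = off;
        off += align_up(sizes[i], PARAM_TLS_ALIGN);
    }
    return off;
}
```

Under this assumption, the shadow for `b` (the first argument) lands at offset 0, which is what the `ccall` must write before the call.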
>>
>> MSan could propagate shadow through arithmetic and even some logic
>> operations (like select). It could be that b is clean on function
>> entry, but then something uninitialized gets mixed in.
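To illustrate how something uninitialized can get mixed in, here is a simplified C model in which a set shadow bit marks an uninitialized bit. It assumes rules modeled on MSan's: an OR of shadows for addition, an exact rule for bitwise AND, and a select rule that poisons differing bits when the condition itself is poisoned. The type and helper names are hypothetical, and the real pass handles many more cases. The AND rule is relevant to `(b[0] & mask) != 0` in `bitvector_any1`: bits cleared by an initialized 0 in `mask` stay clean even if `b[0]` is not.

```c
#include <assert.h>
#include <stdint.h>

/* A value paired with its shadow: set shadow bits mark uninitialized bits. */
typedef struct { uint64_t v, s; } shadowed;

/* Addition (approximate): the result shadow is the OR of the input shadows. */
static shadowed sh_add(shadowed a, shadowed b) {
    shadowed r = { a.v + b.v, a.s | b.s };
    return r;
}

/* Bitwise AND: a result bit is initialized if either operand has an
 * initialized 0 in that position. */
static shadowed sh_and(shadowed a, shadowed b) {
    shadowed r = { a.v & b.v, (a.s & b.s) | (a.v & b.s) | (a.s & b.v) };
    return r;
}

/* Select: with a clean condition, take the chosen arm's shadow; with a
 * poisoned condition, poison bits that are poisoned or differ in either arm. */
static shadowed sh_select(shadowed c, shadowed a, shadowed b) {
    shadowed r;
    r.v = c.v ? a.v : b.v;
    if (c.s)
        r.s = a.s | b.s | (a.v ^ b.v);
    else
        r.s = c.v ? a.s : b.s;
    return r;
}
```

So a value that is clean on function entry can still pick up poison from a single uninitialized operand somewhere along the way.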
>>
>> Also, what is this @entry stuff? It looks like bitvector_any1 has 6
>> arguments, but your ccall snippet only sets up tls for 3.
>>
>>
>> On Sun, Feb 2, 2014 at 10:39 PM, Keno Fischer
>> <kfischer at college.harvard.edu> wrote:
>> > I don't think we're doing anything unusual. Here's the relevant snippet
>> > from the generated IR for ccall (after instrumentation):
>> >
>> > ccall:                                            ; preds = %pass6, %113
>> >   %114 = phi i8* [ %110, %113 ], [ %108, %pass6 ]
>> >   %115 = inttoptr i64 %17 to i64*
>> >   %116 = bitcast i8* %69 to i32*, !dbg !7
>> >   %117 = bitcast i8* %114 to i32 (i32*, i64, i64)*, !dbg !7
>> >   store i64 %_msret13, i64* getelementptr inbounds ([1000 x i64]* @__msan_param_tls, i64 0, i64 0), align 8, !dbg !7
>> >   store i32 %70, i32* getelementptr inbounds ([1000 x i32]* @__msan_param_origin_tls, i64 0, i64 0), align 4, !dbg !7
>> >   store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]* @__msan_param_tls to i64), i64 8) to i64*), align 8, !dbg !7
>> >   store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]* @__msan_param_origin_tls to i64), i64 8) to i32*), align 4, !dbg !7
>> >   store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]* @__msan_param_tls to i64), i64 16) to i64*), align 8, !dbg !7
>> >   store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]* @__msan_param_origin_tls to i64), i64 16) to i32*), align 4, !dbg !7
>> >   store i32 0, i32* bitcast ([8 x i64]* @__msan_retval_tls to i32*), align 8, !dbg !7
>> >   %118 = call i32 %117(i32* %116, i64 %87, i64 %107), !dbg !7, !julia_type !9
>> >
>> >
>> >
>> > On Sun, Feb 2, 2014 at 6:18 AM, Evgeniy Stepanov
>> > <eugeni.stepanov at gmail.com> wrote:
>> >>
>> >> How is ccall() implemented? If it manually sets up a stack frame, then
>> >> it also needs to store argument shadow values in paramtls.
>> >>
>> >> I don't think there is an overflow, unless you have a _lot_ of
>> >> arguments in a function call.
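For the overflow question, here is a back-of-the-envelope check. It assumes the runtime reserves `kMsanParamTlsSizeInWords` 8-byte words for `__msan_param_tls` (100 in the build discussed in this thread) and that each argument needs at least one 8-byte slot. `params_fit` is a hypothetical helper, not part of MSan.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed runtime size of __msan_param_tls: kMsanParamTlsSizeInWords
 * (100 in this thread) words of 8 bytes each. */
enum { PARAM_TLS_WORDS = 100, WORD_BYTES = 8 };

/* Hypothetical helper: returns 1 if n arguments that each occupy
 * slot_bytes of shadow (a multiple of 8) fit in the area, else 0. */
static int params_fit(size_t n, size_t slot_bytes) {
    assert(slot_bytes % WORD_BYTES == 0);
    return n * slot_bytes <= (size_t)PARAM_TLS_WORDS * WORD_BYTES;
}
```

Under these assumptions, the three-argument `ccall` quoted above uses 24 of the 800 available bytes, so an overflow would need on the order of a hundred arguments.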
>> >>
>> >> On Sun, Feb 2, 2014 at 9:26 AM, Keno Fischer
>> >> <kfischer at college.harvard.edu> wrote:
>> >> > Also, I was looking at the instrumented LLVM code and I noticed that the
>> >> > LLVM global variable is generated with size 1000, while
>> >> > kMsanParamTlsSizeInWords == 100, could there be some sort of overflow
>> >> > happening?
>> >> >
>> >> >
>> >> > On Sat, Feb 1, 2014 at 6:44 PM, Keno Fischer
>> >> > <kfischer at college.harvard.edu> wrote:
>> >> >>
>> >> >> I have verified that both TLS implementations indeed find the same area
>> >> >> of memory. Anything else I could look for?
>> >> >>
>> >> >>
>> >> >> On Tue, Jan 28, 2014 at 4:28 PM, Keno Fischer
>> >> >> <kfischer at college.harvard.edu> wrote:
>> >> >>>
>> >> >>> Yes, both JIT code and the native runtime are instrumented. I am under
>> >> >>> the impression that the C library should guarantee that, from the way
>> >> >>> the relocations are implemented, as long as both native and JITed code
>> >> >>> are on the same thread (but I will verify this and report back).
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Jan 28, 2014 at 2:41 AM, Evgeniy Stepanov
>> >> >>> <eugeni.stepanov at gmail.com> wrote:
>> >> >>>>
>> >> >>>> I assume there are transitions between JITted code and native helper
>> >> >>>> functions. How are you handling them? Are native functions
>> >> >>>> MSan-instrumented?
>> >> >>>> MSan is passing shadow across function calls in TLS slots. Does your
>> >> >>>> TLS implementation guarantee that accesses to __msan_param_tls from
>> >> >>>> JITted and from native code map to the same memory?
>> >> >>>>
>> >> >>>>
>> >> >>>> On Mon, Jan 27, 2014 at 11:36 PM, Evgeniy Stepanov
>> >> >>>> <eugeni.stepanov at gmail.com> wrote:
>> >> >>>> > This is really cool. I've not heard of anyone using MSan with MCJIT
>> >> >>>> > before.
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > On Mon, Jan 27, 2014 at 7:44 PM, Keno Fischer
>> >> >>>> > <kfischer at college.harvard.edu> wrote:
>> >> >>>> >> Hello everybody,
>> >> >>>> >>
>> >> >>>> >> I've run into some strange behavior with memory sanitizer that I
>> >> >>>> >> can't explain, and I hope somebody with more knowledge of the
>> >> >>>> >> implementation will be able to help me out or at least point me in
>> >> >>>> >> the right direction.
>> >> >>>> >>
>> >> >>>> >> For background, I'm using memory sanitizer to check Julia
>> >> >>>> >> (julialang.org), which uses (or at least will once I track down a
>> >> >>>> >> few bugs) MCJIT for the code compilation. So far I have rebuilt the
>> >> >>>> >> runtime and all dependencies (including LLVM, libcxx, etc.) with
>> >> >>>> >> memory sanitizer enabled and added the instrumentation pass in the
>> >> >>>> >> appropriate place in the julia code generator.
>> >> >>>> >>
>> >> >>>> >> I'm now going through the usual bootstrap, which basically loads the
>> >> >>>> >> standard library and compiles it, does inference, etc. This works
>> >> >>>> >> fine for several hours (normally this is several hundred times
>> >> >>>> >> faster; I suspect the issue is with MCJIT having to process a ton
>> >> >>>> >> more relocations and code and being inefficient at it, but I can't
>> >> >>>> >> prove that). That's not the issue, however. Eventually, I get
>> >> >>>> >>
>> >> >>>> >> ==17150== WARNING: MemorySanitizer: use-of-uninitialized-value
>> >> >>>> >>     #0 0x7f417cea3189 in bitvector_any1 /home/kfischer/julia-san/src/support/bitvector.c:177
>> >> >>>> >> [ snip ]
>> >> >>>> >>
>> >> >>>> >>   Uninitialized value was created by a heap allocation
>> >> >>>> >>     #0 0x7f41815de543 in __interceptor_malloc /home/kfischer/julia-san/deps/llvm-svn/projects/compiler-rt/lib/msan/msan_interceptors.cc:854
>> >> >>>> >>     #1 0x7f417cc7d7f1 in alloc_big /home/kfischer/julia-san/src/gc.c:355
>> >> >>>> >> [snip]
>> >> >>>> >>
>> >> >>>> >> Now, by going through it in the debugger, I see
>> >> >>>> >>
>> >> >>>> >> (gdb) f 3
>> >> >>>> >> #3  0x00007f417cea318a in bitvector_any1 (b=0x60c000607240, b@entry=<optimized out>, offs=0, offs@entry=<optimized out>, nbits=256, nbits@entry=<optimized out>)
>> >> >>>> >>     at bitvector.c:177
>> >> >>>> >> 177         if ((b[0] & mask) != 0) return 1;
>> >> >>>> >> (gdb) p __msan_print_shadow(&b,8)
>> >> >>>> >> ff ff ff ff ff ff ff ff
>> >> >>>> >>  o: 3f0010a6  o: 80007666
>> >> >>>> >>
>> >> >>>> >> which seems to indicate that the local variable b has uninitialized
>> >> >>>> >> data. I'm having a hard time believing that, though, since if I look
>> >> >>>> >> at the functions before it, the place where it's coming from is
>> >> >>>> >> initialized:
>> >> >>>> >>
>> >> >>>> >> #4  0x00007f41755208a8 in julia_isempty248 ()
>> >> >>>> >> #5  0x00007f417c163e3d in jl_apply (f=0x606000984d60, f@entry=<optimized out>, args=0x7fff9132da20, args@entry=<optimized out>, nargs=1,
>> >> >>>> >>     nargs@entry=<optimized out>) at ./julia.h:1043
>> >> >>>> >>
>> >> >>>> >> (here's the code of that julia function for reference)
>> >> >>>> >>
>> >> >>>> >> isempty(s::IntSet) =
>> >> >>>> >>     !s.fill1s && ccall(:bitvector_any1, Uint32, (Ptr{Uint32}, Uint64, Uint64), s.bits, 0, s.limit)==0
>> >> >>>> >>
>> >> >>>> >> Looking at where that value is coming from:
>> >> >>>> >>
>> >> >>>> >> (gdb) f 5
>> >> >>>> >> #5  0x00007f417c163e3d in jl_apply (f=0x606000984d60, f@entry=<optimized out>, args=0x7fff9132da20, args@entry=<optimized out>, nargs=1,
>> >> >>>> >>     nargs@entry=<optimized out>) at ./julia.h:1043
>> >> >>>> >> 1043        return f->fptr((jl_value_t*)f, args, nargs);
>> >> >>>> >> (gdb) p ((jl_array_t*)((void**)args[0])[1])->data
>> >> >>>> >> $43 = (void *) 0x60c000607240
>> >> >>>> >> (gdb) p __msan_print_shadow(((jl_array_t*)((void**)args[0])[1]),0x30)
>> >> >>>> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> >> >>>> >>  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496  o: d800496
>> >> >>>> >>
>> >> >>>> >> There are no uninitialized values to be seen anywhere and the `b`
>> >> >>>> >> value isn't touched before that line, so I'm a little stumped.
>> >> >>>> >>
>> >> >>>> >> One note I should make is that I did have to implement TLS support
>> >> >>>> >> myself in MCJIT for this to work (I'll upstream the patch soon), so I
>> >> >>>> >> may have made a mistake, but I haven't found anything wrong yet. If
>> >> >>>> >> nothing looks unusual, I'd also appreciate pointers on what to look
>> >> >>>> >> for in the TLS variables.
>> >> >>>> >>
>> >> >>>> >> Thank you for your help,
>> >> >>>> >> Keno
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >> _______________________________________________
>> >> >>>> >> LLVM Developers mailing list
>> >> >>>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> >>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>> >>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >
>> >
>> >
>>
>
>


More information about the llvm-dev mailing list