[LLVMdev] Weird msan problem
Keno Fischer
kfischer at college.harvard.edu
Wed Feb 5 11:13:01 PST 2014
The @entry stuff is just a gdb artifact. I've been tracking this back a
little further, and it seems there's at least an origin propagation problem:
(gdb) list
281 goto value_to_pointer_error;
282 }
283 else {
284 if (jl_is_cpointer_type(jvt) && jl_tparam0(jvt) == jt) {
285 void *ptr = jl_unbox_voidpointer(v);
286 assert(__msan_test_shadow(&ptr,sizeof(void**)) == -1);
287 return (void*)ptr;
288 }
289 }
290
(gdb) p __msan_print_shadow(v,16)
00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff
o: 40000e23 o: 40000e23 o: 40000e23 o: 40000e23
$21 = void
(gdb) p __msan_print_shadow(&ptr,8)
ff ff ff ff ff ff ff ff
o: 40000e23 o: 80007614
$22 = void
Notice the origin of the lower bits is off. Any idea as to why?
On Mon, Feb 3, 2014 at 2:52 AM, Evgeniy Stepanov
<eugeni.stepanov at gmail.com>wrote:
> The code for ccall looks right. Sounds like you have a very small
> range of instructions where an uninitialized value appear. You could
> try debugging at asm level. Shadow for b should be passed at offset 0
> in __msan_param_tls.
>
> MSan could propagate shadow through arithmetic and even some logic
> operations (like select). It could be that b is clean on function
> entry, but then something uninitialized gets mixed in.
>
> Also, what is this @entry stuff? It looks like bitvector_any1 has 6
> arguments, but your ccall snippet only sets up tls for 3.
>
>
> On Sun, Feb 2, 2014 at 10:39 PM, Keno Fischer
> <kfischer at college.harvard.edu> wrote:
> > I don't think we're doing anything unusual. Here's the relevant snippet
> from
> > the generated IR for ccall (after instrumentation):
> >
> > ccall: ; preds = %pass6, %113
> > %114 = phi i8* [ %110, %113 ], [ %108, %pass6 ]
> > %115 = inttoptr i64 %17 to i64*
> > %116 = bitcast i8* %69 to i32*, !dbg !7
> > %117 = bitcast i8* %114 to i32 (i32*, i64, i64)*, !dbg !7
> > store i64 %_msret13, i64* getelementptr inbounds ([1000 x i64]*
> > @__msan_param_tls, i64 0, i64 0), align 8, !dbg !7
> > store i32 %70, i32* getelementptr inbounds ([1000 x i32]*
> > @__msan_param_origin_tls, i64 0, i64 0), align 4, !dbg !7
> > store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]*
> > @__msan_param_tls to i64), i64 8) to i64*), align 8, !dbg !7
> > store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]*
> > @__msan_param_origin_tls to i64), i64 8) to i32*), align 4, !dbg !7
> > store i64 0, i64* inttoptr (i64 add (i64 ptrtoint ([1000 x i64]*
> > @__msan_param_tls to i64), i64 16) to i64*), align 8, !dbg !7
> > store i32 0, i32* inttoptr (i64 add (i64 ptrtoint ([1000 x i32]*
> > @__msan_param_origin_tls to i64), i64 16) to i32*), align 4, !dbg !7
> > store i32 0, i32* bitcast ([8 x i64]* @__msan_retval_tls to i32*),
> align
> > 8, !dbg !7
> > %118 = call i32 %117(i32* %116, i64 %87, i64 %107), !dbg !7,
> !julia_type
> > !9
> >
> >
> >
> > On Sun, Feb 2, 2014 at 6:18 AM, Evgeniy Stepanov <
> eugeni.stepanov at gmail.com>
> > wrote:
> >>
> >> How is ccall() implemented? If it manually sets up a stack frame, then
> >> it also needs to store argument shadow values in paramtls.
> >>
> >> I don't think there is an overflow, unless you have a _lot_ of
> >> arguments in a function call.
> >>
> >> On Sun, Feb 2, 2014 at 9:26 AM, Keno Fischer
> >> <kfischer at college.harvard.edu> wrote:
> >> > Also, I was looking at the instrumented LLVM code and I noticed that
> the
> >> > LLVM global variable is generated with size 1000, while
> >> > kMsanParamTlsSizeInWords == 100, could there be some sort of overflow
> >> > happening?
> >> >
> >> >
> >> > On Sat, Feb 1, 2014 at 6:44 PM, Keno Fischer
> >> > <kfischer at college.harvard.edu>
> >> > wrote:
> >> >>
> >> >> I have verified that both TLS implementations indeed find the same
> area
> >> >> of
> >> >> memory. Anything else I could look for?
> >> >>
> >> >>
> >> >> On Tue, Jan 28, 2014 at 4:28 PM, Keno Fischer
> >> >> <kfischer at college.harvard.edu> wrote:
> >> >>>
> >> >>> Yes, both JIT code and the native runtime are instrumented. I am
> under
> >> >>> the impressions that the the C library should guarantee that from
> the
> >> >>> way
> >> >>> the relocations are implemented as long as both native and JITed
> code
> >> >>> are on
> >> >>> the same thread (but I will verify this and report back).
> >> >>>
> >> >>>
> >> >>> On Tue, Jan 28, 2014 at 2:41 AM, Evgeniy Stepanov
> >> >>> <eugeni.stepanov at gmail.com> wrote:
> >> >>>>
> >> >>>> I assume there are transitions between JITted code and native
> helper
> >> >>>> functions. How are you handling them? Are native functions
> >> >>>> MSan-instrumented?
> >> >>>> MSan is passing shadow across function calls in TLS slots. Does
> your
> >> >>>> TLS implementation guarantee that accesses to __msan_param_tls from
> >> >>>> JITted and from native code map to the same memory?
> >> >>>>
> >> >>>>
> >> >>>> On Mon, Jan 27, 2014 at 11:36 PM, Evgeniy Stepanov
> >> >>>> <eugeni.stepanov at gmail.com> wrote:
> >> >>>> > This is really cool. I've not heard of anyone using MSan with
> MSJIT
> >> >>>> > before.
> >> >>>> >
> >> >>>> >
> >> >>>> > On Mon, Jan 27, 2014 at 7:44 PM, Keno Fischer
> >> >>>> > <kfischer at college.harvard.edu> wrote:
> >> >>>> >> Hello everybody,
> >> >>>> >>
> >> >>>> >> I've run into some strange behavior with memory sanitizer that I
> >> >>>> >> can't
> >> >>>> >> explain and hope somebody with more knowledge of the
> >> >>>> >> implementation
> >> >>>> >> would be
> >> >>>> >> able to help me out or at least point me into the right
> direction.
> >> >>>> >>
> >> >>>> >> For background, I'm using memory sanitizer to check Julia
> >> >>>> >> (julialang.org),
> >> >>>> >> which uses (or at least will once I track down a few bugs) MCJIT
> >> >>>> >> for
> >> >>>> >> the
> >> >>>> >> code compilation. So far I have rebuilt the runtime and all
> >> >>>> >> dependencies
> >> >>>> >> (including LLVM, libcxx, etc.) with memory sanitizer enabled and
> >> >>>> >> added the
> >> >>>> >> instrumentation pass in the appropriate place in the julia code
> >> >>>> >> generator.
> >> >>>> >>
> >> >>>> >> I'm now going through the usual bootstrap which basically loads
> >> >>>> >> the
> >> >>>> >> standard
> >> >>>> >> library and compiles it, does inference, etc. This works fine
> for
> >> >>>> >> several
> >> >>>> >> hours (this is usually much faster - by which I mean several
> >> >>>> >> hundred
> >> >>>> >> time -
> >> >>>> >> I suspect the issue is with MCJIT having to process a ton more
> >> >>>> >> relocations
> >> >>>> >> and code and being inefficient at it, but I can't prove that).
> >> >>>> >> That's
> >> >>>> >> not
> >> >>>> >> the issue however. Eventually, I get
> >> >>>> >>
> >> >>>> >> ==17150== WARNING: MemorySanitizer: use-of-uninitialized-value
> >> >>>> >> #0 0x7f417cea3189 in bitvector_any1
> >> >>>> >> /home/kfischer/julia-san/src/support/bitvector.c:177
> >> >>>> >> [ snip ]
> >> >>>> >>
> >> >>>> >> Uninitialized value was created by a heap allocation
> >> >>>> >> #0 0x7f41815de543 in __interceptor_malloc
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>
> /home/kfischer/julia-san/deps/llvm-svn/projects/compiler-rt/lib/msan/msan_interceptors.cc:854
> >> >>>> >> #1 0x7f417cc7d7f1 in alloc_big
> >> >>>> >> /home/kfischer/julia-san/src/gc.c:355
> >> >>>> >> [snip]
> >> >>>> >>
> >> >>>> >> Now, by going through it in the debugger, I see
> >> >>>> >>
> >> >>>> >> (gdb) f 3
> >> >>>> >> #3 0x00007f417cea318a in bitvector_any1 (b=0x60c000607240,
> >> >>>> >> b at entry=<optimized out>, offs=0, offs at entry=<optimized out>,
> >> >>>> >> nbits=256,
> >> >>>> >> nbits at entry=<optimized out>)
> >> >>>> >> at bitvector.c:177
> >> >>>> >> 177 if ((b[0] & mask) != 0) return 1;
> >> >>>> >> (gdb) p __msan_print_shadow(&b,8)
> >> >>>> >> ff ff ff ff ff ff ff ff
> >> >>>> >> o: 3f0010a6 o: 80007666
> >> >>>> >>
> >> >>>> >> which seems to indicate that the local variable b has
> >> >>>> >> uninitialized
> >> >>>> >> data.
> >> >>>> >> I'm having a hard time believing that though, since if I look at
> >> >>>> >> the
> >> >>>> >> functions before it, the place where it's coming from is
> >> >>>> >> initialized:
> >> >>>> >>
> >> >>>> >> #4 0x00007f41755208a8 in julia_isempty248 ()
> >> >>>> >> #5 0x00007f417c163e3d in jl_apply (f=0x606000984d60,
> >> >>>> >> f at entry=<optimized
> >> >>>> >> out>, args=0x7fff9132da20, args at entry=<optimized out>, nargs=1,
> >> >>>> >> nargs at entry=<optimized out>) at ./julia.h:1043
> >> >>>> >>
> >> >>>> >> (here's the code of that julia function for reference)
> >> >>>> >>
> >> >>>> >> isempty(s::IntSet) =
> >> >>>> >> !s.fill1s && ccall(:bitvector_any1, Uint32, (Ptr{Uint32},
> >> >>>> >> Uint64,
> >> >>>> >> Uint64), s.bits, 0, s.limit)==0
> >> >>>> >>
> >> >>>> >> Looking at where that value is coming from:
> >> >>>> >>
> >> >>>> >> (gdb) f 5
> >> >>>> >> #5 0x00007f417c163e3d in jl_apply (f=0x606000984d60,
> >> >>>> >> f at entry=<optimized
> >> >>>> >> out>, args=0x7fff9132da20, args at entry=<optimized out>, nargs=1,
> >> >>>> >> nargs at entry=<optimized out>) at ./julia.h:1043
> >> >>>> >> 1043 return f->fptr((jl_value_t*)f, args, nargs);
> >> >>>> >> (gdb) p ((jl_array_t*)((void**)args[0])[1])->data
> >> >>>> >> $43 = (void *) 0x60c000607240
> >> >>>> >> (gdb) p
> >> >>>> >> __msan_print_shadow(((jl_array_t*)((void**)args[0])[1]),0x30)
> >> >>>> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> >> >>>> >> 00
> >> >>>> >> 00 00
> >> >>>> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> >> >>>> >> 00
> >> >>>> >> o: d800496 o: d800496 o: d800496 o: d800496 o: d800496 o:
> >> >>>> >> d800496 o:
> >> >>>> >> d800496 o: d800496 o: d800496 o: d800496 o: d800496 o:
> >> >>>> >> d800496
> >> >>>> >>
> >> >>>> >> There are no uninitialized values to be seen anywhere and the
> `b`
> >> >>>> >> value
> >> >>>> >> isn't touched before that line, so I'm a little stumped.
> >> >>>> >>
> >> >>>> >> One note I should make is that I did have to implement TLS
> support
> >> >>>> >> myself in
> >> >>>> >> MCJIT for this to work (I'll upstream the patch soon), so I may
> >> >>>> >> have
> >> >>>> >> made a
> >> >>>> >> mistake, but I haven't found anything wrong yet. If nothing
> looks
> >> >>>> >> unusual,
> >> >>>> >> I'd also appreciate pointers on what to look for in the TLS
> >> >>>> >> variables.
> >> >>>> >>
> >> >>>> >> Thank you for your help,
> >> >>>> >> Keno
> >> >>>> >>
> >> >>>> >>
> >> >>>> >> _______________________________________________
> >> >>>> >> LLVM Developers mailing list
> >> >>>> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> >>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>>> >>
> >> >>>
> >> >>>
> >> >>
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140205/bf63bfb6/attachment.html>
More information about the llvm-dev
mailing list