[llvm-dev] question about xray tls data initialization

Dean Michael Berris via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 21 18:37:37 PST 2017


> On 22 Nov 2017, at 02:32, comic fans <comicfans44 at gmail.com> wrote:
> 
> With some dirty hacks, I've got the xray runtime 'built' on Windows,

\o/

> but unfortunately I don't know enough about the linker and the
> runtime, and the final executable didn't run. I'd like to share
> my changes here, hoping somebody can help me make it run on Windows.

Thanks for working on this!

If you're alright with it, maybe you can send some patches to review, preferably through the LLVM Phabricator instance? You can have me or Reid (who knows more about COFF and the Windows stuff) as reviewers.

> In AsmPrinter, I copy/pasted the xray section setup for the COFF target:
> 
> InstMap = OutContext.getCOFFSection("xray_instr_map", 0,
> SectionKind::getReadOnlyWithRel());
> FnSledIndex = OutContext.getCOFFSection("xray_fn_idx",
> 0,SectionKind::getReadOnlyWithRel());
> 
> In XRayArgs, I allowed the Windows platform to use the xray args. With this,
> the generated code seems to have the sleds and xray parts.
> 

Nice, I suspect we can make this change with tests as well, which we can build on incrementally.

> in xray runtime,
> bool atomic_compare_exchange_strong(volatile atomic_sint32_t *a,
>                                           s32 *cmp,
>                                           s32 xchg,
>                                           memory_order mo)
> is missing for MSVC, so I took the atomic_uint32_t implementation.
> 

This is in compiler-rt/lib/sanitizer_common/... right?

> MSVC 14.1 treats BufferQueue::Buffer::Buffer as a constructor instead of a
> data member, so I renamed it: Buf.Buffer => Buf.Data
> 

Interesting. That's an easy patch to merge. :)

> FunctionRecord packing: __attribute__((packed)) => #pragma
> pack(push,1). MSVC also requires bitfields to have the same type to pack
> them together (all types => uint32_t).
> 

Are you able to test this on other platforms?
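For reference while reviewing the packing change, here is a minimal sketch of a record that should pack the same way under MSVC, GCC, and Clang. `DemoFunctionRecord`, its field widths, and `makeDemoRecord` are illustrative assumptions, not the actual XRay FunctionRecord definition. The sketch relies on two things: `#pragma pack(push, 1)` is accepted by all three compilers, and giving every bitfield the same underlying type lets MSVC place them in one storage unit.

```cpp
#include <cstdint>

// Hypothetical record, for illustration only; not the real XRay layout.
#pragma pack(push, 1)
struct DemoFunctionRecord {
  // All bitfields share one underlying type so that MSVC packs them
  // into a single 32-bit storage unit (1 + 3 + 28 = 32 bits).
  std::uint32_t RecordType : 1;
  std::uint32_t RecordKind : 3;
  std::uint32_t FuncId : 28;
  std::uint32_t TSCDelta;
};
#pragma pack(pop)

// A compile-time check like this fails the build on any platform where
// the layout drifts from the expected on-disk size.
static_assert(sizeof(DemoFunctionRecord) == 8,
              "DemoFunctionRecord must be exactly 8 bytes");

// Hypothetical helper that masks each value to its field width.
DemoFunctionRecord makeDemoRecord(std::uint32_t Type, std::uint32_t Kind,
                                  std::uint32_t FuncId,
                                  std::uint32_t TSCDelta) {
  DemoFunctionRecord R{};
  R.RecordType = Type & 0x1u;
  R.RecordKind = Kind & 0x7u;
  R.FuncId = FuncId & 0x0FFFFFFFu;
  R.TSCDelta = TSCDelta;
  return R;
}
```

With a `static_assert` on the size, a layout difference on another platform shows up at compile time rather than as a corrupted log.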

> FD: int => HANDLE. Most of the code logic is still valid (-1 as the invalid
> value); the read/write APIs are replaced with the Windows ones.
> 
> mprotect => VirtualProtect
> 
> readTSC in xray_x86_64.inc also works for windows
> 
> replace read tsc from proc with QueryPerformanceFrequency
> 
> MSVC cannot compile code like this:
> void setupNewBuffer(int (*wall_clock_reader)(clockid_t,
>                                                    struct timespec *));
> 
> It must use a typedef first. xray uses clock_gettime as the default
> implementation, which is not friendly to Windows. I created a fake one
> based on chrono system_clock (ignoring clockid_t).
> 

This one is definitely something to do, even for potentially supporting XRay on Darwin, where older versions of the SDK (10.11 and lower) don't define clock_gettime. It can probably be split off into its own patch that can be reviewed and merged independently.
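To make the idea concrete, here is a rough sketch of a clock_gettime-shaped fallback built on std::chrono. The names `WallClockReader`, `chronoWallClock`, and `readWallClockNanos` are hypothetical and are not part of the XRay runtime. The clock id is a plain int so the sketch does not depend on clockid_t, and the alias plays the role of the typedef MSVC wanted for the function-pointer parameter.

```cpp
#include <chrono>
#include <cstdint>
#include <time.h>

// Hypothetical alias for a clock_gettime-like callback. Naming the
// function-pointer type up front avoids spelling it inline in a
// parameter list.
using WallClockReader = int (*)(int ClockId, timespec *TS);

// Fallback reader based on std::chrono::system_clock. The clock id is
// ignored, as in the approach described above.
int chronoWallClock(int /*ClockId*/, timespec *TS) {
  if (TS == nullptr)
    return -1;
  using namespace std::chrono;
  const auto SinceEpoch = system_clock::now().time_since_epoch();
  const auto Secs = duration_cast<seconds>(SinceEpoch);
  const auto Nanos = duration_cast<nanoseconds>(SinceEpoch - Secs);
  TS->tv_sec = static_cast<time_t>(Secs.count());
  TS->tv_nsec = static_cast<long>(Nanos.count());
  return 0;
}

// Hypothetical consumer: takes any reader with the right shape, so a
// platform can pass clock_gettime where available and the fallback
// elsewhere. Returns 0 if the reader reports failure.
std::uint64_t readWallClockNanos(WallClockReader Reader) {
  timespec TS{};
  if (Reader(0, &TS) != 0)
    return 0;
  return static_cast<std::uint64_t>(TS.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(TS.tv_nsec);
}
```

Keeping the callback shape identical to clock_gettime means the POSIX path could stay unchanged, with only the default reader selected per platform.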

> For the TLS destructor part, I've just commented it out (but
> https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
> describes a thread exit callback approach for COFF).
> 

Interesting, thanks! This one is something that could be abstracted away on a per-platform basis.

> And the last thing, which I don't understand, is the weak symbols for
> __start_xray_instr_map[]
> __stop_xray_instr_map[]
> __start_xray_fn_idx[]
> __stop_xray_fn_idx[]
> 
> I replaced them with __declspec(selectany), but I'm not sure it
> has the same meaning.
> 

The __{start, stop}_xray_{instr_map,fn_idx}[] arrays are usually generated by the linker on ELF and ELF-like platforms. I don't know what the MSVC COFF linker does here; that's probably something others who know COFF better can answer.
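As an illustration of the ELF behaviour only, the sketch below places two entries in a custom section and walks them through the linker-provided bounds. `DemoEntry`, the `demo_map` section, and `demoEntryCount` are made-up names, and the entry layout is not the real sled record. It assumes a GNU-style ELF linker, which defines `__start_<name>` and `__stop_<name>` for sections whose names are valid C identifiers. It makes no claim about COFF, so the non-ELF branch simply reports zero entries.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical entry type, 16 bytes with 16-byte alignment so that
// consecutive entries in the section are laid out without padding.
struct alignas(16) DemoEntry {
  std::uint64_t Address;
  std::uint64_t Kind;
};

#if defined(__ELF__)
// Two entries emitted into the custom section "demo_map". The "used"
// attribute keeps the compiler from discarding them as unreferenced.
__attribute__((section("demo_map"), used))
static const DemoEntry EntryA = {0x1000, 0};
__attribute__((section("demo_map"), used))
static const DemoEntry EntryB = {0x1056, 1};

// Bounds symbols synthesized by the ELF linker for the section.
extern "C" {
extern const DemoEntry __start_demo_map[];
extern const DemoEntry __stop_demo_map[];
}

// Number of entries the linker collected into the section.
std::size_t demoEntryCount() {
  return static_cast<std::size_t>(__stop_demo_map - __start_demo_map);
}
#else
// No equivalent is assumed for other object formats in this sketch.
std::size_t demoEntryCount() { return 0; }
#endif
```

Nothing in the source defines the two bounds symbols; they exist only after linking, which is why a like-for-like replacement on another object format needs support from that format's linker or some other convention.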

> 
> some random generated code:
>    .text
>    .intel_syntax noprefix
>    .def     call;
>    .scl    2;
>    .type    32;
>    .endef
>    .globl    call                    # -- Begin function call
>    .p2align    4, 0x90
> call:                                   # @call
> .seh_proc call
> # BB#0:                                 # %entry
>    .p2align    1, 0x90
> .Lxray_sled_0:
>    .ascii    "\353\t"
>    nop    word ptr [rax + rax + 512]
>    sub    rsp, 16
>    .seh_stackalloc 16
>    .seh_endprologue
>    mov    dword ptr [rsp + 12], ecx
>    mov    dword ptr [rsp + 8], 0
>    mov    dword ptr [rsp + 4], 0
> .LBB0_1:                                # %for.cond
>                                        # =>This Inner Loop Header: Depth=1
>    mov    eax, dword ptr [rsp + 4]
>    cmp    eax, dword ptr [rsp + 12]
>    jge    .LBB0_4
> # BB#2:                                 # %for.body
>                                        #   in Loop: Header=BB0_1 Depth=1
>    mov    eax, dword ptr [rsp + 4]
>    add    eax, dword ptr [rsp + 8]
>    mov    dword ptr [rsp + 8], eax
> # BB#3:                                 # %for.inc
>                                        #   in Loop: Header=BB0_1 Depth=1
>    mov    eax, dword ptr [rsp + 4]
>    add    eax, 1
>    mov    dword ptr [rsp + 4], eax
>    jmp    .LBB0_1
> .LBB0_4:                                # %for.end
>    mov    eax, dword ptr [rsp + 8]
>    add    rsp, 16
>    .p2align    1, 0x90
> .Lxray_sled_1:
>    ret
>    nop    word ptr cs:[rax + rax + 512]
>    .seh_handlerdata
>    .text
>    .seh_endproc
>                                        # -- End function
>    .section    xray_instr_map,"y"
> .Lxray_sleds_start0:
>    .quad    .Lxray_sled_0
>    .quad    call
>    .byte    0x00
>    .byte    0x00
>    .byte    0x00
>    .zero    13
>    .quad    .Lxray_sled_1
>    .quad    call
>    .byte    0x01
>    .byte    0x00
>    .byte    0x00
>    .zero    13
> .Lxray_sleds_end0:
>    .section    xray_fn_idx,"y"
>    .p2align    4, 0x90
>    .quad    .Lxray_sleds_start0
>    .quad    .Lxray_sleds_end0
>    .text
> 
> and parts of obj dump:
> 
> 
> SECTION HEADER #5
>     /16 name (xray_instr_map)
>       0 physical address
>       0 virtual address
>      40 size of raw data
>     198 file pointer to raw data (00000198 to 000001D7)
>     1D8 file pointer to relocation table
>       0 file pointer to line numbers
>       4 number of relocations
>       0 number of line numbers
>  100000 flags
>         1 byte align
> 
> RAW DATA #5
>  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  00000020: 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  V...............
>  00000030: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 
> 
> RELOCATIONS #5
>                                                Symbol    Symbol
> Offset    Type              Applied To         Index     Name
> --------  ----------------  -----------------  --------  ------
> 00000000  ADDR64            00000000 00000000         0  .text
> 00000008  ADDR64            00000000 00000000         E  call
> 00000020  ADDR64            00000000 00000056         0  .text
> 00000028  ADDR64            00000000 00000000         E  call
> 
> SECTION HEADER #6
>      /4 name (xray_fn_idx)
>       0 physical address
>       0 virtual address
>      10 size of raw data
>     200 file pointer to raw data (00000200 to 0000020F)
>     210 file pointer to relocation table
>       0 file pointer to line numbers
>       2 number of relocations
>       0 number of line numbers
>  500000 flags
>         16 byte align
> 
> RAW DATA #6
>  00000000: 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ........@.......
> 
> RELOCATIONS #6
>                                                Symbol    Symbol
> Offset    Type              Applied To         Index     Name
> --------  ----------------  -----------------  --------  ------
> 00000000  ADDR64            00000000 00000000         8  xray_instr_map
> 00000008  ADDR64            00000000 00000040         8  xray_instr_map
> 

This looks like it actually worked, at least at CodeGen time.

Thanks again for sharing your experience. It'd be really great if you could send patches that we can review and land, to potentially get XRay working on Windows!

Cheers

> On Tue, Nov 21, 2017 at 7:46 PM, Dean Michael Berris
> <dean.berris at gmail.com> wrote:
>> 
>> On 17 Nov 2017, at 00:44, comic fans via llvm-dev <llvm-dev at lists.llvm.org>
>> wrote:
>> 
>> I'm learning the xray library and trying to see whether it can be built on
>> Windows. In xray_fdr_logging_impl.h,
>> 
>> at line 152, the comment reads:
>> // Using pthread_once(...) to initialize the thread-local data structures
>> 
>> 
>> but at lines 175 and 183, the code reads:
>> 
>> thread_local pthread_key_t key;
>> 
>> // Ensure that we only actually ever do the pthread initialization once.
>> thread_local bool UNUSED Unused = [] {
>>   new (&TLSBuffer) ThreadLocalData();
>>   auto result = pthread_key_create(&key, +[](void *) {
>>     auto &TLD = *reinterpret_cast<ThreadLocalData *>(&TLSBuffer);
>> 
>> 
>> I'm confused that pthread_key_t and Unused are both thread_local
>> variables. Doesn't that mean the following lambda will run for each
>> thread, and create one pthread_key_t for only one thread's TLS data
>> (instead of only one pthread_key_t for all threads)? Also, what does the
>> '+' before the lambda expression mean? These may be stupid questions;
>> could somebody kindly help?
>> 
>> 
>> Yeah, that comment is out-of-date (and the implementation is buggy) -- which
>> is a shame really. :/
>> 
>> But the good news is that I think we've fixed this now at top-of-trunk with
>> https://reviews.llvm.org/D39526 and https://reviews.llvm.org/D40164.
>> 
>> Curiously though, how far did your exploration into getting XRay to build on
>> Windows go?
>> 
>> Cheers
>> 
>> -- Dean
>> 
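On the '+' question from the quoted message: applying unary plus to a captureless lambda converts it to an ordinary function pointer, which is the type a C-style API such as pthread_key_create expects for its destructor argument. As for the thread_local part, the initializer of a thread_local variable does run once in each thread that uses it, so the reading in the question is right. The sketch below shows only the '+' conversion. `DemoDestructor`, `demoKeyCreate`, `installResetDestructor`, and `runRegisteredDestructor` are hypothetical stand-ins and are not the XRay code or the pthread API.

```cpp
#include <type_traits>

// Hypothetical callback type, shaped like a pthread key destructor.
using DemoDestructor = void (*)(void *);

// Hypothetical stand-in for a C-style registration function; it just
// remembers the most recently registered destructor.
static DemoDestructor RegisteredDestructor = nullptr;

int demoKeyCreate(DemoDestructor D) {
  RegisteredDestructor = D;
  return 0;
}

int installResetDestructor() {
  // Without the '+', Fn would have the lambda's unique closure type.
  // With it, the closure is converted to a plain function pointer.
  auto Fn = +[](void *P) { *static_cast<int *>(P) = 0; };
  static_assert(std::is_same<decltype(Fn), void (*)(void *)>::value,
                "unary plus yields a function pointer");
  return demoKeyCreate(Fn);
}

// Invokes the registered destructor, if any, on the given pointer.
void runRegisteredDestructor(void *P) {
  if (RegisteredDestructor != nullptr)
    RegisteredDestructor(P);
}
```

The conversion also happens implicitly when a captureless lambda is passed straight to a parameter of function-pointer type, so the '+' matters mostly where the type would otherwise be deduced, as with `auto` here.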

-- Dean
