[LLVMdev] llvm-ranlib: Bus Error in regressions + fix

Evan Jones ejones at uwaterloo.ca
Wed Nov 23 05:16:59 PST 2005


On Nov 22, 2005, at 23:59, Reid Spencer wrote:
>> = {0,
>>       0, 4, 0}}}
>> (gdb) p archPath
>> $3 = {path = {static npos = 4294967295,
>>     _M_dataplus = {<allocator<char>> = {<No data fields>},
>>       _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage =
> What's with the "5\b" at the end? Looks like garbage to me. Not sure 
> what's up with that.

The implementation of std::string on this system does not always NUL-terminate
its internal buffer, so gdb prints whatever bytes happen to follow. I verified
that path.length() was 10 in that case, and that path.c_str() returned the
correct NUL-terminated string.


I found a system where llvm-ranlib "works," and I have figured out the 
problem. Here is the strace of the crashing run, starting after the file is opened:

open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 15
fstat64(15, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001a000
_llseek(15, 0, [0], SEEK_CUR)           = 0
--- SIGBUS (Bus error) @ 0 (0) ---


And here is the strace of the "working" run, starting after the file is opened:

open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 17
write(17, "!<arch>\n", 8)               = 8
_llseek(17, 0, [8], SEEK_CUR)           = 0
write(17, "/               1100833490  0   "..., 106) = 106
_llseek(17, 0, [114], SEEK_CUR)         = 0


Note that in the "working" case, the C++ library flushes the magic number 
("!<arch>\n") to the file immediately. Once the truncated file is no longer 
empty, the first page of the mmapped file is backed again, and every byte 
past the new end of file reads as zero. Hence, when we read through the 
foreignST pointer, it doesn't crash, but the data has been silently 
replaced with zeros. The data foreignST points to looks like this when it 
is first read from the file:

(gdb) x/46bx foreignST->getData()
0xf6ffe044:     0x00    0x00    0x00    0x02    0x00    0x00    0x07    0x4e
0xf6ffe04c:     0x00    0x00    0x07    0x4e    0x5f    0x5a    0x4e    0x34
0xf6ffe054:     0x6c    0x6c    0x76    0x6d    0x35    0x49    0x73    0x4e
0xf6ffe05c:     0x41    0x4e    0x45    0x66    0x00    0x5f    0x5a    0x4e
0xf6ffe064:     0x34    0x6c    0x6c    0x76    0x6d    0x35    0x49    0x73
0xf6ffe06c:     0x4e    0x41    0x4e    0x45    0x64    0x00


And it looks like this just before it gets written out:

(gdb) x/46bx foreignST->getData()
0xf6ffe044:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf6ffe04c:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf6ffe054:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf6ffe05c:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf6ffe064:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf6ffe06c:     0x00    0x00    0x00    0x00    0x00    0x00
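To read the two dumps, assume the usual GNU/System V layout of the "/" 
symbol-table member: a big-endian 32-bit symbol count, one big-endian 
32-bit member offset per symbol, then the NUL-terminated names. Under that 
assumption the first dump decodes to two symbols, _ZN4llvm5IsNANEf and 
_ZN4llvm5IsNANEd, both defined by the member at offset 0x74e 
(4 + 2*4 + 2*17 = 46 bytes). The all-zero dump decodes to a symbol count 
of zero. The following is a minimal C++ decoder sketch. SymbolEntry and 
parseSymbolTable are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One symbol from the native symbol table: its name and the file offset
// of the archive member that defines it.
struct SymbolEntry {
  std::uint32_t memberOffset;
  std::string name;
};

// ar stores the count and the offsets as big-endian 32-bit integers.
static std::uint32_t readBE32(const unsigned char* p) {
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: decodes the payload of the "/" member.
// Returns false if the data is truncated or malformed.
bool parseSymbolTable(const unsigned char* data, std::size_t size,
                      std::vector<SymbolEntry>& out) {
  assert(data != nullptr || size == 0);
  out.clear();
  if (size < 4) return false;
  const std::uint32_t count = readBE32(data);
  if ((size - 4) / 4 < count) return false;  // no room for all offsets
  std::size_t pos = 4 + std::size_t(count) * 4;  // start of the first name
  for (std::uint32_t i = 0; i < count; ++i) {
    const std::size_t start = pos;
    while (pos < size && data[pos] != 0) ++pos;
    if (pos == size) return false;  // name is missing its NUL terminator
    out.push_back({readBE32(data + 4 + std::size_t(i) * 4),
                   std::string(reinterpret_cast<const char*>(data + start),
                               pos - start)});
    ++pos;  // step over the NUL
  }
  return true;
}
```

Feeding it the 46 bytes from each dump yields the two mangled IsNAN 
symbols for the first and an empty table for the second.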


You can confirm the corruption by running "hexdump -C" on the file before 
and after llvm-ranlib. The native symbol table is supposed to start at 
offset 0x44 (the 8-byte "!<arch>\n" magic plus the 60-byte member header).
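Both traces can be reproduced outside LLVM. The following minimal POSIX 
sketch assumes Linux or a similar system, where touching a mapped page that 
lies wholly past end-of-file raises SIGBUS. It maps a file, reopens it with 
O_TRUNC the way the archive writer does, and reads offset 0x44 through the 
stale pointer before and after writing the 8-byte magic number. The 
temp-file name, the 200-byte size, and the names probe/reproduce are 
illustrative assumptions, not taken from llvm-ranlib.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf gProbeJump;

// SIGBUS handler: unwind back into probe() instead of killing the process.
static void onSigbus(int) { siglongjmp(gProbeJump, 1); }

// Reads one byte through `p`; returns -1 if the access raised SIGBUS.
static int probe(const volatile unsigned char* p) {
  struct sigaction sa{}, old{};
  sa.sa_handler = onSigbus;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGBUS, &sa, &old);
  volatile int result = -1;
  if (sigsetjmp(gProbeJump, 1) == 0) result = *p;
  sigaction(SIGBUS, &old, nullptr);  // restore the previous handler
  return result;
}

struct Observation {
  int beforeWrite;  // byte at 0x44 right after the O_TRUNC open
  int afterWrite;   // same byte after the 8-byte magic is written
};

Observation reproduce() {
  // A 200-byte stand-in for the original archive, filled with 0xAB.
  char path[] = "/tmp/ranlib_sigbusXXXXXX";
  int fd = mkstemp(path);
  assert(fd >= 0);
  unsigned char original[200];
  std::memset(original, 0xAB, sizeof original);
  ssize_t n = write(fd, original, sizeof original);
  assert(n == static_cast<ssize_t>(sizeof original));
  void* map = mmap(nullptr, sizeof original, PROT_READ, MAP_SHARED, fd, 0);
  assert(map != MAP_FAILED);
  close(fd);  // the mapping keeps its own reference to the file
  // Plays the role of foreignST: a pointer into the mapping at 0x44.
  const volatile unsigned char* sym = static_cast<unsigned char*>(map) + 0x44;

  // Reopen the same name with O_TRUNC, as the archive writer does.
  int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  assert(out >= 0);
  Observation obs{};
  obs.beforeWrite = probe(sym);  // file is empty: expect SIGBUS
  n = write(out, "!<arch>\n", 8);
  assert(n == 8);
  obs.afterWrite = probe(sym);   // page is backed again; past EOF reads 0
  close(out);
  munmap(map, sizeof original);
  unlink(path);
  return obs;
}
```

On such a system the first probe should report SIGBUS (the crashing trace) 
and the second should read 0 (the "working" trace, which silently copies 
zeros).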


>> Is this data supposed to be copied out of the original file,
> That's one solution but it defeats the purpose/efficiency of the mmap.

Well, only the native symbol table would need to be copied, and only in 
the writeToDisk function, before the file gets invalidated.
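The following minimal POSIX sketch illustrates copying only the symbol 
table. It assumes the symbol table's offset and size within the mapped file 
are known; symOffset, symSize, and the function names are hypothetical. The 
bytes are copied into a std::vector while the mapping is still valid, and 
only afterwards is the file reopened with O_TRUNC. Writing just the magic 
plus the copied bytes stands in for the real archive writer.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <fstream>
#include <iterator>
#include <stdlib.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

// Writes `contents` to a fresh temp file (name returned in `path`) and maps
// it read-only, standing in for the archive llvm-ranlib loaded.
static const char* mapNewFile(char* path, const std::string& contents) {
  int fd = mkstemp(path);
  assert(fd >= 0);
  ssize_t n = write(fd, contents.data(), contents.size());
  assert(n == static_cast<ssize_t>(contents.size()));
  void* map = mmap(nullptr, contents.size(), PROT_READ, MAP_SHARED, fd, 0);
  assert(map != MAP_FAILED);
  close(fd);  // the mapping keeps the file alive on its own
  return static_cast<const char*>(map);
}

static std::string slurp(const char* path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Copy the symbol table out of the mapping first, then truncate.
std::string rewriteWithCopy(const std::string& original, std::size_t symOffset,
                            std::size_t symSize) {
  assert(symOffset + symSize <= original.size());
  char path[] = "/tmp/ranlib_copyXXXXXX";
  const char* map = mapNewFile(path, original);
  const char* sym = map + symOffset;  // plays the role of foreignST

  std::vector<char> symCopy(sym, sym + symSize);  // owned copy, taken first

  int out = open(path, O_WRONLY | O_TRUNC);  // `sym` is stale from here on
  assert(out >= 0);
  ssize_t n = write(out, "!<arch>\n", 8);
  assert(n == 8);
  n = write(out, symCopy.data(), symCopy.size());
  assert(n == static_cast<ssize_t>(symCopy.size()));
  close(out);
  munmap(const_cast<char*>(map), original.size());
  std::string result = slurp(path);
  unlink(path);
  return result;
}
```

Only symSize bytes are copied, so the rest of the archive can still be 
read straight from the mapping up to the point of truncation.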


>> another temporary supposed to be created and then the original could 
>> be replaced using a file move operation instead?
> Another temporary file would be even slower than copying the memory in 
> memory.

I'm not so sure about that. The archive file is being truncated anyway, so 
how much worse would it be to unlink the file before writing it out? The 
operating system probably already deallocates the disk blocks as soon as 
the file is truncated. Unlinking would also fix the problem, since the 
original file's contents remain accessible through the mapping until it 
is unmapped.
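The following POSIX sketch illustrates the unlink approach. It relies on 
the rule that an unlinked file's data survives as long as a mapping still 
refers to it. After unlink(), open() with O_CREAT|O_EXCL creates a new 
inode under the same name, so the pointer into the old mapping can be read 
directly while the new file is written. The function names, the /tmp path, 
and the symOffset/symSize parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <fstream>
#include <iterator>
#include <stdlib.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Writes `contents` to a fresh temp file (name returned in `path`) and maps
// it read-only, standing in for the archive llvm-ranlib loaded.
static const char* mapNewFile(char* path, const std::string& contents) {
  int fd = mkstemp(path);
  assert(fd >= 0);
  ssize_t n = write(fd, contents.data(), contents.size());
  assert(n == static_cast<ssize_t>(contents.size()));
  void* map = mmap(nullptr, contents.size(), PROT_READ, MAP_SHARED, fd, 0);
  assert(map != MAP_FAILED);
  close(fd);
  return static_cast<const char*>(map);
}

static std::string slurp(const char* path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Unlink first, then create a brand-new file under the same name.
std::string rewriteAfterUnlink(const std::string& original,
                               std::size_t symOffset, std::size_t symSize) {
  assert(symOffset + symSize <= original.size());
  char path[] = "/tmp/ranlib_unlinkXXXXXX";
  const char* map = mapNewFile(path, original);
  const char* sym = map + symOffset;  // plays the role of foreignST

  int rc = unlink(path);  // drops the name; the mapped inode survives
  assert(rc == 0);
  int out = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);  // new inode
  assert(out >= 0);
  ssize_t n = write(out, "!<arch>\n", 8);
  assert(n == 8);
  n = write(out, sym, symSize);  // safe: reads the old, intact file
  assert(n == static_cast<ssize_t>(symSize));
  close(out);
  munmap(const_cast<char*>(map), original.size());
  std::string result = slurp(path);
  unlink(path);
  return result;
}
```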


> There's two solutions:
> (1) copy the data pointed to by foreignST
> (2) build the new file with the symbol table in a 3rd temporary file 
> which is later renamed as the original (temp.GNU.a in this case).

(3) unlink the original file first (basically equivalent to (2))
(4) Write the foreignST data into the TmpArchive file. Is there any reason 
that this isn't possible? The final archive would then be created in a 
single pass and could simply be moved into place.
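Option (4) combined with the rename step of (2) could look roughly like 
the following sketch. The ".tmp" suffix stands in for the TmpArchive path 
and is an assumption, as are the function names and parameters. The new 
archive is written completely to a separate file, reading the symbol table 
straight from the untouched mapping. std::rename then replaces the 
original in one step; on POSIX, rename() atomically replaces an existing 
destination on the same file system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <fstream>
#include <iterator>
#include <stdlib.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Writes `contents` to a fresh temp file (name returned in `path`) and maps
// it read-only, standing in for the archive llvm-ranlib loaded.
static const char* mapNewFile(char* path, const std::string& contents) {
  int fd = mkstemp(path);
  assert(fd >= 0);
  ssize_t n = write(fd, contents.data(), contents.size());
  assert(n == static_cast<ssize_t>(contents.size()));
  void* map = mmap(nullptr, contents.size(), PROT_READ, MAP_SHARED, fd, 0);
  assert(map != MAP_FAILED);
  close(fd);
  return static_cast<const char*>(map);
}

static std::string slurp(const char* path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Build the whole new archive in a temp file, then rename it over the original.
std::string rewriteViaRename(const std::string& original,
                             std::size_t symOffset, std::size_t symSize) {
  assert(symOffset + symSize <= original.size());
  char path[] = "/tmp/ranlib_renameXXXXXX";
  const char* map = mapNewFile(path, original);
  const char* sym = map + symOffset;  // plays the role of foreignST

  // Hypothetical stand-in for the TmpArchive file.
  const std::string tmpPath = std::string(path) + ".tmp";
  int out = open(tmpPath.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0666);
  assert(out >= 0);
  ssize_t n = write(out, "!<arch>\n", 8);
  assert(n == 8);
  n = write(out, sym, symSize);  // the original is untouched, so this is safe
  assert(n == static_cast<ssize_t>(symSize));
  close(out);

  // Replace the original in one step; the old inode lives on in the mapping.
  int rc = std::rename(tmpPath.c_str(), path);
  assert(rc == 0);
  munmap(const_cast<char*>(map), original.size());
  std::string result = slurp(path);
  unlink(path);
  return result;
}
```

A side benefit of this ordering is that a failure partway through leaves 
the original archive intact.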

Also, since writeToDisk() invalidates the mapped file that was 
originally used to create the archive, shouldn't this function also 
unmap that file and erase all its members? That would prevent any 
further bugs like this one.

Evan Jones

--
Evan Jones
http://evanjones.ca/