[lldb-dev] [PATCH] Don't calculate whole file crc for ELF core file.

Piotr Rak piotr.rak at gmail.com
Sun Mar 23 11:28:11 PDT 2014


Hi,

Working on something else I've noticed that ProcessElfCore plugin can take
considerable amount of time when once given big core file.

The file in question was 4.1 GiB core file generated by linux kernel, of
process that crashed.

I took the small detour to investigate and it turned out to be spending ~72s
calculating crc32 for whole file from:

ProcessElfCore::CanDebug
  ModuleList::GetShareModule

That in the result blocks ObjectFileELF::GetModuleSpecification.

That time was measured with Debug+Assert build variant, on puny i5 laptop
with traditional HDD, thus in reality it would be bit better, but still
meaningful.

If I understand correctly the main purpose of UUID's for ObjectFileELF is
aiding gnu_debuglink which is used to locate debug info. The other one is
related to ModuleSpec and ModuleList where it is used for identification.

First in case of modules doesn't make much sense, it won't be correct, or
have even possibility to work, while second turns out to be useful, anyway.

Simple workaround or just inventing UUID for ModuleSpec before passing it
to ModuleList::GetShareModule would probably work too for this particular
case. ELF core files have no universal information reassembling
gnu.debuglink or .note.gnu.build-id, thus we will be falling back to quite
costly crc computation that is not strictly required.

I have not noticed also easy way to make this computation in background, so
I decided limit it to PT_NOTE segments, which still have quite bit of
unique info, like: combinations pids/tids/uids/register contexts, and
sometimes much more.

That brings down time to somewhat more appealing value:
0.397 ms

Please note that while it was bit optimistic case for that solution, since
it was single threaded process, and its PT_NOTE segment had only 4680
bytes.

For other core binary that had PT_NOTE segment of size ~48MiB it still
takes:
1.206 s

in same setup.
That was pretty artificial case of *fork bomb* with almost 30k (29357)
threads.

While using only note segment makes  hitting collision it bit more likely,
I think it is pretty remote, and not meaningful in practice.

Also if anyone has ideas about other/better solution to this problem I will
gladly look into it, as performance of core file handling is important for
my use-cases.

If this solution is fine for trunk, please commit, I don't have commit
access.
Both raw diff and git path of same change are attached.

Cheers,
/Piotr
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140323/efa5a1c8/attachment.html>
-------------- next part --------------
diff --git a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 948e5e8..3941f95 100644
--- a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -22,6 +22,7 @@
 #include "lldb/Core/PluginManager.h"
 #include "lldb/Core/Section.h"
 #include "lldb/Core/Stream.h"
+#include "lldb/Core/Timer.h"
 #include "lldb/Symbol/DWARFCallFrameInfo.h"
 #include "lldb/Symbol/SymbolContext.h"
 #include "lldb/Target/SectionLoadList.h"
@@ -231,6 +232,10 @@ ELFNote::Parse(const DataExtractor &data, lldb::offset_t *offset)
     return true;
 }
 
+// Arbitrary constant used as UUID prefix for core files.
+const uint32_t
+ObjectFileELF::g_core_uuid_magic(0xE210C);
+
 //------------------------------------------------------------------
 // Static methods.
 //------------------------------------------------------------------
@@ -348,7 +353,7 @@ ObjectFileELF::MagicBytesMatch (DataBufferSP& data_sp,
  *   code or tables extracted from it, as desired without restriction.
  */
 static uint32_t
-calc_gnu_debuglink_crc32(const void *buf, size_t size)
+calc_crc32(uint32_t crc, const void *buf, size_t size)
 {
     static const uint32_t g_crc32_tab[] =
     {
@@ -397,14 +402,51 @@ calc_gnu_debuglink_crc32(const void *buf, size_t size)
         0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
     };    
     const uint8_t *p = (const uint8_t *)buf;
-    uint32_t crc;
 
-    crc = ~0U;
+    crc = crc ^ ~0U;
     while (size--)
         crc = g_crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
     return crc ^ ~0U;
 }
 
+static uint32_t
+calc_gnu_debuglink_crc32(const void *buf, size_t size)
+{
+    return calc_crc32(0U, buf, size);
+}
+
+uint32_t
+ObjectFileELF::CalculateELFNotesSegmentsCRC32 (const ProgramHeaderColl& program_headers,
+                                               DataExtractor& object_data)
+{
+    typedef ProgramHeaderCollConstIter Iter;
+
+    uint32_t core_notes_crc = 0;
+
+    for (Iter I = program_headers.begin(); I != program_headers.end(); ++I)
+    {
+        if (I->p_type == llvm::ELF::PT_NOTE)
+        {
+            const elf_off ph_offset = I->p_offset;
+            const size_t ph_size = I->p_filesz;
+
+            DataExtractor segment_data;
+            if (segment_data.SetData(object_data, ph_offset, ph_size) != ph_size)
+            {
+                // The ELF program header contained incorrect data,
+                // prabably corefile is incomplete or corrupted.
+                break;
+            }
+
+            core_notes_crc = calc_crc32(core_notes_crc,
+                                        segment_data.GetDataStart(),
+                                        segment_data.GetByteSize());
+        }
+    }
+
+    return core_notes_crc;
+}
+
 size_t
 ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                                         lldb::DataBufferSP& data_sp,
@@ -454,14 +496,54 @@ ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                     lldb_private::UUID &uuid = spec.GetUUID();
                     GetSectionHeaderInfo(section_headers, data, header, uuid, gnu_debuglink_file, gnu_debuglink_crc);
 
+
                     if (!uuid.IsValid())
                     {
+                        uint32_t core_notes_crc = 0;
+
                         if (!gnu_debuglink_crc)
                         {
-                            // Need to map entire file into memory to calculate the crc.
-                            data_sp = file.MemoryMapFileContents (file_offset, SIZE_MAX);
-                            data.SetData(data_sp);
-                            gnu_debuglink_crc = calc_gnu_debuglink_crc32 (data.GetDataStart(), data.GetByteSize());
+                            lldb_private::Timer scoped_timer (__PRETTY_FUNCTION__,
+                                                              "Calculating module crc32 %s with size %" PRIu64 " KiB",
+                                                              file.GetLastPathComponent().AsCString(),
+                                                              (file.GetByteSize()-file_offset)/1024);
+
+                            // For core files - which usually don't happen to have a gnu_debuglink,
+                            // and are pretty bulky - calulating whole contents crc32 would be too much of luxury.
+                            // Thus we will need to fallback to something simpler.
+                            if (header.e_type == llvm::ELF::ET_CORE)
+                            {
+                                size_t program_headers_end = header.e_phoff + header.e_phnum * header.e_phentsize;
+                                if (program_headers_end > data_sp->GetByteSize())
+                                {
+                                    data_sp = file.MemoryMapFileContents(file_offset, program_headers_end);
+                                    data.SetData(data_sp);
+                                }
+                                ProgramHeaderColl program_headers;
+                                GetProgramHeaderInfo(program_headers, data, header);
+
+                                size_t segment_data_end = 0;
+                                for (ProgramHeaderCollConstIter I = program_headers.begin();
+                                     I != program_headers.end(); ++I)
+                                {
+                                     segment_data_end = std::max (I->p_offset + I->p_filesz, segment_data_end);
+                                }
+
+                                if (segment_data_end > data_sp->GetByteSize())
+                                {
+                                    data_sp = file.MemoryMapFileContents(file_offset, segment_data_end);
+                                    data.SetData(data_sp);
+                                }
+
+                                core_notes_crc = CalculateELFNotesSegmentsCRC32 (program_headers, data);
+                            }
+                            else
+                            {
+                                // Need to map entire file into memory to calculate the crc.
+                                data_sp = file.MemoryMapFileContents (file_offset, SIZE_MAX);
+                                data.SetData(data_sp);
+                                gnu_debuglink_crc = calc_gnu_debuglink_crc32 (data.GetDataStart(), data.GetByteSize());
+                            }
                         }
                         if (gnu_debuglink_crc)
                         {
@@ -469,6 +551,13 @@ ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                             uint32_t uuidt[4] = { gnu_debuglink_crc, 0, 0, 0 };
                             uuid.SetBytes (uuidt, sizeof(uuidt));
                         }
+                        else if (core_notes_crc)
+                        {
+                            // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it look different form
+                            // .gnu_debuglink crc followed by 4 bytes of note segments crc.
+                            uint32_t uuidt[4] = { g_core_uuid_magic, core_notes_crc, 0, 0 };
+                            uuid.SetBytes (uuidt, sizeof(uuidt));
+                        }
                     }
 
                     specs.Append(spec);
@@ -620,7 +709,7 @@ bool
 ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
 {
     // Need to parse the section list to get the UUIDs, so make sure that's been done.
-    if (!ParseSectionHeaders())
+    if (!ParseSectionHeaders() && GetType() != ObjectFile::eTypeCoreFile)
         return false;
 
     if (m_uuid.IsValid())
@@ -629,7 +718,28 @@ ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
         *uuid = m_uuid;
         return true;
     }
-    else 
+    else if (GetType() == ObjectFile::eTypeCoreFile)
+    {
+        uint32_t core_notes_crc = 0;
+
+        if (!ParseProgramHeaders())
+            return false;
+
+        core_notes_crc = CalculateELFNotesSegmentsCRC32(m_program_headers, m_data);
+
+        if (core_notes_crc)
+        {
+            // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it
+            // look different form .gnu_debuglink crc - followed by 4 bytes of note
+            // segments crc.
+            uint32_t uuidt[4] = { g_core_uuid_magic, core_notes_crc, 0, 0 };
+            // FIXME: Should we also update m_uuid at this point, so this
+            // calculation (possibly costly) is not repeated?
+            uuid->SetBytes (uuidt, sizeof(uuidt));
+            return true;
+        }
+    }
+    else
     {
         if (!m_gnu_debuglink_crc)
             m_gnu_debuglink_crc = calc_gnu_debuglink_crc32 (m_data.GetDataStart(), m_data.GetByteSize());
@@ -637,6 +747,8 @@ ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
         {
             // Use 4 bytes of crc from the .gnu_debuglink section.
             uint32_t uuidt[4] = { m_gnu_debuglink_crc, 0, 0, 0 };
+            // FIXME: Should we also update m_uuid at this point, so this
+            // calculation (possibly costly) is not repeated.
             uuid->SetBytes (uuidt, sizeof(uuidt));
             return true;
         }
@@ -801,41 +913,53 @@ ObjectFileELF::ParseDependentModules()
 }
 
 //----------------------------------------------------------------------
-// ParseProgramHeaders
+// GetProgramHeaderInfo
 //----------------------------------------------------------------------
 size_t
-ObjectFileELF::ParseProgramHeaders()
+ObjectFileELF::GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
+                                    DataExtractor &object_data,
+                                    const ELFHeader &header)
 {
     // We have already parsed the program headers
-    if (!m_program_headers.empty())
-        return m_program_headers.size();
+    if (!program_headers.empty())
+        return program_headers.size();
 
     // If there are no program headers to read we are done.
-    if (m_header.e_phnum == 0)
+    if (header.e_phnum == 0)
         return 0;
 
-    m_program_headers.resize(m_header.e_phnum);
-    if (m_program_headers.size() != m_header.e_phnum)
+    program_headers.resize(header.e_phnum);
+    if (program_headers.size() != header.e_phnum)
         return 0;
 
-    const size_t ph_size = m_header.e_phnum * m_header.e_phentsize;
-    const elf_off ph_offset = m_header.e_phoff;
+    const size_t ph_size = header.e_phnum * header.e_phentsize;
+    const elf_off ph_offset = header.e_phoff;
     DataExtractor data;
-    if (GetData (ph_offset, ph_size, data) != ph_size)
+    if (data.SetData(object_data, ph_offset, ph_size) != ph_size)
         return 0;
 
     uint32_t idx;
     lldb::offset_t offset;
-    for (idx = 0, offset = 0; idx < m_header.e_phnum; ++idx)
+    for (idx = 0, offset = 0; idx < header.e_phnum; ++idx)
     {
-        if (m_program_headers[idx].Parse(data, &offset) == false)
+        if (program_headers[idx].Parse(data, &offset) == false)
             break;
     }
 
-    if (idx < m_program_headers.size())
-        m_program_headers.resize(idx);
+    if (idx < program_headers.size())
+        program_headers.resize(idx);
+
+    return program_headers.size();
+
+}
 
-    return m_program_headers.size();
+//----------------------------------------------------------------------
+// ParseProgramHeaders
+//----------------------------------------------------------------------
+size_t
+ObjectFileELF::ParseProgramHeaders()
+{
+    return GetProgramHeaderInfo(m_program_headers, m_data, m_header);
 }
 
 static bool
diff --git a/source/Plugins/ObjectFile/ELF/ObjectFileELF.h b/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
index 44b40a4..ac7d38c 100644
--- a/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
+++ b/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
@@ -214,6 +214,7 @@ private:
 
     /// Version of this reader common to all plugins based on this class.
     static const uint32_t m_plugin_version = 1;
+    static const uint32_t g_core_uuid_magic;
 
     /// ELF file header.
     elf::ELFHeader m_header;
@@ -249,6 +250,17 @@ private:
     size_t
     SectionIndex(const SectionHeaderCollConstIter &I) const;
 
+    // Parses the ELF program headers.
+    static size_t
+    GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
+                         lldb_private::DataExtractor &data,
+                         const elf::ELFHeader &header);
+
+    // Finds PT_NOTE segments and calculates their crc sum.
+    static uint32_t
+    CalculateELFNotesSegmentsCRC32(const ProgramHeaderColl& program_headers,
+                                   lldb_private::DataExtractor &data);
+
     /// Parses all section headers present in this object file and populates
     /// m_program_headers.  This method will compute the header list only once.
     /// Returns the number of headers parsed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Don-t-calculate-whole-file-crc-for-ELF-core-file.patch
Type: text/x-patch
Size: 14038 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140323/efa5a1c8/attachment.bin>


More information about the lldb-dev mailing list