[lldb-dev] [PATCH] Don't calculate whole file crc for ELF core file.

Piotr Rak piotr.rak at gmail.com
Sun Mar 23 12:07:56 PDT 2014


Hi,

Here is rebased version of same patch - so it applies cleanly on top of:

https://llvm.org/svn/llvm-project/lldb/trunk@204545

as it also touches ObjectFileELF.cpp and I somehow missed it yesterday...


2014-03-23 19:28 GMT+01:00 Piotr Rak <piotr.rak at gmail.com>:

> Hi,
>
> Working on something else I've noticed that ProcessElfCore plugin can take
> considerable amount of time when once given big core file.
>
> The file in question was 4.1 GiB core file generated by linux kernel, of
> process that crashed.
>
> I took the small detour to investigate and it turned out to be spending
> ~72s
> calculating crc32 for whole file from:
>
> ProcessElfCore::CanDebug
>   ModuleList::GetShareModule
>
> That in the result blocks ObjectFileELF::GetModuleSpecification.
>
> That time was measured with Debug+Assert build variant, on puny i5 laptop
> with traditional HDD, thus in reality it would be bit better, but still
> meaningful.
>
> If I understand correctly the main purpose of UUID's for ObjectFileELF is
> aiding gnu_debuglink which is used to locate debug info. The other one is
> related to ModuleSpec and ModuleList where it is used for identification.
>
> First in case of modules doesn't make much sense, it won't be correct, or
> have even possibility to work, while second turns out to be useful, anyway.
>
> Simple workaround or just inventing UUID for ModuleSpec before passing it
> to ModuleList::GetShareModule would probably work too for this particular
> case. ELF core files have no universal information reassembling
> gnu.debuglink or .note.gnu.build-id, thus we will be falling back to quite
> costly crc computation that is not strictly required.
>
> I have not noticed also easy way to make this computation in background,
> so I decided limit it to PT_NOTE segments, which still have quite bit of
> unique info, like: combinations pids/tids/uids/register contexts, and
> sometimes much more.
>
> That brings down time to somewhat more appealing value:
> 0.397 ms
>
> Please note that while it was bit optimistic case for that solution, since
> it was single threaded process, and its PT_NOTE segment had only 4680
> bytes.
>
> For other core binary that had PT_NOTE segment of size ~48MiB it still
> takes:
> 1.206 s
>
> in same setup.
> That was pretty artificial case of *fork bomb* with almost 30k (29357)
> threads.
>
> While using only note segment makes  hitting collision it bit more likely,
> I think it is pretty remote, and not meaningful in practice.
>
> Also if anyone has ideas about other/better solution to this problem I
> will gladly look into it, as performance of core file handling is important
> for my use-cases.
>
> If this solution is fine for trunk, please commit, I don't have commit
> access.
> Both raw diff and git path of same change are attached.
>
> Cheers,
> /Piotr
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140323/5c1364a0/attachment.html>
-------------- next part --------------
diff --git a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 11dce37..63b1d85 100644
--- a/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -22,6 +22,7 @@
 #include "lldb/Core/PluginManager.h"
 #include "lldb/Core/Section.h"
 #include "lldb/Core/Stream.h"
+#include "lldb/Core/Timer.h"
 #include "lldb/Symbol/DWARFCallFrameInfo.h"
 #include "lldb/Symbol/SymbolContext.h"
 #include "lldb/Target/SectionLoadList.h"
@@ -231,6 +232,10 @@ ELFNote::Parse(const DataExtractor &data, lldb::offset_t *offset)
     return true;
 }
 
+// Arbitrary constant used as UUID prefix for core files.
+const uint32_t
+ObjectFileELF::g_core_uuid_magic(0xE210C);
+
 //------------------------------------------------------------------
 // Static methods.
 //------------------------------------------------------------------
@@ -348,7 +353,7 @@ ObjectFileELF::MagicBytesMatch (DataBufferSP& data_sp,
  *   code or tables extracted from it, as desired without restriction.
  */
 static uint32_t
-calc_gnu_debuglink_crc32(const void *buf, size_t size)
+calc_crc32(uint32_t crc, const void *buf, size_t size)
 {
     static const uint32_t g_crc32_tab[] =
     {
@@ -397,14 +402,51 @@ calc_gnu_debuglink_crc32(const void *buf, size_t size)
         0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
     };    
     const uint8_t *p = (const uint8_t *)buf;
-    uint32_t crc;
 
-    crc = ~0U;
+    crc = crc ^ ~0U;
     while (size--)
         crc = g_crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
     return crc ^ ~0U;
 }
 
+static uint32_t
+calc_gnu_debuglink_crc32(const void *buf, size_t size)
+{
+    return calc_crc32(0U, buf, size);
+}
+
+uint32_t
+ObjectFileELF::CalculateELFNotesSegmentsCRC32 (const ProgramHeaderColl& program_headers,
+                                               DataExtractor& object_data)
+{
+    typedef ProgramHeaderCollConstIter Iter;
+
+    uint32_t core_notes_crc = 0;
+
+    for (Iter I = program_headers.begin(); I != program_headers.end(); ++I)
+    {
+        if (I->p_type == llvm::ELF::PT_NOTE)
+        {
+            const elf_off ph_offset = I->p_offset;
+            const size_t ph_size = I->p_filesz;
+
+            DataExtractor segment_data;
+            if (segment_data.SetData(object_data, ph_offset, ph_size) != ph_size)
+            {
+                // The ELF program header contained incorrect data,
+                // prabably corefile is incomplete or corrupted.
+                break;
+            }
+
+            core_notes_crc = calc_crc32(core_notes_crc,
+                                        segment_data.GetDataStart(),
+                                        segment_data.GetByteSize());
+        }
+    }
+
+    return core_notes_crc;
+}
+
 size_t
 ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                                         lldb::DataBufferSP& data_sp,
@@ -454,14 +496,54 @@ ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                     lldb_private::UUID &uuid = spec.GetUUID();
                     GetSectionHeaderInfo(section_headers, data, header, uuid, gnu_debuglink_file, gnu_debuglink_crc);
 
+
                     if (!uuid.IsValid())
                     {
+                        uint32_t core_notes_crc = 0;
+
                         if (!gnu_debuglink_crc)
                         {
-                            // Need to map entire file into memory to calculate the crc.
-                            data_sp = file.MemoryMapFileContents (file_offset, SIZE_MAX);
-                            data.SetData(data_sp);
-                            gnu_debuglink_crc = calc_gnu_debuglink_crc32 (data.GetDataStart(), data.GetByteSize());
+                            lldb_private::Timer scoped_timer (__PRETTY_FUNCTION__,
+                                                              "Calculating module crc32 %s with size %" PRIu64 " KiB",
+                                                              file.GetLastPathComponent().AsCString(),
+                                                              (file.GetByteSize()-file_offset)/1024);
+
+                            // For core files - which usually don't happen to have a gnu_debuglink,
+                            // and are pretty bulky - calulating whole contents crc32 would be too much of luxury.
+                            // Thus we will need to fallback to something simpler.
+                            if (header.e_type == llvm::ELF::ET_CORE)
+                            {
+                                size_t program_headers_end = header.e_phoff + header.e_phnum * header.e_phentsize;
+                                if (program_headers_end > data_sp->GetByteSize())
+                                {
+                                    data_sp = file.MemoryMapFileContents(file_offset, program_headers_end);
+                                    data.SetData(data_sp);
+                                }
+                                ProgramHeaderColl program_headers;
+                                GetProgramHeaderInfo(program_headers, data, header);
+
+                                size_t segment_data_end = 0;
+                                for (ProgramHeaderCollConstIter I = program_headers.begin();
+                                     I != program_headers.end(); ++I)
+                                {
+                                     segment_data_end = std::max (I->p_offset + I->p_filesz, segment_data_end);
+                                }
+
+                                if (segment_data_end > data_sp->GetByteSize())
+                                {
+                                    data_sp = file.MemoryMapFileContents(file_offset, segment_data_end);
+                                    data.SetData(data_sp);
+                                }
+
+                                core_notes_crc = CalculateELFNotesSegmentsCRC32 (program_headers, data);
+                            }
+                            else
+                            {
+                                // Need to map entire file into memory to calculate the crc.
+                                data_sp = file.MemoryMapFileContents (file_offset, SIZE_MAX);
+                                data.SetData(data_sp);
+                                gnu_debuglink_crc = calc_gnu_debuglink_crc32 (data.GetDataStart(), data.GetByteSize());
+                            }
                         }
                         if (gnu_debuglink_crc)
                         {
@@ -469,6 +551,13 @@ ObjectFileELF::GetModuleSpecifications (const lldb_private::FileSpec& file,
                             uint32_t uuidt[4] = { gnu_debuglink_crc, 0, 0, 0 };
                             uuid.SetBytes (uuidt, sizeof(uuidt));
                         }
+                        else if (core_notes_crc)
+                        {
+                            // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it look different form
+                            // .gnu_debuglink crc followed by 4 bytes of note segments crc.
+                            uint32_t uuidt[4] = { g_core_uuid_magic, core_notes_crc, 0, 0 };
+                            uuid.SetBytes (uuidt, sizeof(uuidt));
+                        }
                     }
 
                     specs.Append(spec);
@@ -620,7 +709,7 @@ bool
 ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
 {
     // Need to parse the section list to get the UUIDs, so make sure that's been done.
-    if (!ParseSectionHeaders())
+    if (!ParseSectionHeaders() && GetType() != ObjectFile::eTypeCoreFile)
         return false;
 
     if (m_uuid.IsValid())
@@ -629,7 +718,28 @@ ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
         *uuid = m_uuid;
         return true;
     }
-    else 
+    else if (GetType() == ObjectFile::eTypeCoreFile)
+    {
+        uint32_t core_notes_crc = 0;
+
+        if (!ParseProgramHeaders())
+            return false;
+
+        core_notes_crc = CalculateELFNotesSegmentsCRC32(m_program_headers, m_data);
+
+        if (core_notes_crc)
+        {
+            // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it
+            // look different form .gnu_debuglink crc - followed by 4 bytes of note
+            // segments crc.
+            uint32_t uuidt[4] = { g_core_uuid_magic, core_notes_crc, 0, 0 };
+            // FIXME: Should we also update m_uuid at this point, so this
+            // calculation (possibly costly) is not repeated?
+            uuid->SetBytes (uuidt, sizeof(uuidt));
+            return true;
+        }
+    }
+    else
     {
         if (!m_gnu_debuglink_crc)
             m_gnu_debuglink_crc = calc_gnu_debuglink_crc32 (m_data.GetDataStart(), m_data.GetByteSize());
@@ -637,6 +747,8 @@ ObjectFileELF::GetUUID(lldb_private::UUID* uuid)
         {
             // Use 4 bytes of crc from the .gnu_debuglink section.
             uint32_t uuidt[4] = { m_gnu_debuglink_crc, 0, 0, 0 };
+            // FIXME: Should we also update m_uuid at this point, so this
+            // calculation (possibly costly) is not repeated.
             uuid->SetBytes (uuidt, sizeof(uuidt));
             return true;
         }
@@ -801,41 +913,53 @@ ObjectFileELF::ParseDependentModules()
 }
 
 //----------------------------------------------------------------------
-// ParseProgramHeaders
+// GetProgramHeaderInfo
 //----------------------------------------------------------------------
 size_t
-ObjectFileELF::ParseProgramHeaders()
+ObjectFileELF::GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
+                                    DataExtractor &object_data,
+                                    const ELFHeader &header)
 {
     // We have already parsed the program headers
-    if (!m_program_headers.empty())
-        return m_program_headers.size();
+    if (!program_headers.empty())
+        return program_headers.size();
 
     // If there are no program headers to read we are done.
-    if (m_header.e_phnum == 0)
+    if (header.e_phnum == 0)
         return 0;
 
-    m_program_headers.resize(m_header.e_phnum);
-    if (m_program_headers.size() != m_header.e_phnum)
+    program_headers.resize(header.e_phnum);
+    if (program_headers.size() != header.e_phnum)
         return 0;
 
-    const size_t ph_size = m_header.e_phnum * m_header.e_phentsize;
-    const elf_off ph_offset = m_header.e_phoff;
+    const size_t ph_size = header.e_phnum * header.e_phentsize;
+    const elf_off ph_offset = header.e_phoff;
     DataExtractor data;
-    if (GetData (ph_offset, ph_size, data) != ph_size)
+    if (data.SetData(object_data, ph_offset, ph_size) != ph_size)
         return 0;
 
     uint32_t idx;
     lldb::offset_t offset;
-    for (idx = 0, offset = 0; idx < m_header.e_phnum; ++idx)
+    for (idx = 0, offset = 0; idx < header.e_phnum; ++idx)
     {
-        if (m_program_headers[idx].Parse(data, &offset) == false)
+        if (program_headers[idx].Parse(data, &offset) == false)
             break;
     }
 
-    if (idx < m_program_headers.size())
-        m_program_headers.resize(idx);
+    if (idx < program_headers.size())
+        program_headers.resize(idx);
+
+    return program_headers.size();
+
+}
 
-    return m_program_headers.size();
+//----------------------------------------------------------------------
+// ParseProgramHeaders
+//----------------------------------------------------------------------
+size_t
+ObjectFileELF::ParseProgramHeaders()
+{
+    return GetProgramHeaderInfo(m_program_headers, m_data, m_header);
 }
 
 static bool
diff --git a/source/Plugins/ObjectFile/ELF/ObjectFileELF.h b/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
index 44b40a4..ac7d38c 100644
--- a/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
+++ b/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
@@ -214,6 +214,7 @@ private:
 
     /// Version of this reader common to all plugins based on this class.
     static const uint32_t m_plugin_version = 1;
+    static const uint32_t g_core_uuid_magic;
 
     /// ELF file header.
     elf::ELFHeader m_header;
@@ -249,6 +250,17 @@ private:
     size_t
     SectionIndex(const SectionHeaderCollConstIter &I) const;
 
+    // Parses the ELF program headers.
+    static size_t
+    GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
+                         lldb_private::DataExtractor &data,
+                         const elf::ELFHeader &header);
+
+    // Finds PT_NOTE segments and calculates their crc sum.
+    static uint32_t
+    CalculateELFNotesSegmentsCRC32(const ProgramHeaderColl& program_headers,
+                                   lldb_private::DataExtractor &data);
+
     /// Parses all section headers present in this object file and populates
     /// m_program_headers.  This method will compute the header list only once.
     /// Returns the number of headers parsed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Don-t-calculate-whole-file-crc-for-ELF-core-file_v2.patch
Type: text/x-patch
Size: 14038 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140323/5c1364a0/attachment.bin>


More information about the lldb-dev mailing list