[llvm-commits] [PATCH] ARM build attribute reading support

Michael Spencer bigcheesegs at gmail.com
Tue Nov 20 12:20:50 PST 2012


On Tue, Nov 20, 2012 at 7:17 AM, Amara Emerson <amara.emerson at arm.com> wrote:
> Thanks for the review. I believe I've addressed your concerns with the new patch attached. Renato covered the point regarding the endianness.
>
> To summarise to main changes:
>  - Macros used to shorten the repetitive switch cases.

Thanks, this helps a lot.

>  - Variable/function renames.
>  - New function to check if an attributes section exists in the file, used in readobj.
>  - Slightly reworked parser code due to not using Elf_Word.

Sorry for not being clear. I mean not to use Elf_Word as a local
variable. As in:

Elf_Word blah = 4;

It is correct to use it as *(Elf_Word*). Using *(uint32_t*) is wrong
because it doesn't handle endianness and alignment properly.

>  - The public reader function has been changed to return an appropriate error_code instead of aborting with a fatal error.
>  - Use of LLVM style RTTI for ELFObjectFile cast in llvm-readobj, eliminating the need for the MemoryBuffer.
>
> Amara

Also doxygen comments should use \ instead of @.

Other than that it looks good. But wait for Anton to review.

- Michael Spencer

>
> -----Original Message-----
> From: Michael Spencer [mailto:bigcheesegs at gmail.com]
> Sent: 19 November 2012 20:38
> To: Amara Emerson
> Cc: Renato Golin; llvm-commits
> Subject: Re: [llvm-commits] [PATCH] ARM build attribute reading support
>
> On Mon, Nov 19, 2012 at 7:39 AM, Amara Emerson <amara.emerson at arm.com> wrote:
>> I've attached a new patch with some changes. Note that the patch is a git
>> binary diff because of the new test object files, but I can supply separate
>> patch + files if anyone wants me to.
>>
>> They are:
>>  - Separate enum anonymous namespaces.
>>  - Public getters/setters have been removed completely to make the class
>> mostly just a data container.
>>  - A new large test case added which exercises the previously non-tested
>> attributes. llvm-readobj behaviour is congruent with readelf.
>>
>> Michael, does this look reasonable?
>>
>> Thanks,
>> Amara
>
>> +++ b/include/llvm/Object/ELF.h
>> @@ -20,12 +20,15 @@
>>  #include "llvm/ADT/DenseMap.h"
>>  #include "llvm/ADT/PointerIntPair.h"
>>  #include "llvm/Object/ObjectFile.h"
>> +#include "llvm/Object/ELF_ARM.h"
>
> Includes are ordered alphabetically.
>
>>  #include "llvm/Support/Casting.h"
>>  #include "llvm/Support/ELF.h"
>>  #include "llvm/Support/Endian.h"
>> +#include "llvm/Support/LEB128.h"
>>  #include "llvm/Support/ErrorHandling.h"
>>  #include "llvm/Support/MemoryBuffer.h"
>>  #include "llvm/Support/raw_ostream.h"
>> +#include "llvm/Support/Debug.h"
>>  #include <algorithm>
>>  #include <limits>
>>  #include <utility>
>> @@ -486,6 +489,7 @@ private:
>>    const Elf_Shdr *dot_gnu_version_sec;   // .gnu.version
>>    const Elf_Shdr *dot_gnu_version_r_sec; // .gnu.version_r
>>    const Elf_Shdr *dot_gnu_version_d_sec; // .gnu.version_d
>> +  const Elf_Shdr *dot_arm_attributes_sec;// .ARM.attributes
>>
>>    // Pointer to SONAME entry in dynamic string table
>>    // This is set the first time getLoadName is called.
>> @@ -596,6 +600,10 @@ private:
>>                                     bool &IsDefault) const;
>>    void VerifyStrTab(const Elf_Shdr *sh) const;
>>
>> +  uintptr_t ARMReadSingleAttribute(uintptr_t readPtr,
>> +                               ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
>> +                               SmallVector<unsigned, 16> &TagsSet);
>
> readPtr should be ReadPtr. Also function names start with lower case.
> readSingleARMAttribute or something similar.
>
>> +
>>  protected:
>>    const Elf_Sym  *getSymbol(DataRefImpl Symb) const; // FIXME: Should be private?
>>    void            validateSymbol(DataRefImpl Symb) const;
>> @@ -725,6 +733,8 @@ public:
>>      return v->getType() == getELFType(target_endianness == support::little,
>>                                        is64Bits);
>>    }
>> +  void ARMReadBuildAttributes(ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
>> +                              SmallVector<unsigned, 16> &TagsSet);
>
> Same here. Also, this needs documentation.
>
>>  };
>>
>>  // Iterate through the version definitions, and place each Elf_Verdef
>> @@ -1948,6 +1958,7 @@ ELFObjectFile<target_endianness, is64Bits>::ELFObjectFile(MemoryBuffer *Object
>>    , dot_gnu_version_sec(0)
>>    , dot_gnu_version_r_sec(0)
>>    , dot_gnu_version_d_sec(0)
>> +  , dot_arm_attributes_sec(0)
>>    , dt_soname(0)
>>   {
>>
>> @@ -2039,6 +2050,15 @@ ELFObjectFile<target_endianness, is64Bits>::ELFObjectFile(MemoryBuffer *Object
>>        dot_gnu_version_r_sec = sh;
>>        break;
>>      }
>> +    case ELF::SHT_ARM_ATTRIBUTES: {
>> +      if (dot_arm_attributes_sec != NULL) {
>> +        // FIXME: Proper error handling
>> +        report_fatal_error("More than one .ARM.attributes section!");
>> +        break;
>> +      }
>> +      dot_arm_attributes_sec = sh;
>> +      break;
>> +    }
>>      }
>>      ++sh;
>>    }
>> @@ -2673,7 +2693,310 @@ static inline error_code GetELFSymbolVersion(const ObjectFile *Obj,
>>    llvm_unreachable("Object passed to GetELFSymbolVersion() is not ELF");
>>  }
>>
>> +template<support::endianness target_endianness, bool is64Bits>
>> +void ELFObjectFile<target_endianness, is64Bits>::
>> +       ARMReadBuildAttributes(ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
>> +                              SmallVector<unsigned, 16> &TagsSet) {
>> +
>> +  if (getArch() != Triple::arm)
>> +    report_fatal_error("Cannot read ARM build attributes of a non-ARM file.");
>> +
>> +  if (!dot_arm_attributes_sec)
>> +    report_fatal_error("No ARM attribute section found");
>
> There needs to be a way to check if calling this function is ok
> without crashing.
>
>> +
>
> All the variable names in the following need to follow the LLVM style
> guide. I know
> a lot of this file doesn't, but I'd like added code to follow llvm
> style. The only
> exception to this are names directly derived from the ELF specification.
>
>> +  const char *sec_start = (const char*)base() +
>> +                          dot_arm_attributes_sec->sh_offset;
>> +  const char *sec_end = sec_start + dot_arm_attributes_sec->sh_size;
>> +  char format_version = *sec_start;
>> +  if (format_version != ARMBuildAttrs::Format_Version)
>> +    report_fatal_error("ARM build attribute format version not recognised.");
>> +
>> +  uintptr_t sSectionPtr = (uintptr_t)sec_start + 1;
>> +  uintptr_t readPtr = sSectionPtr;
>> +  // Begin reading section after format-version byte
>> +  while (readPtr < (uintptr_t)sec_end) {
>> +    // Read a subsection, starting with: <section-length> "vendor-name"
>> +    // For now, we only care about the "aeabi" pseudo-vendor subsection.
>> +    Elf_Word sSectionLen = *(Elf_Word*)readPtr;
>
> Don't use Elf_Word as a local variable. Use uint32_t.
>
>> +    readPtr += sizeof(Elf_Word);
>> +    StringRef vendor((char*)(readPtr));
>> +    readPtr += vendor.size() + 1; // Vendor string + NUL byte
>> +
>> +    if (vendor != "aeabi") {
>> +      sSectionPtr += sSectionLen;
>> +      readPtr = sSectionPtr;
>> +      continue; //skip to the next sub-section
>> +    }
>> +
>> +    bool foundFileTag = false;
>> +    uintptr_t ssSectionPtr = readPtr;
>> +    Elf_Word ssSectionLen = *(Elf_Word*)readPtr;
>> +    // Found aeabi subsection, now find the File scope tag.
>> +    do {
>> +      ssSectionPtr = readPtr;
>> +      unsigned n = 0;
>> +      uint64_t tag = decodeULEB128((uint8_t*)readPtr, &n);
>
> Is ULEB byte oriented? Or does endianess matter?
>
>> +      readPtr += n;
>> +      ssSectionLen = *(Elf_Word*)readPtr;
>> +      if (tag != ARMBuildAttrs::File) {
>> +        // We only handle File scope attributes, skip ahead to the next
>> +        // sub-subsection.
>> +        ssSectionPtr += ssSectionLen;
>> +        readPtr = ssSectionPtr;
>> +        continue;
>> +      }
>> +      foundFileTag = true;
>> +      break; // Found the file tag.
>> +    } while (readPtr < sSectionPtr + sSectionLen);
>> +
>> +    if (!foundFileTag)
>> +      report_fatal_error("Could not find ELF file scope attribute tag.");
>> +
>> +    readPtr += sizeof(Elf_Word);
>> +    while (readPtr < ssSectionPtr + ssSectionLen) {
>> +      // Read any number of attributes.
>> +      // Attributes are pairs of <uleb128, uleb128> or <uleb128, NTBS>.
>> +      readPtr = ARMReadSingleAttribute(readPtr, Attrs, TagsSet);
>> +    }
>> +    Attrs.setValid(true);
>> +  }
>>  }
>> +
>> +// Helper to read a single ARM attribute at the given pointer and return the
>> +// read pointer moved forward to the next attribute location.
>> +template<support::endianness target_endianness, bool is64Bits>
>> +uintptr_t ELFObjectFile<target_endianness, is64Bits>::
>> +                       ARMReadSingleAttribute(uintptr_t readPtr,
>> +                           ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
>> +                           SmallVector<unsigned, 16> &TagsSet) {
>> +  // The ABI says that tags in the range 0-63 must be handled by tools.
>> +  unsigned n = 0;
>> +  uint64_t tagInt = decodeULEB128((uint8_t*)readPtr, &n);
>> +  ARMBuildAttrs::AttrType tag = (ARMBuildAttrs::AttrType)tagInt;
>> +  readPtr += n;
>> +  uint64_t ulebValue = 0;
>> +  StringRef strValue;
>> +  switch (tag) {
>> +    case ARMBuildAttrs::CPU_arch: // uleb128
>
> Case labels shouldn't be indented.
>
>> +      ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
>> +      readPtr += n;
>> +      Attrs.Tag_CPU_arch = (ARMBuildAttrs::CPUArch)ulebValue;
>> +      TagsSet.push_back(tag);
>> +      break;
>
> [snip]
>
>> +    case ARMBuildAttrs::ABI_VFP_args: // uleb128
>> +      ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
>> +      readPtr += n;
>> +      Attrs.Tag_ABI_VFP_args = ulebValue;
>> +      TagsSet.push_back(tag);
>> +      break;
>> +    default:
>> +      // Unhandled build attribute tag, according to the spec we should be able
>> +      // to infer the type of the value from (tag % 2) and skip over it.
>> +      if (tag & 0x1) {
>> +        // Value should be a null terminated byte string
>> +        strValue = (char*)readPtr;
>> +        readPtr += strValue.size() + 1;
>> +      }
>
> No newline before else.
>
>> +      else {
>> +        // Value should be a uleb128
>> +        ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
>> +        readPtr += n;
>> +      }
>> +      break;
>> +  }
>> +  return readPtr;
>>  }
>>
>> +} // object
>> +} // llvm
>> +
>>  #endif
>> diff --git a/include/llvm/Object/ELF_ARM.h b/include/llvm/Object/ELF_ARM.h
>> new file mode 100644
>> index 0000000..1cb1226
>> --- /dev/null
>> +++ b/include/llvm/Object/ELF_ARM.h
>> @@ -0,0 +1,332 @@
>> +//===-- ELF_ARM.h - ARM ELF ABI ---------------------------------*- C++ -*-===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// This file contains enumerations and support routines for ARM build attributes
>> +// as defined in ARM ABI addenda document (ABI release 2.08).
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#ifndef LLVM_OBJECT_ELF_ARM_H
>> +#define LLVM_OBJECT_ELF_ARM_H
>> +
>> +#include "llvm/ADT/SmallVector.h"
>> +#include "llvm/Support/Endian.h"
>> +
>> +namespace llvm {
>> +
>> +namespace ARMBuildAttrs {
>
> The contents of namespace this large should not be indented.
>
>> +  enum SpecialAttr {
>> +    // This is for the .cpu asm attr. It translates into one or more
>> +    // AttrType (below) entries in the .ARM.attributes section in the ELF.
>> +    SEL_CPU
>> +  };
>> +
>
> [snip]
>
>> +
>> +  class ARMGenericBuildAttrInfo {
>> +public:
>
> public should be at the same indentation as class.
>
>> +    ARMGenericBuildAttrInfo()
>> +      : Valid(false) {}
>> +
>> +    CPUArch Tag_CPU_arch;
>> +    CPUArchProfile Tag_CPU_arch_profile;
>> +    std::string Tag_CPU_raw_name;
>
> [snip]
>
>> diff --git a/tools/llvm-readobj/llvm-readobj.cpp b/tools/llvm-readobj/llvm-readobj.cpp
>> index 3be1289..afe3d54 100644
>> --- a/tools/llvm-readobj/llvm-readobj.cpp
>> +++ b/tools/llvm-readobj/llvm-readobj.cpp
>> @@ -184,6 +184,155 @@ void DumpHeaders(const ObjectFile *obj) {
>>    outs() << "\n";
>>  }
>>
>> +void printARMBuildAttributes(ObjectFile *obj, MemoryBuffer *Buffer) {
>> +  ARMBuildAttrs::ARMGenericBuildAttrInfo BuildAttrs;
>> +  SmallVector<unsigned, 16> AttrsRead;
>> +  // Check the ELF header for endianness and word size class.
>> +  char type = (uint8_t)Buffer->getBufferStart()[ELF::EI_CLASS];
>> +  char endian = (uint8_t)Buffer->getBufferStart()[ELF::EI_DATA];
>
> This information is available from ELFObjectFile.
>
>> +
>> +  if (endian == ELF::ELFDATA2LSB && type == ELF::ELFCLASS32)
>> +    static_cast<ELFObjectFile<support::little, false> *>(obj)
>> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
>> +  else if (endian == ELF::ELFDATA2LSB && type == ELF::ELFCLASS64)
>> +    static_cast<ELFObjectFile<support::little, true> *>(obj)
>> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
>> +  else if (endian == ELF::ELFDATA2MSB && type == ELF::ELFCLASS32)
>> +    static_cast<ELFObjectFile<support::big, false> *>(obj)
>> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
>> +  else
>> +    static_cast<ELFObjectFile<support::big, true> *>(obj)
>> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
>
> There's no such thing as bigendian ARM, right?
>
>> +
>> +  outs() << "ARM file attributes:\n";
>> +  for (SmallVector<unsigned, 16>::iterator I =AttrsRead.begin(),
>
> Space after =.
>
>> +       E = AttrsRead.end(); I != E; ++I) {
>> +    switch (*I) {
>> +      case ARMBuildAttrs::CPU_name:
>
> Case labels should be indented to the same level as switch.
>
>> +        outs() << "  Tag_CPU_name: " << BuildAttrs.Tag_CPU_name << "\n";
>> +        break;
>
> [snip]
>
>> +      case ARMBuildAttrs::ABI_FP_optimization_goals:
>> +        outs() << "  Tag_ABI_FP_optimization_goals: " <<
>> +                  BuildAttrs.Tag_ABI_FP_optimization_goals << "\n";
>> +        break;
>> +      default:
>> +        break;
>
> This looks like it could be shrunk a lot with a simple macro.
>
>> +    }
>> +  }
>> +}
>> +
>>  int main(int argc, char** argv) {
>>    error_code ec;
>>    sys::PrintStackTraceOnErrorSignal();
>> @@ -204,7 +353,7 @@ int main(int argc, char** argv) {
>>      return 1;
>>    }
>>
>> -  ObjectFile *obj = ObjectFile::createObjectFile(File.take());
>> +  ObjectFile *obj = ObjectFile::createObjectFile(File.get());
>
> This is wrong. ObjectFile takes ownership of the MemoryBuffer. This
> will double free.
>
>>    if (!obj) {
>>      errs() << InputFilename << ": Object type not recognized\n";
>>    }
>> @@ -213,6 +362,10 @@ int main(int argc, char** argv) {
>>    DumpSymbols(obj);
>>    DumpDynamicSymbols(obj);
>>    DumpLibrariesNeeded(obj);
>> +
>> +  if (obj->getArch() == Triple::arm && obj->isELF())
>> +    printARMBuildAttributes(obj, File.get());
>> +
>
> With the changes I mentioned above there's no need to pass the memory buffer in.
>
>>    return 0;
>>  }
>>
>
> I have no idea how ARM attributes are formatted so I'm not sure about
> that part. But overall this patch seems very fragile and verbose.
>
> - Michael Spencer
>
>
>> -----Original Message-----
>> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of Renato
>> Golin
>> Sent: 16 November 2012 08:53
>> To: Amara Emerson
>> Cc: llvm-commits at cs.uiuc.edu for LLVM
>> Subject: Re: [llvm-commits] [PATCH] ARM build attribute reading support
>>
>> On 15 November 2012 18:18, Amara Emerson <amara.emerson at arm.com> wrote:
>>> Thanks for the comments. Yes, I realise the tests are a bit light at the
>>> moment. I was hoping to get some general functionality comments as it
>> would
>>> probably take a few hundred lines to test each attribute type (and bloat
>> the
>>> llvm-readobj tool somewhat). If the general structure looks good I can add
>>> more tests.
>>
>> Should be enough to have a few objects with different configurations
>> for the attribute table on a test dir. And maybe one with a giant
>> table with all possible arguments in it.
>>
>>
>>>> Does big.LITTLE have its own value, here?
>>>
>>> No this is for the current ARM ELF ABI spec, which has a defined list of
>>> architecture types.
>>
>> I know, but since big.LITTLE will certainly need new thoughts on how
>> to optimise and especially how to select libraries (probably something
>> like gnu's ifunc etc), would be good to have it in, if ARM already
>> have something public towards that.
>>
>>
>>> I don't have much of a preference really. The header file was an extended
>>> version of the existing one in the ARM backend so I continued with the
>> enum
>>> style already used.
>>
>> I know, much of it was in the old file.
>>
>>
>>> I used getters/setters because I wasn't sure on how to do error handling
>> if
>>> an invalid or strange attribute value was read. libObject seems different
>> to
>>> the rest of LLVM in this sense. E.g. should reading an invalid/malformed
>>> attribute value result in an fatal error? If not, how should it warn or be
>>> handled silently. If anyone who knows it better than me can give tips here
>>> it'd be welcome.
>>
>> My personal feeling is that a malformed attribute table should be
>> treated as an error, and you should not have setters in the class.
>>
>> So, instead of transforming that into a POD, I'd leave it all private,
>> implement only the getters and let only that function create it
>> (either by marking its class as friend or by having a way to construct
>> it with all the information at once - maybe a factory).
>>
>> The point is that we may want to merge different tables, but we don't
>> want to silently change any of the original tables, and each one must
>> reflect exactly what was in the object file. The factory mentioned
>> above could also hold the merging rules (when constructing from an
>> array of tables, for instance), if that proves necessary in the
>> future.
>>
>> --
>> cheers,
>> --renato
>>
>> http://systemcall.org/
>
> The goal of error handling in LLVMObject is to allow as much laziness
> as possible while not crashing on any invalid files. The ELF reader
> currently doesn't do this very well, however the COFF reader has quite
> a few examples of how this is handled. I admit that it's quite verbose
> but the alternative is either silently dropping data or crashing on
> parse errors :( (If someone has a better solution I'd love to see it)
>
> This patch really should be written to correctly handle parsing errors.
>
> - Michael Spencer



More information about the llvm-commits mailing list