[llvm-commits] [PATCH] ARM build attribute reading support

Amara Emerson amara.emerson at arm.com
Tue Nov 20 07:17:58 PST 2012


Thanks for the review. I believe I've addressed your concerns with the new patch attached. Renato covered the point regarding the endianness.

To summarise to main changes:
 - Macros used to shorten the repetitive switch cases.
 - Variable/function renames.
 - New function to check if an attributes section exists in the file, used in readobj.
 - Slightly reworked parser code due to not using Elf_Word.
 - The public reader function has been changed to return an appropriate error_code instead of aborting with a fatal error.
 - Use of LLVM style RTTI for ELFObjectFile cast in llvm-readobj, eliminating the need for the MemoryBuffer.

Amara

-----Original Message-----
From: Michael Spencer [mailto:bigcheesegs at gmail.com] 
Sent: 19 November 2012 20:38
To: Amara Emerson
Cc: Renato Golin; llvm-commits
Subject: Re: [llvm-commits] [PATCH] ARM build attribute reading support

On Mon, Nov 19, 2012 at 7:39 AM, Amara Emerson <amara.emerson at arm.com> wrote:
> I've attached a new patch with some changes. Note that the patch is a git
> binary diff because of the new test object files, but I can supply separate
> patch + files if anyone wants me to.
>
> They are:
>  - Separate enum anonymous namespaces.
>  - Public getters/setters have been removed completely to make the class
> mostly just a data container.
>  - A new large test case added which exercises the previously non-tested
> attributes. llvm-readobj behaviour is congruent with readelf.
>
> Michael, does this look reasonable?
>
> Thanks,
> Amara

> +++ b/include/llvm/Object/ELF.h
> @@ -20,12 +20,15 @@
>  #include "llvm/ADT/DenseMap.h"
>  #include "llvm/ADT/PointerIntPair.h"
>  #include "llvm/Object/ObjectFile.h"
> +#include "llvm/Object/ELF_ARM.h"

Includes are ordered alphabetically.

>  #include "llvm/Support/Casting.h"
>  #include "llvm/Support/ELF.h"
>  #include "llvm/Support/Endian.h"
> +#include "llvm/Support/LEB128.h"
>  #include "llvm/Support/ErrorHandling.h"
>  #include "llvm/Support/MemoryBuffer.h"
>  #include "llvm/Support/raw_ostream.h"
> +#include "llvm/Support/Debug.h"
>  #include <algorithm>
>  #include <limits>
>  #include <utility>
> @@ -486,6 +489,7 @@ private:
>    const Elf_Shdr *dot_gnu_version_sec;   // .gnu.version
>    const Elf_Shdr *dot_gnu_version_r_sec; // .gnu.version_r
>    const Elf_Shdr *dot_gnu_version_d_sec; // .gnu.version_d
> +  const Elf_Shdr *dot_arm_attributes_sec;// .ARM.attributes
>
>    // Pointer to SONAME entry in dynamic string table
>    // This is set the first time getLoadName is called.
> @@ -596,6 +600,10 @@ private:
>                                     bool &IsDefault) const;
>    void VerifyStrTab(const Elf_Shdr *sh) const;
>
> +  uintptr_t ARMReadSingleAttribute(uintptr_t readPtr,
> +                               ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
> +                               SmallVector<unsigned, 16> &TagsSet);

readPtr should be ReadPtr. Also function names start with lower case.
readSingleARMAttribute or something similar.

> +
>  protected:
>    const Elf_Sym  *getSymbol(DataRefImpl Symb) const; // FIXME: Should be private?
>    void            validateSymbol(DataRefImpl Symb) const;
> @@ -725,6 +733,8 @@ public:
>      return v->getType() == getELFType(target_endianness == support::little,
>                                        is64Bits);
>    }
> +  void ARMReadBuildAttributes(ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
> +                              SmallVector<unsigned, 16> &TagsSet);

Same here. Also, this needs documentation.

>  };
>
>  // Iterate through the version definitions, and place each Elf_Verdef
> @@ -1948,6 +1958,7 @@ ELFObjectFile<target_endianness, is64Bits>::ELFObjectFile(MemoryBuffer *Object
>    , dot_gnu_version_sec(0)
>    , dot_gnu_version_r_sec(0)
>    , dot_gnu_version_d_sec(0)
> +  , dot_arm_attributes_sec(0)
>    , dt_soname(0)
>   {
>
> @@ -2039,6 +2050,15 @@ ELFObjectFile<target_endianness, is64Bits>::ELFObjectFile(MemoryBuffer *Object
>        dot_gnu_version_r_sec = sh;
>        break;
>      }
> +    case ELF::SHT_ARM_ATTRIBUTES: {
> +      if (dot_arm_attributes_sec != NULL) {
> +        // FIXME: Proper error handling
> +        report_fatal_error("More than one .ARM.attributes section!");
> +        break;
> +      }
> +      dot_arm_attributes_sec = sh;
> +      break;
> +    }
>      }
>      ++sh;
>    }
> @@ -2673,7 +2693,310 @@ static inline error_code GetELFSymbolVersion(const ObjectFile *Obj,
>    llvm_unreachable("Object passed to GetELFSymbolVersion() is not ELF");
>  }
>
> +template<support::endianness target_endianness, bool is64Bits>
> +void ELFObjectFile<target_endianness, is64Bits>::
> +       ARMReadBuildAttributes(ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
> +                              SmallVector<unsigned, 16> &TagsSet) {
> +
> +  if (getArch() != Triple::arm)
> +    report_fatal_error("Cannot read ARM build attributes of a non-ARM file.");
> +
> +  if (!dot_arm_attributes_sec)
> +    report_fatal_error("No ARM attribute section found");

There needs to be a way to check if calling this function is ok
without crashing.

> +

All the variable names in the following need to follow the LLVM style
guide. I know
a lot of this file doesn't, but I'd like added code to follow llvm
style. The only
exception to this are names directly derived from the ELF specification.

> +  const char *sec_start = (const char*)base() +
> +                          dot_arm_attributes_sec->sh_offset;
> +  const char *sec_end = sec_start + dot_arm_attributes_sec->sh_size;
> +  char format_version = *sec_start;
> +  if (format_version != ARMBuildAttrs::Format_Version)
> +    report_fatal_error("ARM build attribute format version not recognised.");
> +
> +  uintptr_t sSectionPtr = (uintptr_t)sec_start + 1;
> +  uintptr_t readPtr = sSectionPtr;
> +  // Begin reading section after format-version byte
> +  while (readPtr < (uintptr_t)sec_end) {
> +    // Read a subsection, starting with: <section-length> "vendor-name"
> +    // For now, we only care about the "aeabi" pseudo-vendor subsection.
> +    Elf_Word sSectionLen = *(Elf_Word*)readPtr;

Don't use Elf_Word as a local variable. Use uint32_t.

> +    readPtr += sizeof(Elf_Word);
> +    StringRef vendor((char*)(readPtr));
> +    readPtr += vendor.size() + 1; // Vendor string + NUL byte
> +
> +    if (vendor != "aeabi") {
> +      sSectionPtr += sSectionLen;
> +      readPtr = sSectionPtr;
> +      continue; //skip to the next sub-section
> +    }
> +
> +    bool foundFileTag = false;
> +    uintptr_t ssSectionPtr = readPtr;
> +    Elf_Word ssSectionLen = *(Elf_Word*)readPtr;
> +    // Found aeabi subsection, now find the File scope tag.
> +    do {
> +      ssSectionPtr = readPtr;
> +      unsigned n = 0;
> +      uint64_t tag = decodeULEB128((uint8_t*)readPtr, &n);

Is ULEB byte oriented? Or does endianess matter?

> +      readPtr += n;
> +      ssSectionLen = *(Elf_Word*)readPtr;
> +      if (tag != ARMBuildAttrs::File) {
> +        // We only handle File scope attributes, skip ahead to the next
> +        // sub-subsection.
> +        ssSectionPtr += ssSectionLen;
> +        readPtr = ssSectionPtr;
> +        continue;
> +      }
> +      foundFileTag = true;
> +      break; // Found the file tag.
> +    } while (readPtr < sSectionPtr + sSectionLen);
> +
> +    if (!foundFileTag)
> +      report_fatal_error("Could not find ELF file scope attribute tag.");
> +
> +    readPtr += sizeof(Elf_Word);
> +    while (readPtr < ssSectionPtr + ssSectionLen) {
> +      // Read any number of attributes.
> +      // Attributes are pairs of <uleb128, uleb128> or <uleb128, NTBS>.
> +      readPtr = ARMReadSingleAttribute(readPtr, Attrs, TagsSet);
> +    }
> +    Attrs.setValid(true);
> +  }
>  }
> +
> +// Helper to read a single ARM attribute at the given pointer and return the
> +// read pointer moved forward to the next attribute location.
> +template<support::endianness target_endianness, bool is64Bits>
> +uintptr_t ELFObjectFile<target_endianness, is64Bits>::
> +                       ARMReadSingleAttribute(uintptr_t readPtr,
> +                           ARMBuildAttrs::ARMGenericBuildAttrInfo &Attrs,
> +                           SmallVector<unsigned, 16> &TagsSet) {
> +  // The ABI says that tags in the range 0-63 must be handled by tools.
> +  unsigned n = 0;
> +  uint64_t tagInt = decodeULEB128((uint8_t*)readPtr, &n);
> +  ARMBuildAttrs::AttrType tag = (ARMBuildAttrs::AttrType)tagInt;
> +  readPtr += n;
> +  uint64_t ulebValue = 0;
> +  StringRef strValue;
> +  switch (tag) {
> +    case ARMBuildAttrs::CPU_arch: // uleb128

Case labels shouldn't be indented.

> +      ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
> +      readPtr += n;
> +      Attrs.Tag_CPU_arch = (ARMBuildAttrs::CPUArch)ulebValue;
> +      TagsSet.push_back(tag);
> +      break;

[snip]

> +    case ARMBuildAttrs::ABI_VFP_args: // uleb128
> +      ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
> +      readPtr += n;
> +      Attrs.Tag_ABI_VFP_args = ulebValue;
> +      TagsSet.push_back(tag);
> +      break;
> +    default:
> +      // Unhandled build attribute tag, according to the spec we should be able
> +      // to infer the type of the value from (tag % 2) and skip over it.
> +      if (tag & 0x1) {
> +        // Value should be a null terminated byte string
> +        strValue = (char*)readPtr;
> +        readPtr += strValue.size() + 1;
> +      }

No newline before else.

> +      else {
> +        // Value should be a uleb128
> +        ulebValue = decodeULEB128((uint8_t*)readPtr, &n);
> +        readPtr += n;
> +      }
> +      break;
> +  }
> +  return readPtr;
>  }
>
> +} // object
> +} // llvm
> +
>  #endif
> diff --git a/include/llvm/Object/ELF_ARM.h b/include/llvm/Object/ELF_ARM.h
> new file mode 100644
> index 0000000..1cb1226
> --- /dev/null
> +++ b/include/llvm/Object/ELF_ARM.h
> @@ -0,0 +1,332 @@
> +//===-- ELF_ARM.h - ARM ELF ABI ---------------------------------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file contains enumerations and support routines for ARM build attributes
> +// as defined in ARM ABI addenda document (ABI release 2.08).
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_OBJECT_ELF_ARM_H
> +#define LLVM_OBJECT_ELF_ARM_H
> +
> +#include "llvm/ADT/SmallVector.h"
> +#include "llvm/Support/Endian.h"
> +
> +namespace llvm {
> +
> +namespace ARMBuildAttrs {

The contents of namespace this large should not be indented.

> +  enum SpecialAttr {
> +    // This is for the .cpu asm attr. It translates into one or more
> +    // AttrType (below) entries in the .ARM.attributes section in the ELF.
> +    SEL_CPU
> +  };
> +

[snip]

> +
> +  class ARMGenericBuildAttrInfo {
> +public:

public should be at the same indentation as class.

> +    ARMGenericBuildAttrInfo()
> +      : Valid(false) {}
> +
> +    CPUArch Tag_CPU_arch;
> +    CPUArchProfile Tag_CPU_arch_profile;
> +    std::string Tag_CPU_raw_name;

[snip]

> diff --git a/tools/llvm-readobj/llvm-readobj.cpp b/tools/llvm-readobj/llvm-readobj.cpp
> index 3be1289..afe3d54 100644
> --- a/tools/llvm-readobj/llvm-readobj.cpp
> +++ b/tools/llvm-readobj/llvm-readobj.cpp
> @@ -184,6 +184,155 @@ void DumpHeaders(const ObjectFile *obj) {
>    outs() << "\n";
>  }
>
> +void printARMBuildAttributes(ObjectFile *obj, MemoryBuffer *Buffer) {
> +  ARMBuildAttrs::ARMGenericBuildAttrInfo BuildAttrs;
> +  SmallVector<unsigned, 16> AttrsRead;
> +  // Check the ELF header for endianness and word size class.
> +  char type = (uint8_t)Buffer->getBufferStart()[ELF::EI_CLASS];
> +  char endian = (uint8_t)Buffer->getBufferStart()[ELF::EI_DATA];

This information is available from ELFObjectFile.

> +
> +  if (endian == ELF::ELFDATA2LSB && type == ELF::ELFCLASS32)
> +    static_cast<ELFObjectFile<support::little, false> *>(obj)
> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
> +  else if (endian == ELF::ELFDATA2LSB && type == ELF::ELFCLASS64)
> +    static_cast<ELFObjectFile<support::little, true> *>(obj)
> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
> +  else if (endian == ELF::ELFDATA2MSB && type == ELF::ELFCLASS32)
> +    static_cast<ELFObjectFile<support::big, false> *>(obj)
> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);
> +  else
> +    static_cast<ELFObjectFile<support::big, true> *>(obj)
> +      ->ARMReadBuildAttributes(BuildAttrs, AttrsRead);

There's no such thing as bigendian ARM, right?

> +
> +  outs() << "ARM file attributes:\n";
> +  for (SmallVector<unsigned, 16>::iterator I =AttrsRead.begin(),

Space after =.

> +       E = AttrsRead.end(); I != E; ++I) {
> +    switch (*I) {
> +      case ARMBuildAttrs::CPU_name:

Case labels should be indented to the same level as switch.

> +        outs() << "  Tag_CPU_name: " << BuildAttrs.Tag_CPU_name << "\n";
> +        break;

[snip]

> +      case ARMBuildAttrs::ABI_FP_optimization_goals:
> +        outs() << "  Tag_ABI_FP_optimization_goals: " <<
> +                  BuildAttrs.Tag_ABI_FP_optimization_goals << "\n";
> +        break;
> +      default:
> +        break;

This looks like it could be shrunk a lot with a simple macro.

> +    }
> +  }
> +}
> +
>  int main(int argc, char** argv) {
>    error_code ec;
>    sys::PrintStackTraceOnErrorSignal();
> @@ -204,7 +353,7 @@ int main(int argc, char** argv) {
>      return 1;
>    }
>
> -  ObjectFile *obj = ObjectFile::createObjectFile(File.take());
> +  ObjectFile *obj = ObjectFile::createObjectFile(File.get());

This is wrong. ObjectFile takes ownership of the MemoryBuffer. This
will double free.

>    if (!obj) {
>      errs() << InputFilename << ": Object type not recognized\n";
>    }
> @@ -213,6 +362,10 @@ int main(int argc, char** argv) {
>    DumpSymbols(obj);
>    DumpDynamicSymbols(obj);
>    DumpLibrariesNeeded(obj);
> +
> +  if (obj->getArch() == Triple::arm && obj->isELF())
> +    printARMBuildAttributes(obj, File.get());
> +

With the changes I mentioned above there's no need to pass the memory buffer in.

>    return 0;
>  }
>

I have no idea how ARM attributes are formatted so I'm not sure about
that part. But overall this patch seems very fragile and verbose.

- Michael Spencer


> -----Original Message-----
> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of Renato
> Golin
> Sent: 16 November 2012 08:53
> To: Amara Emerson
> Cc: llvm-commits at cs.uiuc.edu for LLVM
> Subject: Re: [llvm-commits] [PATCH] ARM build attribute reading support
>
> On 15 November 2012 18:18, Amara Emerson <amara.emerson at arm.com> wrote:
>> Thanks for the comments. Yes, I realise the tests are a bit light at the
>> moment. I was hoping to get some general functionality comments as it
> would
>> probably take a few hundred lines to test each attribute type (and bloat
> the
>> llvm-readobj tool somewhat). If the general structure looks good I can add
>> more tests.
>
> Should be enough to have a few objects with different configurations
> for the attribute table on a test dir. And maybe one with a giant
> table with all possible arguments in it.
>
>
>>> Does big.LITTLE have its own value, here?
>>
>> No this is for the current ARM ELF ABI spec, which has a defined list of
>> architecture types.
>
> I know, but since big.LITTLE will certainly need new thoughts on how
> to optimise and especially how to select libraries (probably something
> like gnu's ifunc etc), would be good to have it in, if ARM already
> have something public towards that.
>
>
>> I don't have much of a preference really. The header file was an extended
>> version of the existing one in the ARM backend so I continued with the
> enum
>> style already used.
>
> I know, much of it was in the old file.
>
>
>> I used getters/setters because I wasn't sure on how to do error handling
> if
>> an invalid or strange attribute value was read. libObject seems different
> to
>> the rest of LLVM in this sense. E.g. should reading an invalid/malformed
>> attribute value result in an fatal error? If not, how should it warn or be
>> handled silently. If anyone who knows it better than me can give tips here
>> it'd be welcome.
>
> My personal feeling is that a malformed attribute table should be
> treated as an error, and you should not have setters in the class.
>
> So, instead of transforming that into a POD, I'd leave it all private,
> implement only the getters and let only that function create it
> (either by marking its class as friend or by having a way to construct
> it with all the information at once - maybe a factory).
>
> The point is that we may want to merge different tables, but we don't
> want to silently change any of the original tables, and each one must
> reflect exactly what was in the object file. The factory mentioned
> above could also hold the merging rules (when constructing from an
> array of tables, for instance), if that proves necessary in the
> future.
>
> --
> cheers,
> --renato
>
> http://systemcall.org/

The goal of error handling in LLVMObject is to allow as much laziness
as possible while not crashing on any invalid files. The ELF reader
currently doesn't do this very well, however the COFF reader has quite
a few examples of how this is handled. I admit that it's quite verbose
but the alternative is either silently dropping data or crashing on
parse errors :( (If someone has a better solution I'd love to see it)

This patch really should be written to correctly handle parsing errors.

- Michael Spencer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-buildattrs-r3.patch
Type: application/octet-stream
Size: 36320 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121120/6c127163/attachment.obj>


More information about the llvm-commits mailing list