[llvm-commits] [llvm] r140345 - in /llvm/trunk: include/llvm/MC/MCAtom.h include/llvm/MC/MCModule.h lib/MC/MCAtom.cpp lib/MC/MCModule.cpp

Owen Anderson resistor at mac.com
Fri Sep 23 00:54:25 PDT 2011


Hi James,

This isn't actually intended as a step towards a linker format, though it does draw some inspiration from it.  The intention is to provide an API for clients to access rich, whole-program disassemblies.  You may have noticed Benjamin's recent work on CFG rediscovery at the MC level, as well as DWARF decoding for annotated disassembly.  This is related.

The term atom actually comes from the Mach-O linker, where it refers to relocatable regions.  Here, I use it to refer to contiguous regions that are uniformly instructions, or uniformly data.  Note that this does not correspond to sections/segments.  It is common in Thumb1 code, for instance, to have constant pools embedded within executable code because of the limited displacements available.  With Benjamin's CFG rediscovery, we will be able to distinguish those data atoms from the surrounding text atoms.
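To make the atom/section distinction concrete: once CFG rediscovery has identified the data ranges (such as Thumb1 constant pools) inside an executable region, that region breaks into alternating text and data atoms. The following is a minimal sketch under that assumption. AtomKind, Atom and partitionIntoAtoms are hypothetical names, not the MCAtom interface, and ranges are closed intervals as in MCAtom.h.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are not the MCAtom API.
enum class AtomKind { Text, Data };

struct Atom {
  AtomKind Kind;
  uint64_t Begin, End; // closed interval [Begin, End]
};

// Splits the closed range [Begin, End] into maximal text/data atoms.
// DataRanges (e.g. constant pools found by CFG recovery) must be sorted,
// non-overlapping closed intervals lying inside [Begin, End].
std::vector<Atom>
partitionIntoAtoms(uint64_t Begin, uint64_t End,
                   const std::vector<std::pair<uint64_t, uint64_t>> &DataRanges) {
  std::vector<Atom> Atoms;
  uint64_t Cur = Begin; // first address not yet assigned to an atom
  for (const auto &R : DataRanges) {
    assert(R.first >= Cur && R.second >= R.first && R.second <= End &&
           "Data ranges must be sorted, disjoint and in bounds!");
    if (R.first > Cur) // instructions preceding this data range
      Atoms.push_back({AtomKind::Text, Cur, R.first - 1});
    Atoms.push_back({AtomKind::Data, R.first, R.second});
    Cur = R.second + 1;
  }
  // Trailing instructions after the last data range, if any remain.
  if (DataRanges.empty() || DataRanges.back().second < End)
    Atoms.push_back({AtomKind::Text, Cur, End});
  return Atoms;
}
```

For a 0x40-byte region with a constant pool at 0x20..0x27, partitionIntoAtoms(0x00, 0x3f, {{0x20, 0x27}}) yields text [0x00, 0x1f], data [0x20, 0x27] and text [0x28, 0x3f]. All three atoms may sit in the same section.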

That said, it may make sense for atoms to be tagged with section/segment information.  I haven't really thought about that yet.

--Owen

On Sep 23, 2011, at 12:09 AM, James Molloy wrote:

> Hi Owen,
> 
> This looks nice. Is it meant to be a stepping-stone to linker formats such as ELF and Mach-O?
> 
> If so (or even if not), why did you decide to create a new piece of nomenclature ("Atom") where words like "segment" or "section" already exist and describe the same concept?
> 
> Cheers,
> 
> James
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Owen Anderson
> Sent: 22 September 2011 23:32
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm-commits] [llvm] r140345 - in /llvm/trunk: include/llvm/MC/MCAtom.h include/llvm/MC/MCModule.h lib/MC/MCAtom.cpp lib/MC/MCModule.cpp
> 
> Author: resistor
> Date: Thu Sep 22 17:32:22 2011
> New Revision: 140345
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=140345&view=rev
> Log:
> Start stubbing out MCModule and MCAtom, which provide an API for accessing the rich disassembly of a complete object or executable.
> These are very much a work in progress, and not really useful yet.
> 
> Added:
>    llvm/trunk/include/llvm/MC/MCAtom.h
>    llvm/trunk/include/llvm/MC/MCModule.h
>    llvm/trunk/lib/MC/MCAtom.cpp
>    llvm/trunk/lib/MC/MCModule.cpp
> 
> Added: llvm/trunk/include/llvm/MC/MCAtom.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MC/MCAtom.h?rev=140345&view=auto
> ==============================================================================
> --- llvm/trunk/include/llvm/MC/MCAtom.h (added)
> +++ llvm/trunk/include/llvm/MC/MCAtom.h Thu Sep 22 17:32:22 2011
> @@ -0,0 +1,75 @@
> +//===-- llvm/MC/MCAtom.h - MCAtom class ---------------------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file contains the declaration of the MCAtom class, which is used to
> +// represent a contiguous region in a decoded object that is uniformly data or
> +// instructions.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_MC_MCATOM_H
> +#define LLVM_MC_MCATOM_H
> +
> +#include "llvm/MC/MCInst.h"
> +#include "llvm/Support/DataTypes.h"
> +#include <vector>
> +
> +namespace llvm {
> +
> +class MCModule;
> +
> +/// MCData - An entry in a data MCAtom.
> +// NOTE: This may change to a more complex type in the future.
> +typedef uint8_t MCData;
> +
> +/// MCAtom - Represents a contiguous range of either instructions (a TextAtom)
> +/// or data (a DataAtom).  Address ranges are expressed as _closed_ intervals.
> +class MCAtom {
> +  friend class MCModule;
> +  typedef enum { TextAtom, DataAtom } AtomType;
> +
> +  AtomType Type;
> +  MCModule *Parent;
> +  uint64_t Begin, End;
> +
> +  std::vector<std::pair<uint64_t, MCInst> > Text;
> +  std::vector<MCData> Data;
> +
> +  // Private constructor - only callable by MCModule
> +  MCAtom(AtomType T, MCModule *P, uint64_t B, uint64_t E)
> +    : Type(T), Parent(P), Begin(B), End(E) { }
> +
> +public:
> +  bool isTextAtom() { return Type == TextAtom; }
> +  bool isDataAtom() { return Type == DataAtom; }
> +
> +  void addInst(const MCInst &I, uint64_t Address) {
> +    assert(Type == TextAtom && "Trying to add MCInst to a non-text atom!");
> +    Text.push_back(std::make_pair(Address, I));
> +  }
> +
> +  void addData(const MCData &D) {
> +    assert(Type == DataAtom && "Trying to add MCData to a non-data atom!");
> +    Data.push_back(D);
> +  }
> +
> +  /// split - Splits the atom in two at a given address, which must align with
> +/// an instruction boundary if this is a TextAtom.  Returns the newly created
> +  /// atom representing the high part of the split.
> +  MCAtom *split(uint64_t SplitPt);
> +
> +  /// truncate - Truncates an atom so that TruncPt is the last byte address
> +  /// contained in the atom.
> +  void truncate(uint64_t TruncPt);
> +};
> +
> +}
> +
> +#endif
> +
> 
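Because MCAtom ranges are closed intervals, split() leaves the original atom covering [Begin, SplitPt - 1] and gives the new atom [SplitPt, End]. The following is a minimal standalone sketch of the data-atom case. DataAtom and splitDataAtom are hypothetical stand-ins rather than MCAtom itself, and the sketch assumes one byte of contents per address. The moved bytes are appended with std::vector::insert so that the destination grows as needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a data MCAtom: a closed byte range and its contents.
struct DataAtom {
  uint64_t Begin, End;        // closed interval [Begin, End]
  std::vector<uint8_t> Bytes; // one byte per address: size() == End - Begin + 1
};

// Splits A at SplitPt. A keeps [Begin, SplitPt - 1]; the returned atom covers
// [SplitPt, End] and receives the corresponding bytes.
DataAtom splitDataAtom(DataAtom &A, uint64_t SplitPt) {
  assert(SplitPt > A.Begin && SplitPt <= A.End && "Split point not in atom!");
  assert(A.Bytes.size() == A.End - A.Begin + 1 && "Contents do not match range!");

  DataAtom Right{SplitPt, A.End, {}};
  auto I = A.Bytes.begin() + static_cast<std::ptrdiff_t>(SplitPt - A.Begin);
  // insert() grows Right.Bytes as needed; copying through an empty vector's
  // end() iterator would write past its storage.
  Right.Bytes.insert(Right.Bytes.end(), I, A.Bytes.end());
  A.Bytes.erase(I, A.Bytes.end());
  A.End = SplitPt - 1;
  return Right;
}
```

Splitting an eight-byte atom [0x100, 0x107] at 0x104 leaves [0x100, 0x103] with the first four bytes and returns [0x104, 0x107] with the last four.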
> Added: llvm/trunk/include/llvm/MC/MCModule.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MC/MCModule.h?rev=140345&view=auto
> ==============================================================================
> --- llvm/trunk/include/llvm/MC/MCModule.h (added)
> +++ llvm/trunk/include/llvm/MC/MCModule.h Thu Sep 22 17:32:22 2011
> @@ -0,0 +1,58 @@
> +//===-- llvm/MC/MCModule.h - MCModule class ---------------------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file contains the declaration of the MCModule class, which is used to
> +// represent a complete, disassembled object file or executable.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_MC_MCMODULE_H
> +#define LLVM_MC_MCMODULE_H
> +
> +#include "llvm/ADT/DenseMap.h"
> +#include "llvm/ADT/IntervalMap.h"
> +#include "llvm/ADT/SmallPtrSet.h"
> +#include "llvm/MC/MCAtom.h"
> +
> +namespace llvm {
> +
> +class MCAtom;
> +
> +/// MCModule - This class represents a completely disassembled object file or
> +/// executable.  It comprises a list of MCAtoms and a branch target table.
> +/// Each atom represents a contiguous range of either instructions or data.
> +class MCModule {
> +  /// AtomAllocationTracker - An MCModule owns its component MCAtoms, so it
> +  /// must track them in order to ensure they are properly freed as atoms are
> +  /// merged or otherwise manipulated.
> +  SmallPtrSet<MCAtom*, 8> AtomAllocationTracker;
> +
> +  /// OffsetMap - Efficiently maps offset ranges to MCAtoms.
> +  IntervalMap<uint64_t, MCAtom*> OffsetMap;
> +
> +  /// BranchTargetMap - Maps offsets that are determined to be branches and
> +  /// can be statically resolved to their target offsets.
> +  DenseMap<uint64_t, MCAtom*> BranchTargetMap;
> +
> +  friend class MCAtom;
> +
> +  /// remap - Update the interval mapping for an MCAtom.
> +  void remap(MCAtom *Atom, uint64_t NewBegin, uint64_t NewEnd);
> +
> +public:
> +  MCModule(IntervalMap<uint64_t, MCAtom*>::Allocator &A) : OffsetMap(A) { }
> +
> +  /// createAtom - Creates a new MCAtom covering the specified offset range.
> +  MCAtom *createAtom(MCAtom::AtomType Type, uint64_t Begin, uint64_t End);
> +};
> +
> +}
> +
> +#endif
> +
> 
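As committed, MCModule declares AtomAllocationTracker but no destructor, so nothing yet frees the atoms it tracks. The following is a sketch of the ownership pattern the comment describes. Module and Atom are hypothetical reduced classes, std::set stands in for SmallPtrSet, and LiveCount is instrumentation added only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-ins for MCModule/MCAtom, reduced to the ownership question.
class Module;

class Atom {
  friend class Module; // only the module may create or destroy atoms
  uint64_t Begin, End; // closed interval [Begin, End]
  Atom(uint64_t B, uint64_t E) : Begin(B), End(E) { ++LiveCount; }
  ~Atom() { --LiveCount; }

public:
  static int LiveCount; // instrumentation for this sketch only
  uint64_t getBegin() const { return Begin; }
  uint64_t getEnd() const { return End; }
};

int Atom::LiveCount = 0;

class Module {
  std::set<Atom *> Tracker; // every atom this module has allocated

public:
  Module() = default;
  Module(const Module &) = delete; // owns raw pointers, so no copies
  Module &operator=(const Module &) = delete;

  // The module owns its atoms, so it frees every one it created.
  ~Module() {
    for (Atom *A : Tracker)
      delete A;
  }

  Atom *createAtom(uint64_t Begin, uint64_t End) {
    assert(Begin <= End && "Creating atom with endpoints reversed?");
    Atom *A = new Atom(Begin, End);
    Tracker.insert(A);
    return A;
  }

  std::size_t size() const { return Tracker.size(); }
};
```

Keeping the Atom constructor and destructor private and befriending Module mirrors how MCAtom restricts construction to MCModule.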
> Added: llvm/trunk/lib/MC/MCAtom.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MC/MCAtom.cpp?rev=140345&view=auto
> ==============================================================================
> --- llvm/trunk/lib/MC/MCAtom.cpp (added)
> +++ llvm/trunk/lib/MC/MCAtom.cpp Thu Sep 22 17:32:22 2011
> @@ -0,0 +1,79 @@
> +//===- lib/MC/MCAtom.cpp - MCAtom implementation --------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/MC/MCAtom.h"
> +#include "llvm/MC/MCModule.h"
> +#include "llvm/Support/ErrorHandling.h"
> +
> +using namespace llvm;
> +
> +MCAtom *MCAtom::split(uint64_t SplitPt) {
> +  assert((SplitPt > Begin && SplitPt <= End) &&
> +         "Splitting at point not contained in atom!");
> +
> +  // Compute the new begin/end points.
> +  uint64_t LeftBegin = Begin;
> +  uint64_t LeftEnd = SplitPt - 1;
> +  uint64_t RightBegin = SplitPt;
> +  uint64_t RightEnd = End;
> +
> +  // Remap this atom to become the lower of the two new ones.
> +  Parent->remap(this, LeftBegin, LeftEnd);
> +
> +  // Create a new atom for the higher atom.
> +  MCAtom *RightAtom = Parent->createAtom(Type, RightBegin, RightEnd);
> +
> +  // Split the contents of the original atom between it and the new one.  The
> +  // precise method depends on whether this is a data or a text atom.
> +  if (isDataAtom()) {
> +    std::vector<MCData>::iterator I = Data.begin() + (RightBegin - LeftBegin);
> +
> +    assert(I != Data.end() && "Split point not found in range!");
> +
> +    RightAtom->Data.insert(RightAtom->Data.end(), I, Data.end());
> +    Data.erase(I, Data.end());
> +  } else if (isTextAtom()) {
> +    std::vector<std::pair<uint64_t, MCInst> >::iterator I = Text.begin();
> +
> +    while (I != Text.end() && I->first < SplitPt) ++I;
> +
> +    assert(I != Text.end() && "Split point not found in disassembly!");
> +    assert(I->first == SplitPt &&
> +           "Split point does not fall on instruction boundary!");
> +
> +    RightAtom->Text.insert(RightAtom->Text.end(), I, Text.end());
> +    Text.erase(I, Text.end());
> +  } else
> +    llvm_unreachable("Unknown atom type!");
> +
> +  return RightAtom;
> +}
> +
> +void MCAtom::truncate(uint64_t TruncPt) {
> +  assert((TruncPt >= Begin && TruncPt < End) &&
> +         "Truncation point not contained in atom!");
> +
> +  Parent->remap(this, Begin, TruncPt);
> +
> +  if (isDataAtom()) {
> +    Data.resize(TruncPt - Begin + 1);
> +  } else if (isTextAtom()) {
> +    std::vector<std::pair<uint64_t, MCInst> >::iterator I = Text.begin();
> +
> +    while (I != Text.end() && I->first <= TruncPt) ++I;
> +
> +    assert(I != Text.end() && "Truncation point not found in disassembly!");
> +    assert(I->first == TruncPt+1 &&
> +           "Truncation point does not fall on instruction boundary");
> +
> +    Text.erase(I, Text.end());
> +  } else
> +    llvm_unreachable("Unknown atom type!");
> +}
> +
> 
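For a text atom, truncate() must cut between instructions. After truncating at TruncPt, the first instruction that is dropped has to begin exactly at TruncPt + 1. The following is a standalone sketch of that check. TextAtom and truncateTextAtom are hypothetical, and (address, mnemonic) pairs stand in for the (address, MCInst) pairs that MCAtom stores.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical text atom: (address, mnemonic) pairs stand in for (address, MCInst).
struct TextAtom {
  uint64_t Begin, End;                                  // closed interval [Begin, End]
  std::vector<std::pair<uint64_t, std::string>> Insts;  // sorted by address
};

// Truncates A so that TruncPt is its last byte address. TruncPt + 1 must be the
// start of an instruction; otherwise an instruction would be cut in half.
void truncateTextAtom(TextAtom &A, uint64_t TruncPt) {
  assert(TruncPt >= A.Begin && TruncPt < A.End && "Truncation point not in atom!");
  auto I = A.Insts.begin();
  // Skip every instruction that starts at or before the truncation point.
  while (I != A.Insts.end() && I->first <= TruncPt)
    ++I;
  assert(I != A.Insts.end() && I->first == TruncPt + 1 &&
         "Truncation point does not fall on an instruction boundary!");
  A.Insts.erase(I, A.Insts.end());
  A.End = TruncPt;
}
```

With fixed 4-byte instructions at 0x1000, 0x1004 and 0x1008, truncating at 0x1007 keeps the first two instructions. Truncating at 0x1006 would trip the boundary assertion.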
> Added: llvm/trunk/lib/MC/MCModule.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MC/MCModule.cpp?rev=140345&view=auto
> ==============================================================================
> --- llvm/trunk/lib/MC/MCModule.cpp (added)
> +++ llvm/trunk/lib/MC/MCModule.cpp Thu Sep 22 17:32:22 2011
> @@ -0,0 +1,45 @@
> +//===- lib/MC/MCModule.cpp - MCModule implementation --------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/MC/MCAtom.h"
> +#include "llvm/MC/MCModule.h"
> +
> +using namespace llvm;
> +
> +MCAtom *MCModule::createAtom(MCAtom::AtomType Type,
> +                             uint64_t Begin, uint64_t End) {
> +  assert(Begin <= End && "Creating MCAtom with endpoints reversed?");
> +
> +  // Check for atoms already covering this range.
> +  IntervalMap<uint64_t, MCAtom*>::iterator I = OffsetMap.find(Begin);
> +  assert((!I.valid() || I.start() > End) && "Offset range already occupied!");
> +
> +  // Create the new atom and add it to our maps.
> +  MCAtom *NewAtom = new MCAtom(Type, this, Begin, End);
> +  AtomAllocationTracker.insert(NewAtom);
> +  OffsetMap.insert(Begin, End, NewAtom);
> +  return NewAtom;
> +}
> +
> +// remap - Update the interval mapping for an atom.
> +void MCModule::remap(MCAtom *Atom, uint64_t NewBegin, uint64_t NewEnd) {
> +  // Find and erase the old mapping.
> +  IntervalMap<uint64_t, MCAtom*>::iterator I = OffsetMap.find(Atom->Begin);
> +  assert(I.valid() && "Atom offset not found in module!");
> +  assert(*I == Atom && "Previous atom mapping was invalid!");
> +  I.erase();
> +
> +  // Insert the new mapping.
> +  OffsetMap.insert(NewBegin, NewEnd, Atom);
> +
> +  // Update the atom internal bounds.
> +  Atom->Begin = NewBegin;
> +  Atom->End = NewEnd;
> +}
> +
> 
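createAtom() calls OffsetMap.find(Begin), and IntervalMap::find returns the first interval whose stop is at or after Begin. The closed range [Begin, End] is therefore free only if no such interval exists or that interval starts after End. The following is a standalone sketch of the same test. ClosedIntervalMap is a hypothetical helper built on std::map, not llvm::IntervalMap, and it assumes the stored intervals never overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical closed-interval map: start -> (stop, id). Intervals never overlap.
struct ClosedIntervalMap {
  struct Entry { uint64_t Stop; int Id; };
  std::map<uint64_t, Entry> ByStart;

  // Mimics IntervalMap::find(X): the first interval whose stop is >= X,
  // i.e. the one containing X or, failing that, the next one after X.
  std::map<uint64_t, Entry>::const_iterator find(uint64_t X) const {
    auto It = ByStart.upper_bound(X); // first interval starting after X
    if (It != ByStart.begin()) {
      auto Prev = std::prev(It);      // last interval starting at or before X
      if (Prev->second.Stop >= X)
        return Prev;                  // Prev contains X
    }
    return It;
  }

  // [Begin, End] is free iff the first interval reaching Begin starts after End.
  bool isFree(uint64_t Begin, uint64_t End) const {
    auto It = find(Begin);
    return It == ByStart.end() || It->first > End;
  }

  void insert(uint64_t Begin, uint64_t End, int Id) {
    assert(Begin <= End && "Endpoints reversed?");
    assert(isFree(Begin, End) && "Offset range already occupied!");
    ByStart.emplace(Begin, Entry{End, Id});
  }
};
```

Because the stored intervals are disjoint and sorted by start, their stops are sorted too. As a result, only the interval immediately before upper_bound(X) can contain X, which keeps the lookup logarithmic.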
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 



