[llvm-commits] [llvm] r47621 - in /llvm/trunk: include/llvm-c/lto.h tools/lto2/ tools/lto2/LTOCodeGenerator.cpp tools/lto2/LTOCodeGenerator.h tools/lto2/LTOModule.cpp tools/lto2/LTOModule.h tools/lto2/Makefile tools/lto2/lto.cpp tools/lto2/lto.exports

Chris Lattner clattner at apple.com
Tue Feb 26 15:12:16 PST 2008


On Feb 26, 2008, at 12:26 PM, Nick Kledzik wrote:
> URL: http://llvm.org/viewvc/llvm-project?rev=47621&view=rev
> Log:
> first commit of new LTO system.  It is not hooked up in the llvm/ 
> tools/Makefile, so no one will build it be default yet

Nice! Some details:

> +/**
> + * Returns the last error string or NULL if last operation was  
> sucessful.

typo "sucessful"

>
> +
> +/**
> + * Returns the number of symbols in the object module.
> + */
> +extern unsigned int
> +lto_module_get_num_symbols(lto_module_t mod);

Minor details which you should feel to ignore: the "int" in "unsigned  
int" is redundant, and the "extern" is also redundant.  You could  
choose to drop those if you care.  We tend to not use them in the rest  
of llvm, but it doesn't really matter either way.

> = 
> = 
> = 
> = 
> = 
> = 
> = 
> = 
> ======================================================================
> --- llvm/trunk/tools/lto2/LTOCodeGenerator.cpp (added)
> +++ llvm/trunk/tools/lto2/LTOCodeGenerator.cpp Tue Feb 26 14:26:43  
> 2008

>
> +
> +#include "llvm/Module.h"
> +#include "llvm/PassManager.h"
> +#include "llvm/Linker.h"
> +#include "llvm/Constants.h"
> +#include "llvm/DerivedTypes.h"
> +#include "llvm/ModuleProvider.h"
> +#include "llvm/Bitcode/ReaderWriter.h"
> +#include "llvm/Support/CommandLine.h"
> +#include "llvm/Support/FileUtilities.h"
> +#include "llvm/Support/SystemUtils.h"
> +#include "llvm/Support/Mangler.h"
> +#include "llvm/Support/MemoryBuffer.h"
> +#include "llvm/System/Program.h"
> +#include "llvm/System/Signals.h"
> +#include "llvm/Analysis/Passes.h"
> +#include "llvm/Analysis/LoopPass.h"
> +#include "llvm/Analysis/Verifier.h"
> +#include "llvm/CodeGen/FileWriters.h"
> +#include "llvm/Target/SubtargetFeature.h"
> +#include "llvm/Target/TargetOptions.h"
> +#include "llvm/Target/TargetData.h"
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/Target/TargetMachineRegistry.h"
> +#include "llvm/Target/TargetAsmInfo.h"
> +#include "llvm/Transforms/IPO.h"
> +#include "llvm/Transforms/Scalar.h"
> +#include "llvm/Analysis/LoadValueNumbering.h"
> +#include "llvm/Support/MathExtras.h"
> +#include "llvm/Config/config.h"
> +
> +#include "LTOModule.h"
> +#include "LTOCodeGenerator.h"

You should #include LTOCodeGenerator.h first in the list to ensure  
that it is self contained.  Also, please remove any #includes that  
aren't needed.  There is some justification and more info for this here:
http://llvm.org/docs/CodingStandards.html#scf_includes


>
> +
> +#include <fstream>
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <fcntl.h>

For standard c++ headers like stdlib, please use <cstdlib>.  Is it  
possible to avoid unistd.h/fcntl.h with the llvm/System/Path.h or  
MemoryBuffer.h stuff?  MemoryBuffer.h is an efficient way to read an  
entire file in off the disk.

> +void* LTOCodeGenerator::compile(size_t* length, std::string& errMsg)
> +{
> +    // make unqiue temp .s file to put generated assembly code

typo "unqiue", also comments should end with proper punctuation and  
use capital letters :)

>
> +
> +    if ( !asmResult ) {
> +        // read .o file into memory buffer
> +        const sys::FileStatus* objStatus;
> +        objStatus = uniqueObjPath.getFileStatus(false, &errMsg);
> +        if ( objStatus != NULL ) {
> +            *length = objStatus->getSize();
> +            // use malloc() because caller will own this buffer and  
> free() it
> +            buffer = ::malloc(*length);
> +            if ( buffer != NULL ) {
> +                int fd = ::open(uniqueObjPath.c_str(), O_RDONLY, 0);
> +                if ( fd != -1 ) {
> +                    // read object file contents into buffer
> +                    if ( ::read(fd, buffer, *length) !=  
> (ssize_t)*length ) {
> +                        errMsg = "error reading object file";
> +                        free(buffer);
> +                        buffer = NULL;

I don't think this is correct: read can return partial results,  
however it's best to use MemoryBuffer to avoid having to do this  
explicitly.

>
> +                    }
> +                    close(fd);
> +                }
> +                else {
> +                    errMsg = "error opening object file";
> +                    free(buffer);
> +                    buffer = NULL;
> +                }
> +            }
> +            else {
> +                errMsg = "error mallocing space for object file";
> +            }
> +        }
> +        else {
> +            errMsg = "error stat'ing object file";
> +        }

I think the code will end up getting simpler with MemoryBuffer, but if  
that we're going to happen, I'd suggest using early exits to avoid  
having the code be so nested.  This would give you:

   if (fd == -1) {
     errMsg = "error opening object file";
     return NULL;
   }

for example, but this also requires using RAII to delete the files  
etc.  If you use MemoryBuffer, the code will likely be simple enough  
that it's not worth doing this.

> = 
> = 
> = 
> = 
> = 
> = 
> = 
> = 
> ======================================================================
> --- llvm/trunk/tools/lto2/LTOCodeGenerator.h (added)
> +++ llvm/trunk/tools/lto2/LTOCodeGenerator.h Tue Feb 26 14:26:43 2008


> +    bool                writeMergedModules(const char* path,  
> std::string& errMsg);

This goes over 80 columns.


> = 
> = 
> = 
> = 
> = 
> = 
> = 
> = 
> ======================================================================
> --- llvm/trunk/tools/lto2/LTOModule.cpp (added)
> +++ llvm/trunk/tools/lto2/LTOModule.cpp Tue Feb 26 14:26:43 2008
> @@ -0,0 +1,329 @@
> +//===-LTOModule.cpp - LLVM Link Time Optimizer  
> ----------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open  
> Source
> +// License. See LICENSE.TXT for details.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +//
> +// This file implements the Link Time Optimization library. This  
> library is
> +// intended to be used by linker to optimize code at link time.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +
> +#include "llvm/Module.h"
> +#include "llvm/PassManager.h"
> +#include "llvm/Linker.h"
> +#include "llvm/Constants.h"
> +#include "llvm/DerivedTypes.h"
> +#include "llvm/ModuleProvider.h"
> +#include "llvm/Bitcode/ReaderWriter.h"
> +#include "llvm/Support/CommandLine.h"
> +#include "llvm/Support/FileUtilities.h"
> +#include "llvm/Support/SystemUtils.h"
> +#include "llvm/Support/Mangler.h"
> +#include "llvm/Support/MemoryBuffer.h"
> +#include "llvm/System/Program.h"
> +#include "llvm/System/Path.h"
> +#include "llvm/System/Signals.h"
> +#include "llvm/Target/SubtargetFeature.h"
> +#include "llvm/Target/TargetOptions.h"
> +#include "llvm/Target/TargetData.h"
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/Target/TargetMachineRegistry.h"
> +#include "llvm/Target/TargetAsmInfo.h"
> +#include "llvm/Transforms/IPO.h"
> +#include "llvm/Transforms/Scalar.h"
> +#include "llvm/Analysis/LoadValueNumbering.h"
> +#include "llvm/Support/MathExtras.h"
> +
> +#include "LTOModule.h"

As before, pruning this down would be good if possible.  Please move  
LTOModule.h to the top of the list.

> +bool LTOModule::isBitcodeFileForTarget(const void* mem,
> +                                    size_t length, const char*  
> triplePrefix)
> +{
> +    bool result = false;
> +    MemoryBuffer* buffer;
> +    buffer = MemoryBuffer::getMemBuffer((char*)mem, (char*)mem 
> +length);
> +    if ( buffer != NULL ) {
> +        ModuleProvider* mp = getBitcodeModuleProvider(buffer);
> +        if ( mp != NULL ) {
> +            std::string actualTarget = mp->getModule()- 
> >getTargetTriple();
> +            if  ( strncmp(actualTarget.c_str(), triplePrefix,
> +                                    strlen(triplePrefix)) == 0) {
> +                result = true;
> +            }
> +            //  mp destructor will delete buffer
> +            delete mp;
> +        }
> +        else {
> +            // if getBitcodeModuleProvider failed, we need to  
> delete buffer
> +            delete buffer;
> +        }
> +    }
> +    return result;
> +}

Please use RAII and early exits to simplify the code.  A useful thing  
for this is llvm/ADT/OwningPtr.h.  You should be able to do something  
like this:

   OwningPtr<MemoryBuffer> buffer = MemoryBuffer::getMemBuffer...
   if (buffer == 0) return 0;

   OwningPtr<ModuleProvider> mp = getBitcodeModuleProvider(buffer);
   if (mp == 0) return 0;

   ...

This makes all the error cases be early exits and makes the success  
case easier to understand.  Also, no nesting!  There are other methods  
that can use similar treatment.

> +LTOModule::~LTOModule()
> +{
> +    delete _module;
> +    if ( _target != NULL )
> +        delete _target;
> +}

delete accepts a null pointer, so you can just make the second delete  
unconditional.  It might also make sense to make the "_module" and  
"_target" members of LTOModule be OwningPointers, which would mean you  
don't have to write the dtor manually.

> +LTOModule* LTOModule::makeLTOModule(const char* path, std::string&  
> errMsg)
> +{
> +LTOModule* LTOModule::makeLTOModule(const void* mem, size_t length,
> +                                                         
> std::string& errMsg)
> +{

These two methods share all the target matching code.  Please pull it  
out into a shared helper method (that takes a MemoryBuffer) or  
something.

> +void LTOModule::addDefinedSymbol(GlobalValue* def, Mangler &mangler,
> +                                bool isFunction)
> +{
> +    const char* symbolName  
> = ::strdup(mangler.getValueName(def).c_str());
> +
> +    // set alignment part log2() can have rounding errors
> +    uint32_t align = def->getAlignment();
> +    uint32_t attr = align ? __builtin_ctz(def->getAlignment()) : 0;

Please don't use __builtin_ctz(), use a helper from llvm/Support/ 
MathExtras.h.

> +    // set definition part
> +    if ( def->hasWeakLinkage() || def->hasLinkOnceLinkage() ) {
> +        // lvm bitcode does not differenciate between weak def data
> +        // and tentative definitions!
> +        // HACK HACK HACK
> +        // C++ does not use tentative definitions, but does use  
> weak symbols
> +        // so guess that anything that looks like a C++ symbol is  
> weak and others
> +        // are tentative definitions

Wow, that is a hack.  :)  We should really fix this at the LLVM IR  
level, can you please file a radar that demonstrates how not  
distinguishing between these two cases cause a problem?

> +uint32_t LTOModule::getSymbolCount()
> +{
> +    if ( !_symbolsParsed ) {
> +        _symbolsParsed = true;
> +
> +        // Use mangler to add GlobalPrefix to names to match linker  
> names.
> +        Mangler mangler(*_module, _target->getTargetAsmInfo()- 
> >getGlobalPrefix());

There is one other aspect of configuration for the mangler: On darwin  
at least you have to call manger.setUseQuotes(true);

This tells it that it is ok to make some symbols use double quotes  
(e.g. those with a space in them like objc methods) instead of  
mangling the space.  I don't know if we can currently get this from  
TargetAsmInfo.h, but I think we should be able to.  Dale or Evan can  
help with this part.


>
> +
> +        // add functions
> +        for (Module::iterator f = _module->begin(); f != _module- 
> >end(); ++f) {
> +            if ( f->isDeclaration() ) {
> +                addUndefinedSymbol(mangler.getValueName(f).c_str());
> +            }
> +            else {

To reduce nesting, I'd suggest writing this as:

> +        for (Module::iterator f = _module->begin(); f != _module- 
> >end(); ++f) {
> +            if ( f->isDeclaration() ) {
> +                addUndefinedSymbol(mangler.getValueName(f).c_str());
                    continue;
>
> +            }

which unnests the 'else' part.


>
> +                addDefinedSymbol(f, mangler, true);
> +                // add external symbols referenced by this function.
> +                for (Function::iterator b = f->begin(); b != f- 
> >end(); ++b) {
> +                    for (BasicBlock::iterator i = b->begin();
> +                                                        i != b- 
> >end(); ++i) {
> +                        for (unsigned count = 0, total = i- 
> >getNumOperands();
> +                                                    count != total;  
> ++count) {
> +                            findExternalRefs(i->getOperand(count),  
> mangler);
> +                        }
> +                    }
> +                }

I'd suggest splitting this triply nested loop out into its own helper  
function.

>
> +            }
> +        }
> +
> +        // add data
> +        for (Module::global_iterator v = _module->global_begin(),
> +                                    e = _module->global_end(); v ! 
> =  e; ++v) {
> +            if ( v->isDeclaration() ) {
> +                addUndefinedSymbol(mangler.getValueName(v).c_str());
> +            }

This can also use continue to reduce nesting.

>
> +        // make symbols for all undefines
> +        for (StringSet::iterator it=_undefines.begin();
> +                                                it !=  
> _undefines.end(); ++it) {
> +            // if this symbol also has a definition, then don't  
> make an undefine
> +            // because it is a tentative definition
> +            if ( _defines.find(it->getKeyData(), it->getKeyData() 
> +it->getKeyLength()) == _defines.end() ) {

likewise: if (_defines.find(..) != _defines.end()) continue;

>
> +++ llvm/trunk/tools/lto2/LTOModule.h Tue Feb 26 14:26:43 2008
> @@ -0,0 +1,83 @@
> +//===-LTOModule.h - LLVM Link Time Optimizer  
> ------------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open  
> Source
> +// License. See LICENSE.TXT for details.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +//
> +// This file declares the LTOModule class.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +
> +#ifndef LTO_MODULE_H
> +#define LTO_MODULE_H
> +
> +#include "llvm/Module.h"
> +#include "llvm/GlobalValue.h"
> +#include "llvm/Constants.h"
> +#include "llvm/Support/Mangler.h"
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/ADT/StringMap.h"

I think that several of these #includes can be removed.  GlobalValue.h  
is redundant, and I think you can forward declare most of the classes  
instead of #including their header.

> +    bool                             _symbolsParsed;
> +    std::vector<NameAndAttributes>   _symbols;
> +    StringSet                        _defines;    // only needed to  
> disambiguate tentative definitions
> +    StringSet                        _undefines;  // only needed to  
> disambiguate tentative definitions

80 cols.

> +//===-lto.cpp - LLVM Link Time Optimizer  
> ----------------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open  
> Source
> +// License. See LICENSE.TXT for details.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +//
> +// This file implements the Link Time Optimization library. This  
> library is
> +// intended to be used by linker to optimize code at link time.
> +//
> +// 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> +
> +#include "llvm-c/lto.h"
> +
> +#include "LTOModule.h"
> +#include "LTOCodeGenerator.h"
> +
> +
> +// holds most recent error string
> +// *** not thread safe ***
> +static std::string sLastErrorString;

This also has a static ctor, which we try to avoid when possible (yes,  
we don't avoid very well, but we still try :).  Is there another way  
to do this?

Thanks Nick!

-Chris




More information about the llvm-commits mailing list