[llvm-commits] [llvm] r47621 - in /llvm/trunk: include/llvm-c/lto.h tools/lto2/ tools/lto2/LTOCodeGenerator.cpp tools/lto2/LTOCodeGenerator.h tools/lto2/LTOModule.cpp tools/lto2/LTOModule.h tools/lto2/Makefile tools/lto2/lto.cpp tools/lto2/lto.exports

Nick Kledzik kledzik at apple.com
Wed Feb 27 14:22:23 PST 2008


On Feb 26, 2008, at 3:12 PM, Chris Lattner wrote:

> On Feb 26, 2008, at 12:26 PM, Nick Kledzik wrote:
>> URL: http://llvm.org/viewvc/llvm-project?rev=47621&view=rev
>> Log:
>> first commit of new LTO system.  It is not hooked up in the llvm/ 
>> tools/Makefile, so no one will build it be default yet
>
> Nice! Some details:
>
>> +/**
>> + * Returns the last error string or NULL if last operation was  
>> sucessful.
> typo "sucessful"
fixed

>> +
>> +/**
>> + * Returns the number of symbols in the object module.
>> + */
>> +extern unsigned int
>> +lto_module_get_num_symbols(lto_module_t mod);
>
> Minor details which you should feel to ignore: the "int" in  
> "unsigned int" is redundant, and the "extern" is also redundant.   
> You could choose to drop those if you care.  We tend to not use  
> them in the rest of llvm, but it doesn't really matter either way.
extern may be redundant on function prototypes, but it is very  
necessary in C headers on data declarations.  Otherwise they become  
tentative definitions.  So, I tend to explict and add extern  
everywhere.  I've always found a raw "unsigned" to be like a dangle  
preposition.  I'm keep thinking "unsigned what?".


>> ===================================================================== 
>> =========
>> --- llvm/trunk/tools/lto2/LTOCodeGenerator.cpp (added)
>> +++ llvm/trunk/tools/lto2/LTOCodeGenerator.cpp Tue Feb 26 14:26:43  
>> 2008
>
>>
>> +
>> +#include "llvm/Module.h"
>> +#include "llvm/PassManager.h"
>> +#include "llvm/Linker.h"
>> +#include "llvm/Constants.h"
>> +#include "llvm/DerivedTypes.h"
>> +#include "llvm/ModuleProvider.h"
>> +#include "llvm/Bitcode/ReaderWriter.h"
>> +#include "llvm/Support/CommandLine.h"
>> +#include "llvm/Support/FileUtilities.h"
>> +#include "llvm/Support/SystemUtils.h"
>> +#include "llvm/Support/Mangler.h"
>> +#include "llvm/Support/MemoryBuffer.h"
>> +#include "llvm/System/Program.h"
>> +#include "llvm/System/Signals.h"
>> +#include "llvm/Analysis/Passes.h"
>> +#include "llvm/Analysis/LoopPass.h"
>> +#include "llvm/Analysis/Verifier.h"
>> +#include "llvm/CodeGen/FileWriters.h"
>> +#include "llvm/Target/SubtargetFeature.h"
>> +#include "llvm/Target/TargetOptions.h"
>> +#include "llvm/Target/TargetData.h"
>> +#include "llvm/Target/TargetMachine.h"
>> +#include "llvm/Target/TargetMachineRegistry.h"
>> +#include "llvm/Target/TargetAsmInfo.h"
>> +#include "llvm/Transforms/IPO.h"
>> +#include "llvm/Transforms/Scalar.h"
>> +#include "llvm/Analysis/LoadValueNumbering.h"
>> +#include "llvm/Support/MathExtras.h"
>> +#include "llvm/Config/config.h"
>> +
>> +#include "LTOModule.h"
>> +#include "LTOCodeGenerator.h"
>
> You should #include LTOCodeGenerator.h first in the list to ensure  
> that it is self contained.  Also, please remove any #includes that  
> aren't needed.  There is some justification and more info for this  
> here:
> http://llvm.org/docs/CodingStandards.html#scf_includes
fixed


>> +
>> +#include <fstream>
>> +#include <unistd.h>
>> +#include <stdlib.h>
>> +#include <fcntl.h>
>
> For standard c++ headers like stdlib, please use <cstdlib>.  Is it  
> possible to avoid unistd.h/fcntl.h with the llvm/System/Path.h or  
> MemoryBuffer.h stuff?  MemoryBuffer.h is an efficient way to read  
> an entire file in off the disk.
The issue is that lto_codegen_compile() returns a malloc'ed block  
which the client frees when done.  I don't see a way to use  
MemoryBuffer that allows me to transfer ownership of the underlying  
buffer away and then delete the buffer.  Also if looks like  
MemoryBuffer::getFile() mmaps the file instead of using malloc() so,  
the client then can't free() it.   What I've done is change the  
commensts for to_codegen_compile() to say that the returned buffer is  
owned by the lto_codegen_t object and is freed with it is disposed.   
Then I got rid of the malloc/open/read and used MemoryBuffer::getFile().


>
>> +void* LTOCodeGenerator::compile(size_t* length, std::string& errMsg)
>> +{
>> +    // make unqiue temp .s file to put generated assembly code
>
> typo "unqiue", also comments should end with proper punctuation and  
> use capital letters :)
Fixed

>
>>
>> +
>> +    if ( !asmResult ) {
>> +        // read .o file into memory buffer
>> +        const sys::FileStatus* objStatus;
>> +        objStatus = uniqueObjPath.getFileStatus(false, &errMsg);
>> +        if ( objStatus != NULL ) {
>> +            *length = objStatus->getSize();
>> +            // use malloc() because caller will own this buffer  
>> and free() it
>> +            buffer = ::malloc(*length);
>> +            if ( buffer != NULL ) {
>> +                int fd = ::open(uniqueObjPath.c_str(), O_RDONLY, 0);
>> +                if ( fd != -1 ) {
>> +                    // read object file contents into buffer
>> +                    if ( ::read(fd, buffer, *length) != (ssize_t) 
>> *length ) {
>> +                        errMsg = "error reading object file";
>> +                        free(buffer);
>> +                        buffer = NULL;
>
> I don't think this is correct: read can return partial results,  
> however it's best to use MemoryBuffer to avoid having to do this  
> explicitly.
>
>>
>> +                    }
>> +                    close(fd);
>> +                }
>> +                else {
>> +                    errMsg = "error opening object file";
>> +                    free(buffer);
>> +                    buffer = NULL;
>> +                }
>> +            }
>> +            else {
>> +                errMsg = "error mallocing space for object file";
>> +            }
>> +        }
>> +        else {
>> +            errMsg = "error stat'ing object file";
>> +        }
>
> I think the code will end up getting simpler with MemoryBuffer, but  
> if that we're going to happen, I'd suggest using early exits to  
> avoid having the code be so nested.  This would give you:
>
>   if (fd == -1) {
>     errMsg = "error opening object file";
>     return NULL;
>   }
>
> for example, but this also requires using RAII to delete the files  
> etc.  If you use MemoryBuffer, the code will likely be simple  
> enough that it's not worth doing this.
Fixed.  Now use MemoryBuffer.


>
>> ===================================================================== 
>> =========
>> --- llvm/trunk/tools/lto2/LTOCodeGenerator.h (added)
>> +++ llvm/trunk/tools/lto2/LTOCodeGenerator.h Tue Feb 26 14:26:43 2008
>
>
>> +    bool                writeMergedModules(const char* path,  
>> std::string& errMsg);
>
> This goes over 80 columns.
Fixed (I just threw that one in to see if the rumors were true, and  
the are ;-)


>> ===================================================================== 
>> =========
>> --- llvm/trunk/tools/lto2/LTOModule.cpp (added)
>> +++ llvm/trunk/tools/lto2/LTOModule.cpp Tue Feb 26 14:26:43 2008
>> @@ -0,0 +1,329 @@
>> +//===-LTOModule.cpp - LLVM Link Time Optimizer  
>> ----------------------------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open  
>> Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +//
>> +// This file implements the Link Time Optimization library. This  
>> library is
>> +// intended to be used by linker to optimize code at link time.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +
>> +#include "llvm/Module.h"
>> +#include "llvm/PassManager.h"
>> +#include "llvm/Linker.h"
>> +#include "llvm/Constants.h"
>> +#include "llvm/DerivedTypes.h"
>> +#include "llvm/ModuleProvider.h"
>> +#include "llvm/Bitcode/ReaderWriter.h"
>> +#include "llvm/Support/CommandLine.h"
>> +#include "llvm/Support/FileUtilities.h"
>> +#include "llvm/Support/SystemUtils.h"
>> +#include "llvm/Support/Mangler.h"
>> +#include "llvm/Support/MemoryBuffer.h"
>> +#include "llvm/System/Program.h"
>> +#include "llvm/System/Path.h"
>> +#include "llvm/System/Signals.h"
>> +#include "llvm/Target/SubtargetFeature.h"
>> +#include "llvm/Target/TargetOptions.h"
>> +#include "llvm/Target/TargetData.h"
>> +#include "llvm/Target/TargetMachine.h"
>> +#include "llvm/Target/TargetMachineRegistry.h"
>> +#include "llvm/Target/TargetAsmInfo.h"
>> +#include "llvm/Transforms/IPO.h"
>> +#include "llvm/Transforms/Scalar.h"
>> +#include "llvm/Analysis/LoadValueNumbering.h"
>> +#include "llvm/Support/MathExtras.h"
>> +
>> +#include "LTOModule.h"
>
> As before, pruning this down would be good if possible.  Please  
> move LTOModule.h to the top of the list.
Fixed

>
>> +bool LTOModule::isBitcodeFileForTarget(const void* mem,
>> +                                    size_t length, const char*  
>> triplePrefix)
>> +{
>> +    bool result = false;
>> +    MemoryBuffer* buffer;
>> +    buffer = MemoryBuffer::getMemBuffer((char*)mem, (char*)mem 
>> +length);
>> +    if ( buffer != NULL ) {
>> +        ModuleProvider* mp = getBitcodeModuleProvider(buffer);
>> +        if ( mp != NULL ) {
>> +            std::string actualTarget = mp->getModule()- 
>> >getTargetTriple();
>> +            if  ( strncmp(actualTarget.c_str(), triplePrefix,
>> +                                    strlen(triplePrefix)) == 0) {
>> +                result = true;
>> +            }
>> +            //  mp destructor will delete buffer
>> +            delete mp;
>> +        }
>> +        else {
>> +            // if getBitcodeModuleProvider failed, we need to  
>> delete buffer
>> +            delete buffer;
>> +        }
>> +    }
>> +    return result;
>> +}
>
> Please use RAII and early exits to simplify the code.  A useful  
> thing for this is llvm/ADT/OwningPtr.h.  You should be able to do  
> something like this:
>
>   OwningPtr<MemoryBuffer> buffer = MemoryBuffer::getMemBuffer...
>   if (buffer == 0) return 0;
>
>   OwningPtr<ModuleProvider> mp = getBitcodeModuleProvider(buffer);
>   if (mp == 0) return 0;
>
>   ...
>
> This makes all the error cases be early exits and makes the success  
> case easier to understand.  Also, no nesting!  There are other  
> methods that can use similar treatment.
Hey!  OwningPtr<> is not in the Xcode project.  That is how I've been  
search llvm sources.

I reworked this to use OwningPtr<> and factored out the target  
matching code, and yes it is all much smaller now.  Cool!


>
>> +LTOModule::~LTOModule()
>> +{
>> +    delete _module;
>> +    if ( _target != NULL )
>> +        delete _target;
>> +}
>
> delete accepts a null pointer, so you can just make the second  
> delete unconditional.  It might also make sense to make the  
> "_module" and "_target" members of LTOModule be OwningPointers,  
> which would mean you don't have to write the dtor manually.
I made them OwningPtrs.


>
>> +LTOModule* LTOModule::makeLTOModule(const char* path,  
>> std::string& errMsg)
>> +{
>> +LTOModule* LTOModule::makeLTOModule(const void* mem, size_t length,
>> +                                                         
>> std::string& errMsg)
>> +{
>
> These two methods share all the target matching code.  Please pull  
> it out into a shared helper method (that takes a MemoryBuffer) or  
> something.
Fixed.


>
>> +void LTOModule::addDefinedSymbol(GlobalValue* def, Mangler &mangler,
>> +                                bool isFunction)
>> +{
>> +    const char* symbolName = ::strdup(mangler.getValueName 
>> (def).c_str());
>> +
>> +    // set alignment part log2() can have rounding errors
>> +    uint32_t align = def->getAlignment();
>> +    uint32_t attr = align ? __builtin_ctz(def->getAlignment()) : 0;
>
> Please don't use __builtin_ctz(), use a helper from llvm/Support/ 
> MathExtras.h.
Fixed

>
>> +    // set definition part
>> +    if ( def->hasWeakLinkage() || def->hasLinkOnceLinkage() ) {
>> +        // lvm bitcode does not differenciate between weak def data
>> +        // and tentative definitions!
>> +        // HACK HACK HACK
>> +        // C++ does not use tentative definitions, but does use  
>> weak symbols
>> +        // so guess that anything that looks like a C++ symbol is  
>> weak and others
>> +        // are tentative definitions
>
> Wow, that is a hack.  :)  We should really fix this at the LLVM IR  
> level, can you please file a radar that demonstrates how not  
> distinguishing between these two cases cause a problem?
We already have <rdar://problem/5684011> that Dale wrote up.


>
>> +uint32_t LTOModule::getSymbolCount()
>> +{
>> +    if ( !_symbolsParsed ) {
>> +        _symbolsParsed = true;
>> +
>> +        // Use mangler to add GlobalPrefix to names to match  
>> linker names.
>> +        Mangler mangler(*_module, _target->getTargetAsmInfo()- 
>> >getGlobalPrefix());
>
> There is one other aspect of configuration for the mangler: On  
> darwin at least you have to call manger.setUseQuotes(true);
>
> This tells it that it is ok to make some symbols use double quotes  
> (e.g. those with a space in them like objc methods) instead of  
> mangling the space.  I don't know if we can currently get this from  
> TargetAsmInfo.h, but I think we should be able to.  Dale or Evan  
> can help with this part.
I don't think this is applicable at this level.  You need double  
quotes around ObjC method names in the generated assembly source file  
because the otherwise the assembler can't parse the label name.  The  
linker and LTO use raw const char* strings which can have embedded  
spaces.


>>
>> +
>> +        // add functions
>> +        for (Module::iterator f = _module->begin(); f != _module- 
>> >end(); ++f) {
>> +            if ( f->isDeclaration() ) {
>> +                addUndefinedSymbol(mangler.getValueName(f).c_str());
>> +            }
>> +            else {
>
> To reduce nesting, I'd suggest writing this as:
>
>> +        for (Module::iterator f = _module->begin(); f != _module- 
>> >end(); ++f) {
>> +            if ( f->isDeclaration() ) {
>> +                addUndefinedSymbol(mangler.getValueName(f).c_str());
>                    continue;
>>
>> +            }
>
> which unnests the 'else' part.
>
>
>>
>> +                addDefinedSymbol(f, mangler, true);
>> +                // add external symbols referenced by this function.
>> +                for (Function::iterator b = f->begin(); b != f- 
>> >end(); ++b) {
>> +                    for (BasicBlock::iterator i = b->begin();
>> +                                                        i != b- 
>> >end(); ++i) {
>> +                        for (unsigned count = 0, total = i- 
>> >getNumOperands();
>> +                                                    count !=  
>> total; ++count) {
>> +                            findExternalRefs(i->getOperand 
>> (count), mangler);
>> +                        }
>> +                    }
>> +                }
>
> I'd suggest splitting this triply nested loop out into its own  
> helper function.
>
>>
>> +            }
>> +        }
>> +
>> +        // add data
>> +        for (Module::global_iterator v = _module->global_begin(),
>> +                                    e = _module->global_end(); v ! 
>> =  e; ++v) {
>> +            if ( v->isDeclaration() ) {
>> +                addUndefinedSymbol(mangler.getValueName(v).c_str());
>> +            }
>
> This can also use continue to reduce nesting.
I pulled the nested for loops out into their own method, and this  
collapsed the function
down to where we don't need the continue statements anymore.




>
>>
>> +        // make symbols for all undefines
>> +        for (StringSet::iterator it=_undefines.begin();
>> +                                                it !=  
>> _undefines.end(); ++it) {
>> +            // if this symbol also has a definition, then don't  
>> make an undefine
>> +            // because it is a tentative definition
>> +            if ( _defines.find(it->getKeyData(), it->getKeyData() 
>> +it->getKeyLength()) == _defines.end() ) {
>
> likewise: if (_defines.find(..) != _defines.end()) continue;
Changed to use count() which shorts the line.

>
>>
>> +++ llvm/trunk/tools/lto2/LTOModule.h Tue Feb 26 14:26:43 2008
>> @@ -0,0 +1,83 @@
>> +//===-LTOModule.h - LLVM Link Time Optimizer  
>> ------------------------------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open  
>> Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +//
>> +// This file declares the LTOModule class.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +
>> +#ifndef LTO_MODULE_H
>> +#define LTO_MODULE_H
>> +
>> +#include "llvm/Module.h"
>> +#include "llvm/GlobalValue.h"
>> +#include "llvm/Constants.h"
>> +#include "llvm/Support/Mangler.h"
>> +#include "llvm/Target/TargetMachine.h"
>> +#include "llvm/ADT/StringMap.h"
>
> I think that several of these #includes can be removed.   
> GlobalValue.h is redundant, and I think you can forward declare  
> most of the classes instead of #including their header.
I removed some, but then added some back because of new private  
methods that used those types as parameters.

>
>> +    bool                             _symbolsParsed;
>> +    std::vector<NameAndAttributes>   _symbols;
>> +    StringSet                        _defines;    // only needed  
>> to disambiguate tentative definitions
>> +    StringSet                        _undefines;  // only needed  
>> to disambiguate tentative definitions
>
> 80 cols.
Fixed.


>
>> +//===-lto.cpp - LLVM Link Time Optimizer  
>> ----------------------------------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open  
>> Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +//
>> +// This file implements the Link Time Optimization library. This  
>> library is
>> +// intended to be used by linker to optimize code at link time.
>> +//
>> +// 
>> ===------------------------------------------------------------------ 
>> ----===//
>> +
>> +#include "llvm-c/lto.h"
>> +
>> +#include "LTOModule.h"
>> +#include "LTOCodeGenerator.h"
>> +
>> +
>> +// holds most recent error string
>> +// *** not thread safe ***
>> +static std::string sLastErrorString;
>
> This also has a static ctor, which we try to avoid when possible  
> (yes, we don't avoid very well, but we still try :).  Is there  
> another way to do this?
Even without it, there is still a static ctor in lto.o.  Something to  
do with ManglerLinkVar and IncludeFile().
Given that there are a couple hundred ctors in the libLTO.dylib, I'm  
not to worried about this one.

-Nick





More information about the llvm-commits mailing list