[fwd] Re: [LLVMdev] Hash Bang

Misha Brukman brukman at cs.uiuc.edu
Sat Oct 1 12:43:09 PDT 2005


Karl, I think you meant to cc the llvmdev list on this.

Thank you for the more detailed explanation; it's much clearer to me now.

I agree that making the execution of .bc files more transparent would
make them more usable as a stand-alone binary format on Unix-like
systems, and that adding programmatic support for changing the #! line
would prevent much of the user error involved in modifying the run line.

One issue is that the limit of 256 chars for the run line might not be
enough due to libraries.  For example, let's pretend we have a program
foo.c which uses several libraries and we compile it as follows:

% llvm-gcc foo.c -o foo

This will produce the bytecode file 'foo.bc' and a shell script 'foo'
which you can use to run the program via the JIT (lli).  The way LLI
loads external libraries is via the -load=[full path to library] flags,
and theoretically, there isn't a limit as to how many libraries a
program might use, so a hard-coded limit in the run line would certainly
be problematic.

The other issue is that it seems such libraries aren't specified in the
"deplibs" part of the Module, which perhaps they should be -- I just tried
with a simple example which uses sin() and the bytecode file did not
have a dependence on the math library, but the shell script had a
-load=/usr/lib/libm.so correctly added.  If the deplibs field were
updated correctly, perhaps LLI could automatically search the standard
system paths for such libraries.
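
To make that last idea concrete, here is a minimal sketch in Python
(purely illustrative; it is not how LLI works today).  The directory
list, the lib<name>.so naming convention, and the helper names
find_library and load_flags are all assumptions of mine; only the -load=
flag comes from the shell script described above.

```python
import os

# Assumed search order; a real implementation would ask the platform.
STANDARD_LIB_DIRS = ["/usr/local/lib", "/usr/lib", "/lib"]


def find_library(name, search_dirs=STANDARD_LIB_DIRS, exists=os.path.isfile):
    """Return the first lib<name>.so found in search_dirs, or None."""
    for directory in search_dirs:
        candidate = "%s/lib%s.so" % (directory, name)
        if exists(candidate):
            return candidate
    return None


def load_flags(deplibs, search_dirs=STANDARD_LIB_DIRS, exists=os.path.isfile):
    """Turn dependent-library names (e.g. "m") into -load=<path> flags."""
    flags = []
    for name in deplibs:
        path = find_library(name, search_dirs, exists)
        if path is None:
            raise FileNotFoundError("library not found: " + name)
        flags.append("-load=" + path)
    return flags
```

The exists parameter is only there so the lookup can be exercised
without touching the real filesystem.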

Anyone else have any thoughts on this?

----- Forwarded message from Karl Magdsick <kmagnum at gmail.com> -----

Date: Sat, 1 Oct 2005 07:33:06 -0400
From: Karl Magdsick <kmagnum at gmail.com>
To: Misha Brukman <brukman at cs.uiuc.edu>
Subject: Re: [LLVMdev] Hash Bang
Reply-To: Karl Magdsick <kmagnum at gmail.com>

Thanks for the input.  It seems I should have explained my motivations.

Please forgive my reversing the order of your two points in my reply below:

> > This would allow llvm modules to be executable on UNIX systems (and
> > under cygwin).
>
> This is possible now, although in a more involved manner:
> http://llvm.cs.uiuc.edu/docs/GettingStarted.html#optionalconfig

My main impetus for this is that llvm binaries seem to me a much more elegant
and promising technology than Mach-O fat binaries (so-called Apple Universal
Binaries).  Of course, at the moment, fat binaries are a much more
proven technology.

Unfortunately, associating arbitrary magic numbers with interpreters on the
PowerBook that I use as my main computer involves rebooting into Ubuntu-PPC,
and means I lose the JIT.  OS X is the fastest-growing UNIX flavor, and it
would be nice if llvm binaries could be made to act as closely as possible to
native binaries on this platform.

Apple Apps are a very flexible solution from within Finder, but
"open /Applications/MyApp.app" on the command line is no more elegant than
"lli /usr/local/bin/myApp".

I love misc binaries and compile misc binaries as a module when I compile
Linux.  As I remember, misc binaries aren't compiled by default under the Linus
branch.  Also, this doesn't work under cygwin, requires root access, and
equivalent functionality may require third-party kernel modifications under some
UNIX flavors.  (I'm not aware of anyone providing misc binary support in OSX,
for instance.)

On 9/30/05, Misha Brukman <brukman at cs.uiuc.edu> wrote:
> On Fri, Sep 30, 2005 at 09:50:45PM -0400, Karl Magdsick wrote:
> > Have you considered allowing a "hash bang path" to precede the llvm
> > magic number?
>
> Personally, I would find it weird to intermix text data with binary
> data.  Then, an incorrect invocation of an editor will corrupt the
> bytecode as the user saves the file.   As users aren't expected to edit
> the .bc files by hand, I'm not sure I would want to see them edit the #!
> line either.

Now that I think about it, it would be better if the only legal offsets for the
llvm magic number were zero bytes and 259 bytes (#! + path + \n
+ null padding to constant length, where path is up to 256 bytes).
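
As a rough sketch of how a reader could locate the magic number under
this layout (Python, purely illustrative): the four-byte magic b"llvm"
and the helper names magic_offset and interpreter_path are assumptions
for the example, not an existing API.

```python
MAX_PATH_LEN = 256
# "#!" (2 bytes) + path padded out to 256 bytes + "\n" (1 byte) = 259 bytes
PREFIX_LEN = len(b"#!") + MAX_PATH_LEN + len(b"\n")
MAGIC = b"llvm"  # assumed value of the bytecode magic number


def magic_offset(data):
    """Return 0 or 259, the only two legal offsets of the magic number."""
    if data.startswith(MAGIC):
        return 0
    if data.startswith(b"#!") and data[PREFIX_LEN:PREFIX_LEN + len(MAGIC)] == MAGIC:
        return PREFIX_LEN
    raise ValueError("magic number not found at offset 0 or 259")


def interpreter_path(data):
    """Return the path from the prefix, or None if there is no prefix."""
    if magic_offset(data) == 0:
        return None
    # Everything between "#!" and the first newline is the path;
    # the NUL padding follows the newline.
    path, _, _ = data[2:PREFIX_LEN].partition(b"\n")
    return path
```

Because the prefix has a constant length, the reader never has to scan
for the magic number; it checks exactly two offsets.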

I don't think of it as "text data" as much as the standard way of attaching
interpreter metadata to executable files in a way that's portable across
UNIX flavors and filesystems.  I wish I knew of a more elegant and
standard way of doing this, but I'm not aware of any.

In theory, someone could use a text editor to edit this metadata, but
in theory someone could also use a text editor to edit function names
in llvm bytecode files.  The fact that some short strings in a binary
file happen to be human-readable doesn't make the file a mixture
of text and binary data.

I don't expect people to edit it by hand.  I'd expect the second step to be a
--hash-bang-path="/usr/local/bin/lli" (or similar) flag for the compiler
(defaulting to not adding any hash bang path) and/or a very simple
command-line tool capable only of setting or removing hash bang
paths from bytecode files.
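
A minimal sketch of such a tool, in Python for illustration only.  It
assumes the 259-byte prefix layout and a four-byte magic of b"llvm";
the function names and the --remove option are hypothetical, and nothing
like this exists in the llvm tree.

```python
import argparse

PREFIX_LEN = 259  # "#!" + up to 256 path bytes + "\n", NUL-padded
MAGIC = b"llvm"   # assumed value of the bytecode magic number


def strip_hash_bang(data):
    """Return the bytecode with any hash bang prefix removed."""
    if data.startswith(MAGIC):
        return data
    if data.startswith(b"#!") and data[PREFIX_LEN:PREFIX_LEN + len(MAGIC)] == MAGIC:
        return data[PREFIX_LEN:]
    raise ValueError("magic number not found at offset 0 or 259")


def set_hash_bang(data, path):
    """Return the bytecode with a fresh prefix, replacing any old one."""
    raw = path.encode("utf-8")
    if not 0 < len(raw) <= 256 or b"\n" in raw or b"\0" in raw:
        raise ValueError("path must be 1..256 bytes with no newline or NUL")
    prefix = (b"#!" + raw + b"\n").ljust(PREFIX_LEN, b"\0")
    return prefix + strip_hash_bang(data)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Set or remove the hash bang path of a bytecode file.")
    parser.add_argument("file")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--hash-bang-path", metavar="PATH")
    group.add_argument("--remove", action="store_true")
    args = parser.parse_args(argv)

    with open(args.file, "rb") as f:
        data = f.read()
    if args.remove:
        data = strip_hash_bang(data)
    else:
        data = set_hash_bang(data, args.hash_bang_path)
    with open(args.file, "wb") as f:
        f.write(data)
```

Calling main(["foo.bc", "--hash-bang-path", "/usr/local/bin/lli"]) would
rewrite foo.bc in place; the tool never touches anything past the prefix,
so the user has no reason to open the file in an editor.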



-Karl

----- End forwarded message -----
-- 
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu



