[LLVMdev] Reference Manual Clarifications 2

Mon Apr 21 00:03:24 PDT 2008

On Sunday, 20.04.2008, at 09:34 -0700, Jon Sargeant wrote:
> Joachim Durchholz wrote:
> > On Saturday, 19.04.2008, at 16:24 -0700, Jon Sargeant wrote:
> >> First, I can assign -1 to 
> >> the count to indicate an invalid or unknown value.
> > 
> > This is a C-ism. In a language that supports discriminated unions well,
> > you'd do something like
> >   type AllocaCount = Invalid | Unknown | Known int
> > (where Invalid, Unknown and Known are the constants that do the
> > distinction between union variants).
> 
> Not necessarily.  Using -1 for an invalid integer is analogous to using 
> null for an invalid pointer.

It is indeed analogous, and using 0 for an invalid pointer can break in
rare circumstances, too. (E.g. in x86 real mode, the interrupt vector
table sits at address 0. On the 68000, 0 can even be perfectly valid
heap memory.)

Using special values to signify special conditions is usually a Bad Idea
(TM). It is also ubiquitous, probably because C (like most languages)
offers only weak support for tagged unions.
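To make the alternative concrete, here is a minimal sketch of the
AllocaCount type from above, written in Rust only because its enums are
tagged unions. The parse_count and describe functions and their textual
input are hypothetical, for illustration only. No integer is reserved as
a marker: 4294967295 is an ordinary count, and "invalid" and "unknown"
are separate variants that every match has to handle:

```rust
// Hypothetical tagged-union replacement for "signed count, -1 means invalid".
#[derive(Debug, Clone, Copy, PartialEq)]
enum AllocaCount {
    Invalid,
    Unknown,
    Known(u32), // every u32 value is a legitimate count; none is reserved
}

// Hypothetical parser: no text means Unknown, unparsable text means Invalid.
fn parse_count(text: Option<&str>) -> AllocaCount {
    match text {
        None => AllocaCount::Unknown,
        Some(s) => match s.trim().parse::<u32>() {
            Ok(n) => AllocaCount::Known(n),
            Err(_) => AllocaCount::Invalid, // "-1" ends up here, not as 4294967295
        },
    }
}

fn describe(count: AllocaCount) -> String {
    // The compiler rejects this match if any variant is left unhandled.
    match count {
        AllocaCount::Invalid => "invalid".to_string(),
        AllocaCount::Unknown => "unknown".to_string(),
        AllocaCount::Known(n) => format!("{} elements", n),
    }
}

fn main() {
    for input in [Some("16"), Some("4294967295"), Some("-1"), None] {
        println!("{:?} -> {}", input, describe(parse_count(input)));
    }
}
```

Forgetting the Unknown case in describe is a compile error, whereas
forgetting to test for -1 compiles silently.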

> >> Second, if I attempt 
> >> to allocate a negative count, I can print an assertion failure and abort 
> >> the program.  Had I interpreted the count as an unsigned value, the 
> >> program would attempt to allocate anywhere from 2 gigabytes to 4 
> >> gigabytes.
> > 
> > Which might be exactly what it's supposed to do. Suppose you're writing
> > heap management code.
> 
> Perhaps, but very unlikely.  An allocation of 2 gigabytes or more is 
> probably a bug.

I agree, but "very unlikely" is not the same as "impossible".
If it's only "very unlikely", there *will* be cases where restricting
the count to signed values is unfortunate.

You don't even lose your assertion-printing capability. Just have the
allocator check whether that much space is available; that's even better
than catching nominally negative values, because it also catches the
cases where the program erroneously allocates "just" a quarter of the
address space.
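A minimal sketch of that idea, again in Rust, assuming a hypothetical
arena with a fixed capacity (Arena and AllocError are illustrative
names, not part of LLVM). The count stays unsigned, and the only check
is whether the request fits into what is left:

```rust
// Hypothetical bump allocator over a fixed amount of backing store.
struct Arena {
    capacity: u64,
    used: u64,
}

#[derive(Debug, PartialEq)]
enum AllocError {
    OutOfSpace { requested: u64, available: u64 },
}

impl Arena {
    fn new(capacity: u64) -> Self {
        Arena { capacity, used: 0 }
    }

    // Returns the offset of the new block, or an error if it does not fit.
    fn allocate(&mut self, count: u32, elem_size: u32) -> Result<u64, AllocError> {
        // Both factors fit in 32 bits, so the product cannot overflow u64.
        let requested = count as u64 * elem_size as u64;
        let available = self.capacity - self.used;
        // One check covers "negative" counts and merely absurd positive ones.
        if requested > available {
            return Err(AllocError::OutOfSpace { requested, available });
        }
        let offset = self.used;
        self.used += requested;
        Ok(offset)
    }
}

fn main() {
    let mut arena = Arena::new(1 << 30); // 1 GiB of hypothetical backing store
    println!("{:?}", arena.allocate(16, 8)); // small request
    println!("{:?}", arena.allocate(-1i32 as u32, 1)); // -1 reinterpreted: 4294967295 bytes
    println!("{:?}", arena.allocate(1 << 30, 1)); // a quarter of a 32-bit address space
}
```

The reinterpreted -1 and the 1 GiB request both fail the same capacity
check, even though the latter is a perfectly positive signed 32-bit value.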

> >> I'm not necessarily saying that NumElements should be 
> >> signed, only that the choice between signed and unsigned is not obvious.
> > 
> > Obviously, obviousness is in the eye of the beholder :-)
> > (SCNR)
> 
> Yes.  But consider that there are many people who agree with me.  Search 
> for "unsigned vs signed - Is Bjarne Mistaken?" in comp.lang.c++.moderated.

Just look at any part of news:comp.lang.functional and you'll see the
exact opposite sentiment. 

Arguments by majority aren't entirely irrelevant, but they are weak, and
stronger arguments have been given already.
Considering that C++'s support for tagged unions is just marginally
better than that of C, I don't think that comp.lang.c++.moderated is the
right place to look for an unbiased view when it comes to designing
interfaces that should serve many languages. [*]

To see the other side of the fence, take any introductory text on
Haskell and look for "pattern matching".
Together with algebraic data types, it lets a function return a
structured value (a tuple, or one of several variants) and lets the
caller take it apart without syntactic overhead, so returning multiple
values is as cheap as returning one. That makes it practical to return
several pieces of information without cramming them into a single scalar.
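A small sketch of the idea in Rust, whose tuples and match play roughly
the role Haskell's do here (div_rem is a hypothetical helper). The
function hands back quotient and remainder together, and "no result" is
a separate case rather than a reserved number:

```rust
// Returns quotient and remainder together, or None when the divisor is zero,
// instead of packing both into one integer or reserving a sentinel value.
fn div_rem(dividend: u32, divisor: u32) -> Option<(u32, u32)> {
    if divisor == 0 {
        None
    } else {
        Some((dividend / divisor, dividend % divisor))
    }
}

fn main() {
    for (a, b) in [(17, 5), (42, 0)] {
        // The pattern takes the result apart without any field-access boilerplate.
        match div_rem(a, b) {
            Some((q, r)) => println!("{} = {} * {} + {}", a, b, q, r),
            None => println!("{} / {}: division by zero", a, b),
        }
    }
}
```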

Regards,
Jo

[*] If we've been talking about the allocation for the LLVM library
itself, then my point is moot since all code that uses the interface is
C++ anyway, and I apologize for getting off-topic.