[llvm-dev] GEP with a null pointer base

Kaylor, Andrew via llvm-dev llvm-dev at lists.llvm.org
Thu Jul 6 12:28:48 PDT 2017

I’m not entirely opposed to solution #3.  As I said, my concern is that there are cases it would miss.

For instance, if I had some code like this:

char *get_ptr(char *base, intptr_t offset) {
  return base + offset;
}

char *convert_to_ptr(intptr_t ptr_val) {
  return get_ptr((char*)0, ptr_val);
}

There, the idiom would only appear after inlining, so the front end couldn’t handle it.  The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way.  If that holds up, and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible), maybe it would be acceptable to not handle the theoretical cases.
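
For comparison, here is a rough sketch of the same conversion written as an explicit integer-to-pointer cast instead of arithmetic on a null base. The name convert_to_ptr_explicit is hypothetical and does not come from glibc. The result of such a cast is implementation-defined in C, but it does not involve pointer arithmetic on null.

```c
#include <stdint.h>

/* Hypothetical alternative to convert_to_ptr: cast the integer value
   directly rather than adding it to a null base pointer. The cast is
   implementation-defined, but no arithmetic on a null pointer occurs. */
char *convert_to_ptr_explicit(intptr_t ptr_val) {
  return (char *)ptr_val;
}
```

Written this way, there is no null-based pointer addition left for the optimizer to see.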


From: Chris Lattner [mailto:clattner at nondot.org]
Sent: Thursday, July 06, 2017 11:53 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] GEP with a null pointer base

On Jul 6, 2017, at 11:06 AM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org> wrote:

 I've got a problem that I would like some input on.  The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this:

  p = (char*)0 + n
3) Have the front end recognize this particular idiom and translate it directly as inttoptr.

We like the first solution best.  The second "solution" is basically a punt.  It does away with the immediate problem but leaves the code basically working by chance.  I think the third solution is incomplete, because it relies on the front end being able to detect the use of a null pointer whereas that might not emerge until a few basic optimizations have been performed.

I was hoping to get some more input on this matter before proceeding.

Personally, I’d prefer #3 for two reasons:
- This is a very C specific weirdness, so putting it into the frontend makes sense.
- This is really about supporting a specific (horrible :-) idiom.  It makes sense to recognize this in the frontend, which is close to the idiom truth, rather than in the optimizer, which is run multiple times and sees code after being transformed.

I see this as pretty similar to the analogous hacks we do to support broken offsetof idioms.
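
For readers who have not run into it, the offsetof idiom mentioned here is presumably some variant of taking a member's address through a null struct pointer. A minimal sketch, using a hypothetical struct and helper name, of the standard spelling that the hand-rolled form imitates:

```c
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct packet {
  char tag;
  int length;
};

/* The hand-rolled idiom is typically written as
     (size_t)&((struct packet *)0)->length
   which forms a member access through a null pointer. The standard
   offsetof macro from <stddef.h> expresses the same thing portably. */
size_t length_offset(void) {
  return offsetof(struct packet, length);
}
```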

