[PATCH] D53342: [SimplifyLibCalls] Mark known arguments with nonnull

Thu Aug 22 20:36:00 PDT 2019

hfinkel added a comment.

In D53342#1640996 <https://reviews.llvm.org/D53342#1640996>, @xbolva00 wrote:

> >> do we add the attributes based on the function names.
>
> I dont see this as a problem - reserved names.
>
> >> What to do with the attributes that already exist in, e.g., some glibc header files.
>
> Can you point me on real example/OS/glibc version where this actually happens?
>
> I have Ubuntu 18 LTS and glibc and I see no such attributes in IR.

The glibc header files have the attributes, AFAIKT: https://github.com/bminor/glibc/blob/master/string/string.h - and this looks like the header on the Fedora 27 system that I just checked.

  /* Copy N bytes of SRC to DEST.  */
  extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
                       size_t __n) __THROW __nonnull ((1, 2));

If you're not seeing the attributes in the IR, and I'm not seeing them either, we might just not be adding them to avoid this issue. I do see them in Clang's AST:

  |-FunctionDecl x </usr/include/string.h:42:1, /usr/include/sys/cdefs.h:289:63> /usr/include/string.h:42:14 used memcpy 'void *(void *restrict, const void *restrict, size_t)' extern
  | |-ParmVarDecl x <col:22, col:39> col:39 __dest 'void *restrict'
  | |-ParmVarDecl x <col:47, col:70> col:70 __src 'const void *restrict'
  | |-ParmVarDecl x <line:43:8, col:15> col:15 __n 'size_t':'unsigned long'
  | |-NoThrowAttr x </usr/include/sys/cdefs.h:55:35>
  | `-NonNullAttr x <line:289:44, /usr/include/string.h:43:44> 1 2

In D53342#1641039 <https://reviews.llvm.org/D53342#1641039>, @xbolva00 wrote:

> - these are libc functions and almost all C code is compiled by gcc, I think if the code triggered UB, gcc 4.9 and newer already exploited this UB -> UB releaved by “miscompile” or GCC UB sanitizer
> - small percentange of C code which is compiled only by clang is hopefully tested with sanitizer -> UB revealed
>
>   On the other side the cost of patching Clang and llvm (+ maintenance) to keep this UB working in the user code (I would be very interested to see how many % of all code it could broken in 2019 - I believe very very small percentange) seems quite high.

You seem to be assuming that it would be obvious that a given program would be negatively affected by this, and I don't think this would be obvious at all. Sanitizers are great tools for debugging problems, but they're dynamic tools, so they only test the code paths and the parts of the configuration space that the instrumented program actually executes. Best practice may be to fuzz test all code with sanitizers, which can provide more confidence in finding problems like this, but only a very small percentage of the code that exists has been tested in this way (or has been subjected to any suitable, effective static analysis).

I look at this as something about which reasonable people can disagree. No one really has data on this, AFAIK, and so I just have my judgement. I suspect that the percentage of affected programs here is small, but not tiny. memcpy is a very-commonly-used function, and so I suspect we're in a situation where a small (but not tiny) percentage of a very large amount of code yields a large number of affected programs.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53342/new/

https://reviews.llvm.org/D53342