[libc-commits] [libc] [libc] [search] improve hsearch robustness (PR #73896)

via libc-commits libc-commits at lists.llvm.org
Thu Nov 30 11:36:31 PST 2023


================
@@ -62,3 +62,10 @@ Often the standard will imply an intended behavior through what it states is und
 Ignoring Bug-For-Bug Compatibility
 ----------------------------------
 Any long running implementations will have bugs and deviations from the standard. Hyrum's Law states that “all observable behaviors of your system will be depended on by somebody” which includes these bugs. An example of a long-standing bug is glibc's scanf float parsing behavior. The behavior is specifically defined in the standard, but it isn't adhered to by all libc implementations. There is a longstanding bug in glibc where it incorrectly parses the string 100er and this caused the C standard to add that specific example to the definition for scanf. The intended behavior is for scanf, when parsing a float, to parse the longest possibly valid prefix and then accept it if and only if that complete parsed value is a float. In the case of 100er the longest possibly valid prefix is 100e but the float parsed from that string is only 100. Since there is no number after the e it shouldn't be included in the float, so scanf should return a parsing error. For LLVM's libc it was decided to follow the standard, even though glibc's version is slightly simpler to implement and this edge case is rare. Following the standard must be the first priority, since that's the goal of the library.
+
+Design Decisions
+================
+
+Resizable Tables for hsearch
+----------------------------
+The POSIX.1 standard does not delineate the behavior consequent to invoking hsearch or hdestroy without prior initialization of the hash table via hcreate. Furthermore, the standard does not articulate the outcomes of successive invocations of hsearch absent intervening hdestroy calls. Libraries such as MUSL and Glibc do not incorporate checks for these scenarios, potentially leading to memory corruption or leakage. Conversely, FreeBSD's libc and Bionic adopt a distinct methodology, automatically initializing the hash table to a minimal size if it is found uninitialized, and proceeding to destroy the table only if initialization has occurred. This approach also renders hcreate redundant if an initialized hash table is already present. Given that the hash table commences with a minimal size, resizing becomes necessary to accommodate additional user insertions. LLVM's libc mirrors the approach of FreeBSD's libc and Bionic, owing to its enhanced robustness and user-friendliness. Notably, such resizing behavior aligns with POSIX.1 standards, which explicitly permit implementations to modify the capacity of the hash table.
----------------
michaelrj-google wrote:

nit: The language here is a bit overcomplicated. I'd recommend simplifying it a bit.

Specific examples: 
adopt -> use
commences -> starts
articulate -> specify

https://github.com/llvm/llvm-project/pull/73896

