[PATCH] D111252: [llvm] [Support] [Debuginfo] Add http and debuginfod client libraries and llvm-debuginfod-find tool

Wed Oct 20 00:49:11 PDT 2021

labath added a comment.

Let me just say outright that I don't feel qualified to say whether this belongs in llvm (I think it does) or not, so I am not going to click approve no matter how well you respond to my comments. However, I do have some experience with sockets and funky tests, so I think I can say something useful about those aspects of the patch.

================
Comment at: llvm/lib/Debuginfod/Debuginfod.cpp:1-2
+//===-- llvm/Support/Debuginfod.cpp - Debuginfod client library -----------===//
+//-*-===//
+//
----------------
wrap fail

================
Comment at: llvm/lib/Support/CMakeLists.txt:79
+if (LLVM_ENABLE_CURL)
+  set(system_libs ${system_libs} ${CURL_LIBRARIES})
+endif()
----------------
Who sets `CURL_LIBRARIES` ?

================
Comment at: llvm/lib/Support/HTTPClient.cpp:15-23
+#include "llvm/Support/HTTPClient.h"
+
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/MemoryBuffer.h"
----------------
This isn't super important, but the best way to ensure you follow the [[ https://llvm.org/docs/CodingStandards.html#include-style | rules for include ordering ]], is to put them in a single block (with no empty lines), as then clang-format will order them for you.

================
Comment at: llvm/lib/Support/HTTPClient.cpp:34
+  std::unique_ptr<WritableMemoryBuffer> Buffer;
+};
+
----------------
The anonymous namespace should end here, and the rest should be static functions, per <https://llvm.org/docs/CodingStandards.html#anonymous-namespaces>

================
Comment at: llvm/lib/Support/HTTPClient.cpp:119-120
+#undef DEBUG_TYPE
+
+} // end namespace llvm
----------------
This isn't explicitly spelled out anywhere (and probably not used entirely consistently), but I believe the prevalent style, and one most consistent with the "make namespaces as small as possible" in the anonymous namespace section is to slap a `using namespace llvm;` at the start of the file, and then to explicitly prefix with `llvm::` the header functions that you're defining.

================
Comment at: llvm/test/tools/llvm-debuginfod/client-server-test.py:23
+    for group in groups:
+        time.sleep(group_delays[group])
+        print(group)
----------------
Can we get rid of these sleeps? Like, maybe the server could somehow signal (create a file, write to stdout/other fd, ...) that it has initialized and is ready to serve)..

I can speak from experience that getting sleep-based tests to work correctly is very tricky. You need to set the timeout very high to ensure you cover the final 0.01 percentile, and when you do that, you end up needlessly slowing down the other 99.99% of the test runs.

================
Comment at: llvm/test/tools/llvm-debuginfod/llvm-debuginfod-find.test:1
+# REQUIRES: debuginfod_client
+# RUN: python %S/client-server-test.py \
----------------
Since, presumably, every test in this folder is going to have this clause, it would be better to do this via a `lit.local.cfg` file.

================
Comment at: llvm/test/tools/llvm-debuginfod/llvm-debuginfod-find.test:4
+# RUN:     --servers '{"args": ["python", "-m", "http.server", \
+# RUN:                          "--directory", "%S/Inputs", "13576"]}' \
+# RUN:     --clients '{"args": ["llvm-debuginfod-find", \
----------------
There's absolutely no chance this will pass reliably with a hard-coded port. You'll need a mechanism to select a free port at runtime. You can do that by passing port 0 to `http.server.HTTPServer` and then fetching the actual port via `instance.server_port`.

This can then be combined with the sleep comment, as you can take the notification of the listening port as a positive acknowledgement that the server is ready to accept connections.

================
Comment at: llvm/test/tools/llvm-debuginfod/llvm-debuginfod-find.test:5-13
+# RUN:     --clients '{"args": ["llvm-debuginfod-find", \
+# RUN:                 "fake_build_id", "--executable"],\
+# RUN:                 "env": {"DEBUGINFOD_URLS":"http://localhost:13576"}}'\
+# RUN:               '{"args": ["llvm-debuginfod-find", \
+# RUN:                 "fake_build_id", "--debuginfo"],\
+# RUN:                 "env": {"DEBUGINFOD_URLS":"http://localhost:13576"}}'\
+# RUN:               '{"args": ["llvm-debuginfod-find", \
----------------
I don't think this is a good way to design a test. I expect it will be very hard (or outright impossible) to get this working on windows because of the differences in quoting behavior. I'd recommend a pattern like:
```
RUN: python %S/client-server-test.py %s

// Or the server could be started automatically, possibly within the same process, as that makes it easier to pass the actual port number
SERVER: ???

// DEBUGINFOD_URLS can probably be set automatically
CLIENT: llvm-debuginfod-find fake_build_id --executable
CLIENT: llvm-debuginfod-find fake_build_id --source /directory/file.c
...
```

Only the RUN stanza would be parsed by lit. The rest would be processed by the python script. The thing you lose this way is the built-in lit substitutions (they'll only work on the RUN line), but it's not clear to me how useful would those actually be. OTOH, this means you can implement custom substitutions, tailored to your use case.

If you're going to have a **lot** of these tests, you could also consider implementing a custom test format, which would give you an even greater flexibility on how to write and run these tests, but that's probably premature at this point.

(BTW: The `# ` in front of all the lines is completely redundant. The reason it's normally present is because test file is also a valid source file for some language, but that is not the case here.)

================
Comment at: llvm/tools/llvm-debuginfod/CMakeLists.txt:9
+    )
+  set_property(TARGET llvm-debuginfod-find PROPERTY LLVM_SYSTEM_LIBS ${imported_libs})
+  if(LLVM_INSTALL_BINUTILS_SYMLINKS)
----------------
What's the purpose of this?

================
Comment at: llvm/tools/llvm-debuginfod/llvm-debuginfod-find.cpp:76
+  SmallString<64> CacheDirectoryPath = StringRef(CacheDir);
+  if (!CacheDirectoryPath.size() &&
+      !sys::path::cache_directory(CacheDirectoryPath))
----------------
s/!size()/empty()

================
Comment at: llvm/tools/llvm-debuginfod/llvm-debuginfod-find.cpp:78
+      !sys::path::cache_directory(CacheDirectoryPath))
+    CacheDirectoryPath = ".";
+
----------------
When does the current directory fallback kick in? Is it actually useful? Should you exit instead?

================
Comment at: llvm/tools/llvm-debuginfod/llvm-debuginfod-find.cpp:92
+  } else {
+    llvm_unreachable("invalid asset request");
+  }
----------------
llvm_unreachable is not appropriate for user error (wrong command line arguments).

================
Comment at: llvm/unittests/Support/HTTPClient.cpp:15
+
+TEST(HTTPClientTests, invalidUrlTest) {
+  std::string invalidUrl = "llvm is fun";
----------------
phosek wrote:
> Ditto for this one.
This would be better done as `EXPECT_THAT_EXPECTED(httpGet(...), Failed<StringError>())` or even `FailedWithMessage("whatever")`, though I agree that this is not very useful.

I think the interesting question is whether we're ok with not having any coverage for the lower level apis, and I'm not really sure what the answer to that is.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111252/new/

https://reviews.llvm.org/D111252