[PATCH] D114415: [llvm] [Support] Add HTTP Server Support library.

Tue Nov 30 16:59:49 PST 2021

dblaikie added a comment.

In D114415#3162737 <https://reviews.llvm.org/D114415#3162737>, @noajshu wrote:

> In D114415#3159995 <https://reviews.llvm.org/D114415#3159995>, @dblaikie wrote:
>
>> In D114415#3159988 <https://reviews.llvm.org/D114415#3159988>, @phosek wrote:
>>
>>> In D114415#3159905 <https://reviews.llvm.org/D114415#3159905>, @dblaikie wrote:
>>>
>>>> What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?
>>>
>>> It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) `debuginfod-find` client and the corresponding library that could be integrated into other tools and (2) `debuginfod` which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.
>>
>> Oh, fair enough - just checking it wasn't only being implemented for testing. Would the python script from the llvm-symbolizer test be adequate for the production/local developer scenarios you have in mind? (I don't mind the C++ too much, but just trying to understand the landscape, tradeoffs, etc)
>
> The python script used in D113717 <https://reviews.llvm.org/D113717> (which might also get used to test D112759 <https://reviews.llvm.org/D112759>) only takes care of the HTTP static file serving needed for debuginfod. There is another component, which is to search the filesystem for debug binaries and assemble the collection of artifacts to serve. I'll be publishing a diff shortly which implements this functionality using LLVM's object parsing and filesystem utilities. Then we will have all the pieces needed for a basic C++ debuginfod in LLVM.
>
> There are workarounds we could use to get a simple debuginfod server working without this LLVM HTTP server. E.g., we could use CGI and FastCGI, or we could create symlinks / copies to the discovered debug data in a single static file serving path. These workarounds came with trade offs, and there are other tools in LLVM that could take advantage of a cross-platform HTTP server such as bisectd (D113030 <https://reviews.llvm.org/D113030>). This is what led to the goal of getting an HTTP server in LLVM's supporting libraries. (A side benefit is that we can now thoroughly unit test the HTTP client by letting it communicate with the server, but this isn't the motivation :) .)

Fair enough - thanks for the context!

================
Comment at: llvm/include/llvm/Support/HTTPServer.h:29-31
+  /// The first element is the entire url path, and the rest of the elements
+  /// correspond to match groups in the url path matching regex.
+  SmallVector<std::string, 2> UrlPathMatches;
----------------
Should the first element instead be a separate member, since it seems it's not like the rest of the elements/will be handled differently?

================
Comment at: llvm/lib/Support/HTTPServer.cpp:58
+          },
+          [=](bool Success) { delete MB; }};
+}
----------------
Could this parameter (the last lambda, the "CompletionHandler" member in StreamingHTTPResponse) be move-only (non-copyable) in that case it could capture-by-move the std::unique_ptr<MemoryBuffer> and be a bit more robust in terms of memory management?

================
Comment at: llvm/lib/Support/HTTPServer.cpp:67
+
+Error HTTPServer::get(StringRef UrlPathPattern, HTTPRequestHandler Handler) {
+  std::string ErrorMessage;
----------------
Perhaps HTTPRequestHandler should be an `llvm::function_ref` since it doesn't appear to outlive this call, if I'm understanding the code correctly?

================
Comment at: llvm/lib/Support/HTTPServer.cpp:94
+               HTTPServerRequest Request;
+               for (auto PathComponent : HTTPLibRequest.matches)
+                 Request.UrlPathMatches.push_back(std::string(PathComponent));
----------------
should this be `const auto&`? (I'm not sure how big/expensive to copy these objects are, perhaps it's suitable to copy them)

================
Comment at: llvm/lib/Support/HTTPServer.cpp:101-102
+                     if (Offset < Response.ContentLength) {
+                       std::string Chunk = Response.Provider(Offset, Length);
+                       Sink.write(Chunk.c_str(), Chunk.size());
+                     }
----------------
Looks like this could use StringRef instead of std::string, to avoid copying the contents? Or are there cases where a provider might want to return content by value, rather than from some underlying/longer-lived storage? (worth providing something that could be optimal for both cases somehow - like providing an optional std::string out parameter that the provider could populate if it doesn't have its own long-lived backing store?)

================
Comment at: llvm/lib/Support/HTTPServer.cpp:125-130
+Expected<unsigned> HTTPServer::bindAny(StringRef HostInterface) {
+  SmallString<16> HostInterfaceStorage;
+  StringRef S =
+      Twine(HostInterface).toNullTerminatedStringRef(HostInterfaceStorage);
+
+  int ListenPort = Server.bind_to_any_port(S.begin());
----------------
If the underlying API requires a c-style string, maybe simpler to expose that up through the caller layers? Rather than allocating a buffer to create a null terminated string when the caller might already have one to pass down anyway? (alternatively, I guess, if you use Twine as the parameter type rather than StringRef, then at least in the cases where the Twine does refer to a null terminated string it won't have to copy the string again - currently since the HostInterface is StringRef, it'd always have to copy into the buffer and append a null terminator)

================
Comment at: llvm/lib/Support/HTTPServer.cpp:142-144
+    return createStringError(
+        errc::io_error,
+        "An unknown error occurred when cpp-httplib attempted to listen.");
----------------
Follow-up work to use httplib set_error_handler to improve these error messages, perhaps?

================
Comment at: llvm/lib/Support/HTTPServer.cpp:158-162
+HTTPServer::HTTPServer(
+    std::function<StreamingHTTPResponse(StringRef)> StreamingRequestHandler) {
+  llvm_unreachable(
+      "Attempt to instantiate HTTPServer when no implementation is available.");
+}
----------------
This looks out of date, presumably it doesn't compile? (HTTPServer doesn't declare any ctors, so I guess this would fail to compile)

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:73
+  EXPECT_THAT_EXPECTED(PortOrErr, Succeeded());
+  unsigned &Port = *PortOrErr;
+  ThreadPool Pool(hardware_concurrency(1));
----------------
probably make this by-value?

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:77
+  std::string Url = "http://localhost:" + utostr(Port);
+  HTTPClient::initialize();
+  Expected<HTTPResponseBuffer> BufferOrErr = HTTPClient().get(Url);
----------------
Is some corresponding de-initialize/teardown required/desirable? (perhaps some scoped initialize/de-initialize) or put the initialize in a global ctor (or test fixture (SetUp? Whatever the one is that's called before any test (but not before every test individually)) in the test file if there's no expectation/desire to initialize/teardown around each test?

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:82
+  EXPECT_EQ(Buffer.Code, Response.Code);
+  EXPECT_EQ(Buffer.Body->MemoryBuffer::getBuffer(), Response.Body);
+  Server.stop();
----------------
The `MemoryBuffer::` probably isn't needed here, I'd have thought?

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:184-189
+  // Timeout below 50ms, request should fail
+  Client.setTimeout(40);
+  Expected<HTTPResponseBuffer> BufferOrErr = Client.get(Url);
+  EXPECT_THAT_EXPECTED(BufferOrErr, Failed<StringError>());
+  // Timeout above 50ms, request should succeed
+  Client.setTimeout(60);
----------------
timing tests like this can be sensitive to machine load - perhaps setting the acceptable timeout much higher than the actual timeout would be good to increase test stability? (eg: the 40 is probably fine - the actual 50ms delay shouldn't ever be shorter than requested, and so it probably shouldn't tickle that test (though it's possible - if the test code were delayed in reaching the timeout call while the delay was already running, it could appear as though the delay was shorter than expected) - but changing the 60 up to something well above the threshold might be good).

Though also, it's not your job to test that HTTPClient implements timeouts correctly, only that you pass them down through the API - so if there's a more robust way to test that property without actually having to run real timeouts (or at least not testing them so precisely) it might be helpful.

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:206
+                 [&](HTTPServerRequest Request) -> HTTPResponse {
+                   EXPECT_EQ(Request.UrlPathMatches.size(), 3u);
+                   EXPECT_EQ(Request.UrlPathMatches[0], "/abc/1/2");
----------------
I think there's probably a matcher that matches all the elements in a more terse format than having to check each separately? ( https://stackoverflow.com/q/1460703 )

================
Comment at: llvm/unittests/Support/HTTPServer.cpp:215-217
+                                 assert(false &&
+                                        "Should not reach this handler");
+                                 return Handler(Request);
----------------
Prefer `llvm_unreachable` over `assert(false)`

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114415/new/

https://reviews.llvm.org/D114415