[PATCH] D48413: [llvm-strip] Add initial support for static libraries

Thu Jun 21 07:21:01 PDT 2018

jhenderson added a comment.

In https://reviews.llvm.org/D48413#1139135, @alexshap wrote:

> James, let me try to address your concerns:
>
> 1. so basically many llvm-objcopy/llvm-strip features (if not almost everything) are applicable to static libraries as well. Roughly speaking  `llvm-objcopy <some options> lib.a` is equivalent to the following: unpack liba.a -> for each file F run `llvm-objcopy <some options> F F` -> pack all the files F into lib.a again. However, now we have smth about 100 tests for llvm-objcopy and modifying every single test (adding extra runs for static libraries) is a lot of work and doesn't seem to be a good idea at all to me / doesn't seem to provide a lot of value. Basically in that case we would test the same code path again and again but the testing time will increase significantly. Keeping this in my mind I decided to add tests for the real-world use-case first (stripping static libraries) because it's being used in a real-world project right now.

The point I think I was trying to make was that llvm-objcopy is kind of our "first class citizen" when it comes to testing - i.e. tests are generally written in terms of llvm-objcopy, and where they apply to llvm-strip, we add it as a test-case to the existing test. For unique functionality to llvm-strip, it gets its own test. Archive handling is not, from my understanding, different between the two tools, so when we test it, we should test via use of llvm-objcopy, rather than llvm-strip (i.e. the only change needed would be changing the llvm-strip command in the tests to llvm-objcopy --strip-all etc). As a related point, we should really have this as part of the "basic-copy.test" or a new test called "basic-archive.test" to show the behaviour of llvm-objcopy and archives as a basic point.

I'm still thinking about the more global point about testing archive handling. I'm kind of leaning towards the idea that as long as we have one test that shows that archive members can be manipulated in some way (whether via strip-all or otherwise), that is sufficient. The reason being that we know that the code path is identical once we start operating on the member. It's just replacing the existing member that is important. I do think we might want tests to show what happens if symbols are stripped from an archive member, since that would affect the contents of the archive symbol table (if present). We should also ensure that an archive symbol table is preserved, if present in the input, or not added, if not.

> 
> 
> 2. the apis which return Expected<T> are ubiquitous and these types must be checked (otherwise the destructor will call terminate()),

Sure, but if there is a situation that can never be hit, under any circumstance maybe it makes more sense to use a different method other than reportError.

> even in llvm-objcopy we have smth like ~80 checks if i'm not mistaken (i was looking at grep -r -n -i Error ./* | wc -l), but we have ~100 tests. So very obviously 100 tests don't cover every single case even right now (and, in fact, it's far from it). I'm not sure that adding what you called "missing" tests is worth doing in this patch. For some erros (for example, Ar->children(Error)) I'm not even sure if for such a test llvm-objcopy is the right place. However, having said that, I think it's necessary to add a test for a malformed archive and for an archive with "mixed" content (.o mixed with smth else). Would that be satisfiable for you ?

What I'm interested in testing is that every code path in llvm-objcopy is checked (as much as reasonably possible). We may have missed some in the past, but that's no excuse not to do things right this time. However, I accept that not all paths can be sensibly testable, so I'm only asking for those cases that can be feasibly tested to be so. We don't need to test every different way each check might fire (i.e. if our code checks the return of `foo()`, but doesn't implement it, we don't need to test every case where foo might fail, only one of them). In an ideal world, we'd have unit-tests that could check these finer grained points more sensibly, but we don't so that means we're stuck testing things through the high-level interface, which in turn means we may not be able to test everything (that's fine), but we should test what we can.

================
Comment at: test/tools/llvm-objcopy/strip-all.test:27
+# RUN: llvm-strip %t.a
+# RUN: llvm-ar -p %t.a %basename_t.tmp >%t6
+# RUN: cmp %t2 %t6
----------------
alexshap wrote:
> jhenderson wrote:
> > The "-" part of "-p" is unnecessary so can be removed (either that or you should add it to "crs" for consistency).
> > 
> > What is the purpose of "%basename_t.tmp"?
> > 
> > Does lit use it's own redirection mechanism? I ask because on Windows Powershell encodes files created via ">" as UTF-16, which obviously isn't suitable for binary files. I'd be more inclined to use the extract operation (x) to achieve the same thing, as it's less likely to cause a problem.
> %basename_t.tmp is expanded by lit into the basename of %t .
> so llvm-ar p %t.a %basename_t.tmp prints the content of a given file from the archive.
> -x will extract a file into the current folder so it will overwrite the existing file, I somehow didn't like this.
>  
Yeah, the overwriting isn't great, and I get why you didn't want to use -x in the first place. But does the redirection work for all users? I might have to test this myself, since I use Windows and Powershell as my primary development environment, and it's important that the binary file doesn't get encoded as it gets printed out.

So we need %basename_t because %t is an absolute path, right? I think this sounds more like a deficiency in the test framework - I don't like the need to know that %t ends up as <some path>.tmp for this test to work, as it should be an implementation detail that the test doesn't care about. That being said, it's probably outside the scope of this change to worry about that, if there is no clear alternative (but perhaps a new macro that expands to <the same as %t without the directory components> might be appropriate at some point).

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:456
+    Error Err = Error::success();
+    for (auto &Child : Ar->children(Err)) {
+      Expected<std::unique_ptr<Binary>> ChildOrErr = Child.getAsBinary();
----------------
jhenderson wrote:
> Probably need to avoid using auto here, as the type isn't obvious to me.
> 
> Ditto at the uses of auto further down.
What happens if Err is non-success value? Does Ar->children return an empty list?

Repository:
  rL LLVM

https://reviews.llvm.org/D48413