[Lldb-commits] [PATCH] D56322: [Reproducers] SBReproducer framework

Fri Feb 1 04:27:20 PST 2019

labath added a comment.

I have a lot of comments. The two major ones are:

- i think the way you link the tests is in the UB territory. I explain this in detail in one of the inline comments.
- I believe that your unit tests (not just in this patch) focus too much on testing the behavior of a single method, even when that method does not have an easily observable results. For this you then often need to expose internals of the tested class to be able to test some effect of that method on the internal state. This not all bad (i've done it myself sometimes), and  it's definitely better that not writing any unit tests. However, it's generally better to test the public interface of a method/class/entity, and in this case, I believe it should be possible. I'm imagining some tests like having a dummy class with a bunch of methods which are annotated with the SB_RECORD macros. Then, in the test you call the methods of this class with some arguments, have the repro framework serialize the calls to a (in-memory?) stream, and verify the contents of that stream. This will test a lot more code  -- the (De)Serialize functions are just fancy ways of invoking memcpy, but if you take all of that together with the SB_RECORD, SB_REGISTER macros, it becomes some pretty deep magic that is interesting to test exhaustively (it's fine, but not really interesting to test that Deserialize<T*> and Deserialize<T&> do the right thing for each of the possible fundamental types.) And if you use the deserializer interface to read out the recorded traces (instead of comparing raw bytes), then you can avoid depending on the endianness of the values.

Conversely, you can write a trace file with the serializer interface, and then tell the replayer to invoke the mock class which will verify that it was called with the right arguments.

================
Comment at: source/API/SBReproducer.cpp:1
+//===-- SBReproducer.cpp ----------------------------------------*- C++ -*-===//
+//
----------------
I found it weird to have one cpp file implementing two headers (`SBReproducer.h` and `SBReproducerPrivate.h`). Can you split it into two files? (This will come out naturally, if we split this up into modules as I mention in one of the other comments.)

================
Comment at: source/API/SBReproducer.cpp:69
+  repro::Loader *loader = repro::Reproducer::Instance().GetLoader();
+  if (!loader) {
+    return false;
----------------
drop `{}` (that's what you seem to do elsewhere in this file).

================
Comment at: source/API/SBReproducerPrivate.h:44
+    void *object = GetObjectForIndexImpl(idx);
+    return static_cast<typename std::remove_const<T>::type *>(object);
+  }
----------------
Why do you need to `remove_const` here? If `T` is `const`, then const will be added back by the return statement anyway. If `T` is not `const`, then `remove_const` is a noop.

================
Comment at: source/API/SBReproducerPrivate.h:65-69
+    auto it = m_mapping.find(idx);
+    if (it == m_mapping.end()) {
+      return nullptr;
+    }
+    return m_mapping[idx];
----------------
replace by `m_mapping.lookup(idx)`, at which point you can consider dropping this function entirely.

================
Comment at: source/API/SBReproducerPrivate.h:82-93
+/// Base class for tag dispatch used in the SBDeserializer. Different tags are
+/// instantiated with different values.
+template <unsigned> struct SBTag {};
+
+/// We need to differentiate between pointers to fundamental and
+/// non-fundamental types. See the corresponding SBDeserializer::Read method
+/// for the reason why.
----------------
Just curious: What's the advantage of this over just declaring a bunch of Tag classes directly (struct PointerTag{}; struct ReferenceTag{}; ...)?

================
Comment at: source/API/SBReproducerPrivate.h:121
+public:
+  SBDeserializer(llvm::StringRef buffer = {}) : m_buffer(buffer), m_offset(0) {}
+
----------------
Instead of the m_offset variable, what do you think about just `drop_front`ing the appropriate amount of bytes when you are done with the? It doesn't look like you'll ever need to go back to an earlier point in the stream...

================
Comment at: source/API/SBReproducerPrivate.h:155-161
+  // FIXME: We have references to this instance stored in replayer instance. We
+  // should find a better way to swap out the buffer after this instance has
+  // been created, but his will have to do for now.
+  void LoadBuffer(llvm::StringRef buffer) {
+    m_buffer = buffer;
+    m_offset = 0;
+  }
----------------
It sounds to me like this could be solved by passing the deserializer to the `operator()` of the SBReplayers instead of having it a member variable set by the constructor. This design also makes more sense to me, as theoretically there is no reason why all replay calls would have to come from the same deserializer object. You may even want to have a separate deserializer for each thread when you get around to replaying multithreaded recordings.

After this, it may even be possible to make the deserializer object be a local variable in the `SBRegistry::Replay` function.

================
Comment at: source/API/SBReproducerPrivate.h:167
+private:
+  template <typename T> T Read(ValueTag) {
+    T t;
----------------
I guess if we don't consider the recording to be "input" then we don't have to handle the case of not having enough bytes in the file extremely gracefully here, but it would still be nice to at least assert that we are not running off the end of the buffer here.

================
Comment at: source/API/SBReproducerPrivate.h:169
+    T t;
+    std::memcpy((char *)&t, &m_buffer.data()[m_offset], sizeof(t));
+    m_offset += sizeof(t);
----------------
reinterpret_cast

================
Comment at: source/API/SBReproducerPrivate.h:182
+    typedef typename std::remove_reference<T>::type UnderlyingT;
+    // If this is a reference to a fundamental type we just read its value.
+    return *m_index_to_object.template GetObjectForIndex<UnderlyingT>(
----------------
Is this correct? I would expect the fundamental references to not go through this overload at all...

================
Comment at: source/API/SBReproducerPrivate.h:303
+        DeserializationHelper<Args...>::template deserialized<Result>::doit(
+            m_deserializer, f));
+  }
----------------
Shouldn't this call `g` then?

================
Comment at: source/API/SBReproducerPrivate.h:307-309
+  Result (*f)(Args...);
+  /// A custom function.
+  Result (*g)(Args...);
----------------
If you gave these more meaningful names, then the comments would be unneeded.

================
Comment at: source/API/SBReproducerPrivate.h:370
+  /// Register the given replayer for a function (and the ID mapping).
+  void DoRegister(uintptr_t RunID, SBReplayer *replayer, unsigned id) {
+    m_sbreplayers[RunID] = std::make_pair(replayer, id);
----------------
It looks like this `id` argument isn't really needed here, as the DoRegister function can just synthesize the id on its own. In fact, the `m_id` is probably not needed either, as you can just use `m_sbreplayers.size()+1` or something.

================
Comment at: source/API/SBReproducerPrivate.h:371
+  void DoRegister(uintptr_t RunID, SBReplayer *replayer, unsigned id) {
+    m_sbreplayers[RunID] = std::make_pair(replayer, id);
+    m_ids[id] = replayer;
----------------
`assert(m_sbreplayers.find(RunID) == end())` to make sure our RunID magic didn't do something funny?

================
Comment at: source/API/SBReproducerPrivate.h:419-448
+/// Maps an object to an index for serialization. Indices are unique and
+/// incremented for every new object.
+///
+/// Indices start at 1 in order to differentiate with an invalid index (0) in
+/// the serialized buffer.
+class SBObjectToIndex {
+public:
----------------
What's the threading scenario in which you expect this to be called? If multiple threads can call `GetIndexForObject` simultaneously (even if the objects differ), then I think you need to guard `m_mapping` with the mutex too, not just the integer variable.

BTW: The `m_index` can also just be `m_mapping.size()+1`

================
Comment at: source/API/SBReproducerPrivate.h:453
+public:
+  SBSerializer(llvm::raw_ostream &stream = llvm::outs()) : m_stream(stream) {}
+
----------------
Is this default value used for anything? It don't see it being used, and it doesn't seem particularly useful anyway, as you'd just get a lot of binary junk on stdout.

================
Comment at: source/API/SBReproducerPrivate.h:482
+    if (std::is_fundamental<T>::value) {
+      Serialize(t);
+    } else {
----------------
Will this ever not be an infinite recursion? I am guessing this works because all fundamental types are picked up by the specific overloads below before this function is even invoked. 

Maybe if you just inline the `m_stream.write(reinterpret_cast<const char *>(&t), sizeof(Type))` thingy here, you can avoid having to define a special Serialize function for all fundamental types.

================
Comment at: source/API/SBReproducerPrivate.h:484
+    } else {
+      int idx = m_tracker.GetIndexForObject(&t);
+      Serialize(idx);
----------------
It looks like `GetIndexForObject` returns `unsigned`

================
Comment at: source/API/SBReproducerPrivate.h:562-563
+        m_local_boundary(false), m_result_recorded(false) {
+    if (!g_global_boundary) {
+      g_global_boundary = true;
+      m_local_boundary = true;
----------------
If you wanted to make the updates to g_global_boundary thread-safe, this is not the way to achieve that. This is still going to be racy, so you should at least use the atomic compare-and-exchange operations. However, I don't believe this will still be right thing to do in the multithreaded case, so it may be best do just drop the atomicity until you implement proper threading support.

================
Comment at: source/API/SBReproducerPrivate.h:573-574
+
+  void SetSerializer(SBSerializer &serializer) { m_serializer = &serializer; }
+
+  /// Records a single function call.
----------------
Move this to the constructor, and use `llvm::Optional` in the `SB_RECORD` macros?

This will also avoid having this class muck around with `g_global_boundary` when we are not recording.

================
Comment at: source/API/SBReproducerPrivate.h:578
+  void Record(Result (*f)(FArgs...), const RArgs &... args) {
+    if (!ShouldCapture()) {
+      return;
----------------
Inconsistent braces around a single statement.

================
Comment at: source/API/SBReproducerPrivate.h:584-585
+
+    if (Log *log = GetLogIfAllCategoriesSet(LIBLLDB_LOG_API))
+      log->Printf("Recording #%u '%s'", id, m_pretty_func.data());
+
----------------
if you used the `LLDB_LOG` macro, you wouldn't have to rely on m_pretty_func being implicitly null terminated.

================
Comment at: source/API/SBReproducerPrivate.h:618-626
+  void RecordOmittedResult() {
+    if (m_result_recorded)
+      return;
+    if (!ShouldCapture())
+      return;
+
+    m_serializer->SerializeAll(0);
----------------
I'm wondering if we shouldn't embed some additional safeties here to make sure the result is always recorded when it should be. It seems to be this class should know whether it should expect an explicit `RecordResult` call (based on whether we called the `void` version of `Record` or not). So, we could have this class assert if we reached the destructor without recording anything, instead of silently recording zero. WDYT?

================
Comment at: source/API/SBReproducerPrivate.h:659
+  if (auto *g = lldb_private::repro::Reproducer::Instance().GetGenerator()) {  \
+    lldb_private::repro::SBRecorder sb_recorder(__PRETTY_FUNCTION__);          \
+    sb_recorder.SetSerializer(                                                 \
----------------
`__PRETTY_FUNCTION__` is not portable. I think you want LLVM_PRETTY_FUNCTION here.

================
Comment at: tools/driver/Driver.cpp:913-917
+  SBReproducer reproducer;
+  if (reproducer.Replay()) {
+    return 0;
+  }
+
----------------
The driver should know whether it is replaying things or not, right? Can we have it call `Replay` only if replay mode has actually been requested?

================
Comment at: unittests/API/CMakeLists.txt:4-12
+  LINK_LIBS
+    liblldb
+    lldbCore
+    lldbHost
+  LINK_COMPONENTS
+    Support
+  )
----------------
Hmm.. I'm not sure this is going to work everywhere, but it's certainly going to be weird. `liblldb` is a shared library which deliberately firewalls all `lldb_private` symbols. So any reproducer symbols you defined there will not be accessible to your unit tests. I guess the reason `include_directories` trick kind of works is that most of the reproducer frameworks is implemented in the header, and including that header from the test will cause another copy of all of these symbols to be defined. However, I am still surprised that it does work, as you still have some chunks of this implemented in the cpp file.  I guess you were just lucky to not need those symbols in the tests that you have written. Nonetheless, this is going into very murky waters, and I think the fact that you had to link in  lldbHost and lldbCore in addition to liblldb (even though they are be included in there) demonstrates it.

I think the best way to address this is to move the core of the repro engine into some other library, which can be safely used from unit tests. After all, the core is not really tied to the SB API, and you could easily use it to record any other interface, if you needed to do that for whatever reason. I think this should go into the Utility module, as it doesn't have any other dependencies. I see that you're including FileSystem.h, but it looks like this is only used in the `SetInputFileHandleRedirect` function, which is not actually used anywhere! Even when it is used, I think it would make sense for it to live elsewhere, as this is not a part of the replay core, but rather how a particular api wants to replay a particular function signature.

================
Comment at: unittests/API/SBReproducerTest.cpp:37-41
+      0x01, 0x00, 0x61, 0xcd, 0xcc, 0x8c, 0x3f, 0x02, 0x00, 0x00, 0x00,
+      0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00,
+      0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x62, 0x06, 0x00, 0x00,
+      0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00,
+      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x00};
----------------
All of these buffers hard-code endiannes of integers and the representation of floats.

================
Comment at: unittests/API/SBReproducerTest.cpp:83-85
+    SBDeserializer deserializer("a");
+    EXPECT_TRUE(deserializer.HasData());
+    EXPECT_FALSE(deserializer.HasData(1));
----------------
I find this behaviour confusing. Intuitively, I'd expect `HasData(n)` if buffer has at least `n` bytes of data left. So in this case I'd expect `true` in both calls. (In fact I'd probably expect `HasData()` to be a synonym for `HasData(1)`). It looks like you're only calling `HasData` in one place, and you're hardcoding `1` there, so maybe you just want an `EOF()` function?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56322/new/

https://reviews.llvm.org/D56322