[Lldb-commits] [PATCH] D60667: Allow direct comparison of ConstString against StringRef

Raphael Isemann via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Apr 16 00:42:48 PDT 2019

teemperor marked 2 inline comments as done.
teemperor added a comment.

In D60667#1467387 <https://reviews.llvm.org/D60667#1467387>, @amccarth wrote:

> I, too, have some concern that this could have unintended side effects.  To make the temporary `StringRef`s from the zero-terminated strings requires a `strlen` call each time.  So we're making two passes over a string each time (once to measure and once to compare).  Granted, these are mostly very short.

I think I'm not understanding something here. There is one `strlen` call to make the StringRef from the literal, but I don't see why we need a second one to compare (as StringRef and ConstString both store the string length and don't recompute it when comparing). Simple godbolt test case here: https://godbolt.org/z/9X1RRt So both the old and new version both require one `strlen` call from what I can see (which is anyway optimized out for comparing against literals).

> I'd love to see the static const variables for the short constant strings, just to weigh readability hit.

I can maybe make a new revision to show this, but I hope it's obvious from looking at the code that I changed in this revision. Also there are the problems/style difficulties that hopefully don't need a full revision to see:

1. We preferable want to have variables with the same name as the string content, but often the strings contain reserved words or words that don't translate into our lower_case_code_style so that's not possible. E.g. `var == this_` is not really as readable as `var == "this"` (and it looks like a typo). Same with stuff like `if (arg_type.GetTypeName() == "bool")` vs `if (arg_type.GetTypeName() == bool_)` or `image_infos[i].segments[k].name == ConstString("__TEXT"))` vs `image_infos[i].segments[k].name == text)`.
2. We get a bunch of local variables that just bloat the code:

  if (descriptor->IsCFType()) {
    ConstString type_name(valobj.GetTypeName());
    if (type_name == "__CFMutableBitVector" || type_name == "__CFBitVector" ||
        type_name == "CFMutableBitVectorRef" || type_name == "CFBitVectorRef") {
      if (valobj.IsPointerType())
        is_type_ok = true;

becomes this

  if (descriptor->IsCFType()) {
    ConstString type_name(valobj.GetTypeName());
    static const ConstString cf_mutable_bit_vector("__CFMutableBitVector"),
    if (type_name == cf_mutable_bit_vector || type_name == cf_bit_vector ||
        type_name == cf_mutable_bit_vector_ref || type_name == cf_bit_vector_ref) {
      if (valobj.IsPointerType())
        is_type_ok = true;

3. Code becomes much harder to grep. E.g. if I want to check for the location where we filter the `this` variable from the local variables, I can currently do `git grep "\"this\"" | grep ==` and directly see where we do the comparison. But with local variables I have to do some funky grep with context filtering. In general grepping for these string literals without a few lines of context will be pointless with ConstString variables for literals.

Comment at: lldb/include/lldb/Utility/ConstString.h:173
+    // StringRef doesn't. Therefore we have to do this check manually now.
+    if (!m_string ^ !rhs.data())
+      return false;
amccarth wrote:
> This is very clever way to express the constraint, but it's quite a bit of cognitive load for the reader to figure out what's being tested here.  The comment sort of explains, but leaves it up to the reader to see how that maps to the condition.  (It also leaves me wondering whether it's useful to have ConstString treat empty strings and nullptr as distinct things.)
> A simpler way to communicate the condition to humans might be:
>     if (m_string == nullptr && rhs.data() != nullptr) return false;
>     if (m_string != nullptr && rhs.data() == nullptr) return false;
> The clever expression requires the reader to know or deduce:
> 1.  that `m_string` and `rhs.data()` are pointers,
> 2.  that `!p` is equivalent to `p != 0` and that the `0` will be implicitly converted to `nullptr`, yielding a `bool` that's `true` if `p` is a nullptr,
> 3.  that `^` is bitwise exclusive or,
> 4.  that boolean `!` and bitwise `^` have the appropriate operator precedence, which is not always obvious when mixing types like this,
> 5.  that `true` and `false` are guaranteed to have values such that bitwise `^` will do the expected thing to the promoted types,
> 6.  and that the resulting `int` will be implicitly compared for inequality to 0.
> Most of these are reasonable things to expect a C++ programmer to know, but expecting them to apply all of that knowledge (correctly) to figure out what the expression does seems like an unnecessarily high cognitive burden.
Yeah, that was rather quickly typed down when making this patch. Replaced with your version, thanks!

Also, I don't like the whole nullptr/empty thing in ConstString, but I didn't want to change this behavior as a side effect of this patch.

Comment at: lldb/source/Core/Disassembler.cpp:1407
     return (op.m_type == Instruction::Operand::Type::Register &&
-            (op.m_register == ConstString(info.name) ||
-             op.m_register == ConstString(info.alt_name)));
+            (op.m_register == info.name || op.m_register == info.alt_name));
amccarth wrote:
> This is _probably_ a performance win here, but it's not obvious.  If the resulting lambda is called many times, and it actually captured `ConstString(info.name)` and `ConstString(info.alt_name)` rather than `info.name` and `info.alt_name`, then we're trying off two pointer comparisons for two StringRef comparisons.




More information about the lldb-commits mailing list