<div dir="ltr">Hi Utkarsh,<div><br></div><div>I've temporarily reverted this here:</div><div><br></div><div>echristo@athyra ~/s/llvm-project (master)> git push<br>To github.com:llvm/llvm-project.git<br>   1f0b43638ed..549e55b3d56  master -> master<br></div><div><br></div><div>the decision forest header file referenced in the unittest doesn't appear to have made it into the commit?</div><div><br></div><div>Thanks and feel free to follow up if I've missed something.</div><div><br></div><div>-eric</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 18, 2020 at 12:38 PM Utkarsh Saxena via llvm-branch-commits <<a href="mailto:llvm-branch-commits@lists.llvm.org">llvm-branch-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

Author: Utkarsh Saxena<br>

Date: 2020-09-18T18:27:42+02:00<br>

New Revision: 85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a<br>

<br>

URL: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a</a><br>

DIFF: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff</a><br>

<br>

LOG: [clangd] Add Random Forest runtime for code completion.<br>

<br>

Summary:<br>

[WIP]<br>

- Proposes a json format for representing Random Forest model.<br>

- Proposes a way to test the generated runtime using a test model.<br>

<br>

TODO:<br>

- Add generated source code snippet for easier review.<br>

- Fix unused label warning.<br>

- Figure out required using declarations for CATEGORICAL columns from Features.json.<br>

- Necessary Google3 internal modifications for blaze before landing.<br>

- Add documentation for format of the model.<br>

- Document more.<br>

<br>

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits<br>

<br>

Tags: #clang<br>

<br>

Differential Revision: <a href="https://reviews.llvm.org/D83814" rel="noreferrer" target="_blank">https://reviews.llvm.org/D83814</a><br>

<br>

Added: <br>

    clang-tools-extra/clangd/quality/CompletionModel.cmake<br>

    clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>

    clang-tools-extra/clangd/quality/README.md<br>

    clang-tools-extra/clangd/quality/model/features.json<br>

    clang-tools-extra/clangd/quality/model/forest.json<br>

    clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>

    clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>

    clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>

    clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>

<br>

Modified: <br>

    clang-tools-extra/clangd/CMakeLists.txt<br>

    clang-tools-extra/clangd/unittests/CMakeLists.txt<br>

    clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>

<br>

Removed: <br>

<br>

<br>

<br>

################################################################################<br>

diff  --git a/clang-tools-extra/clangd/CMakeLists.txt b/clang-tools-extra/clangd/CMakeLists.txt<br>

index 3a1a034ed17b..9d2ab5be222a 100644<br>

--- a/clang-tools-extra/clangd/CMakeLists.txt<br>

+++ b/clang-tools-extra/clangd/CMakeLists.txt<br>

@@ -28,6 +28,9 @@ set(LLVM_LINK_COMPONENTS<br>

   FrontendOpenMP<br>

   Option<br>

   )<br>

+  <br>

+include(${CMAKE_CURRENT_SOURCE_DIR}/quality/CompletionModel.cmake)<br>

+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/quality/model CompletionModel clang::clangd::Example)<br>

<br>

 if(MSVC AND NOT CLANG_CL)<br>

  set_source_files_properties(CompileCommands.cpp PROPERTIES COMPILE_FLAGS -wd4130) # disables C4130: logical operation on address of string constant<br>

@@ -77,6 +80,7 @@ add_clang_library(clangDaemon<br>

   TUScheduler.cpp<br>

   URI.cpp<br>

   XRefs.cpp<br>

+  ${CMAKE_CURRENT_BINARY_DIR}/CompletionModel.cpp<br>

<br>

   index/Background.cpp<br>

   index/BackgroundIndexLoader.cpp<br>

@@ -117,6 +121,11 @@ add_clang_library(clangDaemon<br>

   omp_gen<br>

   )<br>

<br>

+# Include generated CompletionModel headers.<br>

+target_include_directories(clangDaemon PUBLIC<br>

+  $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>

+)<br>

+<br>

 clang_target_link_libraries(clangDaemon<br>

   PRIVATE<br>

   clangAST<br>

<br>

diff  --git a/clang-tools-extra/clangd/quality/CompletionModel.cmake b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>

new file mode 100644<br>

index 000000000000..60c6d2aa8433<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>

@@ -0,0 +1,37 @@<br>

+# Run the Completion Model Codegenerator on the model present in the <br>

+# ${model} directory.<br>

+# Produces a pair of files called ${filename}.h and  ${filename}.cpp in the <br>

+# ${CMAKE_CURRENT_BINARY_DIR}. The generated header<br>

+# will define a C++ class called ${cpp_class} - which may be a<br>

+# namespace-qualified class name.<br>

+function(gen_decision_forest model filename cpp_class)<br>

+  set(model_compiler ${CMAKE_SOURCE_DIR}/../clang-tools-extra/clangd/quality/CompletionModelCodegen.py)<br>

+  <br>

+  set(output_dir ${CMAKE_CURRENT_BINARY_DIR})<br>

+  set(header_file ${output_dir}/${filename}.h)<br>

+  set(cpp_file ${output_dir}/${filename}.cpp)<br>

+<br>

+  add_custom_command(OUTPUT ${header_file} ${cpp_file}<br>

+    COMMAND "${Python3_EXECUTABLE}" ${model_compiler}<br>

+      --model ${model}<br>

+      --output_dir ${output_dir}<br>

+      --filename ${filename}<br>

+      --cpp_class ${cpp_class}<br>

+    COMMENT "Generating code completion model runtime..."<br>

+    DEPENDS ${model_compiler} ${model}/forest.json ${model}/features.json<br>

+    VERBATIM )<br>

+<br>

+  set_source_files_properties(${header_file} PROPERTIES<br>

+    GENERATED 1)<br>

+  set_source_files_properties(${cpp_file} PROPERTIES<br>

+    GENERATED 1)<br>

+<br>

+  # Disable unused label warning for generated files.<br>

+  if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")<br>

+    set_source_files_properties(${cpp_file} PROPERTIES<br>

+      COMPILE_FLAGS /wd4102)<br>

+  else()<br>

+    set_source_files_properties(${cpp_file} PROPERTIES<br>

+      COMPILE_FLAGS -Wno-unused)<br>

+  endif()<br>

+endfunction()<br>

<br>

diff  --git a/clang-tools-extra/clangd/quality/CompletionModelCodegen.py b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>

new file mode 100644<br>

index 000000000000..8f8234f6ebbc<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>

@@ -0,0 +1,283 @@<br>

+"""Code generator for Code Completion Model Inference.<br>

+<br>

+Tool runs on the Decision Forest model defined in {model} directory.<br>

+It generates two files: {output_dir}/{filename}.h and {output_dir}/{filename}.cpp <br>

+The generated files defines the Example class named {cpp_class} having all the features as class members.<br>

+The generated runtime provides an `Evaluate` function which can be used to score a code completion candidate.<br>

+"""<br>

+<br>

+import argparse<br>

+import json<br>

+import struct<br>

+from enum import Enum<br>

+<br>

+<br>

+class CppClass:<br>

+    """Holds class name and names of the enclosing namespaces."""<br>

+<br>

+    def __init__(self, cpp_class):<br>

+        ns_and_class = cpp_class.split("::")<br>

+        self.ns = [ns for ns in ns_and_class[0:-1] if len(ns) > 0]<br>

+        self.name = ns_and_class[-1]<br>

+        if len(self.name) == 0:<br>

+            raise ValueError("Empty class name.")<br>

+<br>

+    def ns_begin(self):<br>

+        """Returns snippet for opening namespace declarations."""<br>

+        open_ns = [f"namespace {ns} {{" for ns in self.ns]<br>

+        return "\n".join(open_ns)<br>

+<br>

+    def ns_end(self):<br>

+        """Returns snippet for closing namespace declarations."""<br>

+        close_ns = [<br>

+            f"}} // namespace {ns}" for ns in reversed(self.ns)]<br>

+        return "\n".join(close_ns)<br>

+<br>

+<br>

+def header_guard(filename):<br>

+    '''Returns the header guard for the generated header.'''<br>

+    return f"GENERATED_DECISION_FOREST_MODEL_{filename.upper()}_H"<br>

+<br>

+<br>

+def boost_node(n, label, next_label):<br>

+    """Returns code snippet for a leaf/boost node.<br>

+    Adds value of leaf to the score and jumps to the root of the next tree."""<br>

+    return f"{label}: Score += {n['score']}; goto {next_label};"<br>

+<br>

+<br>

+def if_greater_node(n, label, next_label):<br>

+    """Returns code snippet for a if_greater node.<br>

+    Jumps to true_label if the Example feature (NUMBER) is greater than the threshold. <br>

+    Comparing integers is much faster than comparing floats. Assuming floating points <br>

+    are represented as IEEE 754, it order-encodes the floats to integers before comparing them.<br>

+    Control falls through if condition is evaluated to false."""<br>

+    threshold = n["threshold"]<br>

+    return f"{label}: if (E.{n['feature']} >= {order_encode(threshold)} /*{threshold}*/) goto {next_label};"<br>

+<br>

+<br>

+def if_member_node(n, label, next_label):<br>

+    """Returns code snippet for a if_member node.<br>

+    Jumps to true_label if the Example feature (ENUM) is present in the set of enum values <br>

+    described in the node.<br>

+    Control falls through if condition is evaluated to false."""<br>

+    members = '|'.join([<br>

+        f"BIT({n['feature']}_type::{member})"<br>

+        for member in n["set"]<br>

+    ])<br>

+    return f"{label}: if (E.{n['feature']} & ({members})) goto {next_label};"<br>

+<br>

+<br>

+def node(n, label, next_label):<br>

+    """Returns code snippet for the node."""<br>

+    return {<br>

+        'boost': boost_node,<br>

+        'if_greater': if_greater_node,<br>

+        'if_member': if_member_node,<br>

+    }[n['operation']](n, label, next_label)<br>

+<br>

+<br>

+def tree(t, tree_num: int, node_num: int):<br>

+    """Returns code for inferencing a Decision Tree.<br>

+    Also returns the size of the decision tree.<br>

+<br>

+    A tree starts with its label `t{tree#}`.<br>

+    A node of the tree starts with label `t{tree#}_n{node#}`.<br>

+<br>

+    The tree contains two types of node: Conditional node and Leaf node.<br>

+    -   Conditional node evaluates a condition. If true, it jumps to the true node/child.<br>

+        Code is generated using pre-order traversal of the tree considering<br>

+        false node as the first child. Therefore the false node is always the<br>

+        immediately next label.<br>

+    -   Leaf node adds the value to the score and jumps to the next tree.<br>

+    """<br>

+    label = f"t{tree_num}_n{node_num}"<br>

+    code = []<br>

+    if node_num == 0:<br>

+        code.append(f"t{tree_num}:")<br>

+<br>

+    if t["operation"] == "boost":<br>

+        code.append(node(t, label=label, next_label=f"t{tree_num+1}"))<br>

+        return code, 1<br>

+<br>

+    false_code, false_size = tree(<br>

+        t['else'], tree_num=tree_num, node_num=node_num+1)<br>

+<br>

+    true_node_num = node_num+false_size+1<br>

+    true_label = f"t{tree_num}_n{true_node_num}"<br>

+<br>

+    true_code, true_size = tree(<br>

+        t['then'], tree_num=tree_num, node_num=true_node_num)<br>

+<br>

+    code.append(node(t, label=label, next_label=true_label))<br>

+<br>

+    return code+false_code+true_code, 1+false_size+true_size<br>

+<br>

+<br>

+def gen_header_code(features_json: list, cpp_class, filename: str):<br>

+    """Returns code for header declaring the inference runtime.<br>

+<br>

+    Declares the Example class named {cpp_class} inside relevant namespaces.<br>

+    The Example class contains all the features as class members. This <br>

+    class can be used to represent a code completion candidate.<br>

+    Provides `float Evaluate()` function which can be used to score the Example.<br>

+    """<br>

+    setters = []<br>

+    for f in features_json:<br>

+        feature = f["name"]<br>

+        if f["kind"] == "NUMBER":<br>

+            # Floats are order-encoded to integers for faster comparison.<br>

+            setters.append(<br>

+                f"void set{feature}(float V) {{ {feature} = OrderEncode(V); }}")<br>

+        elif f["kind"] == "ENUM":<br>

+            setters.append(<br>

+                f"void set{feature}(unsigned V) {{ {feature} = 1 << V; }}")<br>

+        else:<br>

+            raise ValueError("Unhandled feature type.", f["kind"])<br>

+<br>

+    # Class members represent all the features of the Example.<br>

+    class_members = [f"uint32_t {f['name']} = 0;" for f in features_json]<br>

+<br>

+    nline = "\n  "<br>

+    guard = header_guard(filename)<br>

+    return f"""#ifndef {guard}<br>

+#define {guard}<br>

+#include <cstdint><br>

+<br>

+{cpp_class.ns_begin()}<br>

+class {cpp_class.name} {{<br>

+public:<br>

+  {nline.join(setters)}<br>

+<br>

+private:<br>

+  {nline.join(class_members)}<br>

+<br>

+  // Produces an integer that sorts in the same order as F.<br>

+  // That is: a < b <==> orderEncode(a) < orderEncode(b).<br>

+  static uint32_t OrderEncode(float F);<br>

+  friend float Evaluate(const {cpp_class.name}&);<br>

+}};<br>

+<br>

+float Evaluate(const {cpp_class.name}&);<br>

+{cpp_class.ns_end()}<br>

+#endif // {guard}<br>

+"""<br>

+<br>

+<br>

+def order_encode(v: float):<br>

+    i = struct.unpack('<I', struct.pack('<f', v))[0]<br>

+    TopBit = 1 << 31<br>

+    # IEEE 754 floats compare like sign-magnitude integers.<br>

+    if (i & TopBit):  # Negative float<br>

+        return (1 << 32) - i  # low half of integers, order reversed.<br>

+    return TopBit + i  # top half of integers<br>

+<br>

+<br>

+def evaluate_func(forest_json: list, cpp_class: CppClass):<br>

+    """Generates code for `float Evaluate(const {Example}&)` function.<br>

+    The generated function can be used to score an Example."""<br>

+    code = f"float Evaluate(const {cpp_class.name}& E) {{\n"<br>

+    lines = []<br>

+    lines.append("float Score = 0;")<br>

+    tree_num = 0<br>

+    for tree_json in forest_json:<br>

+        lines.extend(tree(tree_json, tree_num=tree_num, node_num=0)[0])<br>

+        lines.append("")<br>

+        tree_num += 1<br>

+<br>

+    lines.append(f"t{len(forest_json)}: // No such tree.")<br>

+    lines.append("return Score;")<br>

+    code += "  " + "\n  ".join(lines)<br>

+    code += "\n}"<br>

+    return code<br>

+<br>

+<br>

+def gen_cpp_code(forest_json: list, features_json: list, filename: str,<br>

+                 cpp_class: CppClass):<br>

+    """Generates code for the .cpp file."""<br>

+    # Headers<br>

+    # Required by OrderEncode(float F).<br>

+    angled_include = [<br>

+        f'#include <{h}>'<br>

+        for h in ["cstring", "limits"]<br>

+    ]<br>

+<br>

+    # Include generated header.<br>

+    qouted_headers = {f"{filename}.h", "llvm/ADT/bit.h"}<br>

+    # Headers required by ENUM features used by the model.<br>

+    qouted_headers |= {f["header"]<br>

+                       for f in features_json if f["kind"] == "ENUM"}<br>

+    quoted_include = [f'#include "{h}"' for h in sorted(qouted_headers)]<br>

+<br>

+    # using-decl for ENUM features.<br>

+    using_decls = "\n".join(f"using {feature['name']}_type = {feature['type']};"<br>

+                            for feature in features_json<br>

+                            if feature["kind"] == "ENUM")<br>

+    nl = "\n"<br>

+    return f"""{nl.join(angled_include)}<br>

+<br>

+{nl.join(quoted_include)}<br>

+<br>

+#define BIT(X) (1 << X)<br>

+<br>

+{cpp_class.ns_begin()}<br>

+<br>

+{using_decls}<br>

+<br>

+uint32_t {cpp_class.name}::OrderEncode(float F) {{<br>

+  static_assert(std::numeric_limits<float>::is_iec559, "");<br>

+  constexpr uint32_t TopBit = ~(~uint32_t{{0}} >> 1);<br>

+<br>

+  // Get the bits of the float. Endianness is the same as for integers.<br>

+  uint32_t U = llvm::bit_cast<uint32_t>(F);<br>

+  std::memcpy(&U, &F, sizeof(U));<br>

+  // IEEE 754 floats compare like sign-magnitude integers.<br>

+  if (U & TopBit)    // Negative float.<br>

+    return 0 - U;    // Map onto the low half of integers, order reversed.<br>

+  return U + TopBit; // Positive floats map onto the high half of integers.<br>

+}}<br>

+<br>

+{evaluate_func(forest_json, cpp_class)}<br>

+{cpp_class.ns_end()}<br>

+"""<br>

+<br>

+<br>

+def main():<br>

+    parser = argparse.ArgumentParser('DecisionForestCodegen')<br>

+    parser.add_argument('--filename', help='output file name.')<br>

+    parser.add_argument('--output_dir', help='output directory.')<br>

+    parser.add_argument('--model', help='path to model directory.')<br>

+    parser.add_argument(<br>

+        '--cpp_class',<br>

+        help='The name of the class (which may be a namespace-qualified) created in generated header.'<br>

+    )<br>

+    ns = parser.parse_args()<br>

+<br>

+    output_dir = ns.output_dir<br>

+    filename = ns.filename<br>

+    header_file = f"{output_dir}/{filename}.h"<br>

+    cpp_file = f"{output_dir}/{filename}.cpp"<br>

+    cpp_class = CppClass(cpp_class=ns.cpp_class)<br>

+<br>

+    model_file = f"{ns.model}/forest.json"<br>

+    features_file = f"{ns.model}/features.json"<br>

+<br>

+    with open(features_file) as f:<br>

+        features_json = json.load(f)<br>

+<br>

+    with open(model_file) as m:<br>

+        forest_json = json.load(m)<br>

+<br>

+    with open(cpp_file, 'w+t') as output_cc:<br>

+        output_cc.write(<br>

+            gen_cpp_code(forest_json=forest_json,<br>

+                         features_json=features_json,<br>

+                         filename=filename,<br>

+                         cpp_class=cpp_class))<br>

+<br>

+    with open(header_file, 'w+t') as output_h:<br>

+        output_h.write(gen_header_code(<br>

+            features_json=features_json, cpp_class=cpp_class, filename=filename))<br>

+<br>

+<br>

+if __name__ == '__main__':<br>

+    main()<br>

<br>
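The order-encoding step used by `order_encode` and `OrderEncode` above can be tried in isolation. The following is a minimal sketch, assuming IEEE 754 single-precision floats; the names `TOP_BIT`, `samples`, and `encoded` are illustrative and not part of the commit.<br>

```python
import struct

TOP_BIT = 1 << 31  # sign bit of a 32-bit float


def order_encode(value: float) -> int:
    """Map a float32 to an unsigned 32-bit integer that sorts in the same order."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if bits & TOP_BIT:
        # Negative float: map onto the low half of the range, order reversed.
        return (1 << 32) - bits
    # Non-negative float: map onto the high half of the range.
    return TOP_BIT + bits


# Hypothetical sample values, listed in ascending order.
samples = [-200.0, -2.5, -1.0, 0.0, 0.5, 100.0, 200.0]
encoded = [order_encode(v) for v in samples]
```

If the encoding preserves order, `encoded` is ascending whenever `samples` is, which is what lets the generated code compare thresholds as integers.<br>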

diff  --git a/clang-tools-extra/clangd/quality/README.md b/clang-tools-extra/clangd/quality/README.md<br>

new file mode 100644<br>

index 000000000000..36fa37320e54<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/quality/README.md<br>

@@ -0,0 +1,220 @@<br>

+# Decision Forest Code Completion Model<br>

+<br>

+## Decision Forest<br>

+A **decision forest** is a collection of many decision trees. A **decision tree** is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a **binary decision** based on the input data, and leaf nodes represent a prediction.<br>

+<br>

+In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf. <br>

+<br>

+An input (code completion candidate) is characterized as a set of **features**, such as the *type of symbol* or the *number of existing references*.<br>

+<br>

+At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one **feature** of the input against a constant. The condition can be of two types:<br>

+- **if_greater**: Checks whether a numerical feature is **>=** a **threshold**.<br>

+- **if_member**: Checks whether the **enum** feature is contained in the **set** defined in the node.<br>

+<br>

+A leaf node contains the value **score**.<br>

+To compute an overall **quality** score, we traverse each tree in this way and add up the scores.<br>

+<br>

+## Model Input Format<br>

+The input model is represented in json format.<br>

+<br>

+### Features<br>

+The file **features.json** defines the features available to the model. <br>

+It is a json list of features. The features can be of following two kinds.<br>

+<br>

+#### Number<br>

+```<br>

+{<br>

+  "name": "a_numerical_feature",<br>

+  "kind": "NUMBER"<br>

+}<br>

+```<br>

+#### Enum<br>

+```<br>

+{<br>

+  "name": "an_enum_feature",<br>

+  "kind": "ENUM",<br>

+  "enum": "fully::qualified::enum",<br>

+  "header": "path/to/HeaderDeclaringEnum.h"<br>

+}<br>

+```<br>

+The field `enum` specifies the fully qualified name of the enum.<br>

+The maximum cardinality of the enum can be **32**.<br>

+<br>

+The field `header` specifies the header containing the declaration of the enum.<br>

+This header is included by the inference runtime.<br>

+<br>

+<br>

+### Decision Forest<br>

+The file `forest.json` defines the  decision forest. It is a json list of **DecisionTree**.<br>

+<br>

+**DecisionTree** is one of **IfGreaterNode**, **IfMemberNode**, **LeafNode**.<br>

+#### IfGreaterNode<br>

+```<br>

+{<br>

+  "operation": "if_greater",<br>

+  "feature": "a_numerical_feature",<br>

+  "threshold": A real number,<br>

+  "then": {A DecisionTree},<br>

+  "else": {A DecisionTree}<br>

+}<br>

+```<br>

+#### IfMemberNode<br>

+```<br>

+{<br>

+  "operation": "if_member",<br>

+  "feature": "an_enum_feature",<br>

+  "set": ["enum_value1", "enum_value2", ...],<br>

+  "then": {A DecisionTree},<br>

+  "else": {A DecisionTree}<br>

+}<br>

+```<br>

+#### LeafNode<br>

+```<br>

+{<br>

+  "operation": "boost",<br>

+  "score": A real number<br>

+}<br>

+```<br>

+<br>

+## Code Generator for Inference<br>

+The implementation of inference runtime is split across:<br>

+<br>

+### Code generator<br>

+The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library:<br>

+- `${output_dir}/{filename}.h`<br>

+- `${output_dir}/{filename}.cpp`<br>

+<br>

+Invocation<br>

+```<br>

+python3 CompletionModelCodegen.py \<br>

+        --model path/to/model/dir \<br>

+        --output_dir path/to/output/dir \<br>

+        --filename OutputFileName \<br>

+        --cpp_class clang::clangd::YourExampleClass<br>

+```<br>

+### Build System<br>

+`CompletionModel.cmake` provides the `gen_decision_forest` function.<br>

+A client intending to use the CompletionModel for inference can use this function to trigger the code generator and generate the inference library.<br>

+It can then use the generated API by including and depending on this library.<br>

+<br>

+### Generated API for inference<br>

+The code generator defines the Example `class` inside relevant namespaces as specified in option `${cpp_class}`.<br>

+<br>

+The members of this generated class comprise all the features listed in `features.json`.<br>

+Thus this class can represent a code completion candidate that needs to be scored.<br>

+<br>

+The API also provides `float Evaluate(const MyClass&)` which can be used to score the completion candidate.<br>

+<br>

+<br>

+## Example<br>

+### model/features.json<br>

+```<br>

+[<br>

+  {<br>

+    "name": "ANumber",<br>

+    "kind": "NUMBER"<br>

+  },<br>

+  {<br>

+    "name": "AFloat",<br>

+    "kind": "NUMBER"<br>

+  },<br>

+  {<br>

+    "name": "ACategorical",<br>

+    "kind": "ENUM",<br>

+    "enum": "ns1::ns2::TestEnum",<br>

+    "header": "model/CategoricalFeature.h"<br>

+  }<br>

+]<br>

+```<br>

+### model/forest.json<br>

+```<br>

+[<br>

+  {<br>

+    "operation": "if_greater",<br>

+    "feature": "ANumber",<br>

+    "threshold": 200.0,<br>

+    "then": {<br>

+      "operation": "if_greater",<br>

+      "feature": "AFloat",<br>

+      "threshold": -1,<br>

+      "then": {<br>

+        "operation": "boost",<br>

+        "score": 10.0<br>

+      },<br>

+      "else": {<br>

+        "operation": "boost",<br>

+        "score": -20.0<br>

+      }<br>

+    },<br>

+    "else": {<br>

+      "operation": "if_member",<br>

+      "feature": "ACategorical",<br>

+      "set": [<br>

+        "A",<br>

+        "C"<br>

+      ],<br>

+      "then": {<br>

+        "operation": "boost",<br>

+        "score": 3.0<br>

+      },<br>

+      "else": {<br>

+        "operation": "boost",<br>

+        "score": -4.0<br>

+      }<br>

+    }<br>

+  },<br>

+  {<br>

+    "operation": "if_member",<br>

+    "feature": "ACategorical",<br>

+    "set": [<br>

+      "A",<br>

+      "B"<br>

+    ],<br>

+    "then": {<br>

+      "operation": "boost",<br>

+      "score": 5.0<br>

+    },<br>

+    "else": {<br>

+      "operation": "boost",<br>

+      "score": -6.0<br>

+    }<br>

+  }<br>

+]<br>

+```<br>

+### DecisionForestRuntime.h<br>

+```<br>

+...<br>

+namespace ns1 {<br>

+namespace ns2 {<br>

+namespace test {<br>

+class Example {<br>

+public:<br>

+  void setANumber(float V) { ... }<br>

+  void setAFloat(float V) { ... }<br>

+  void setACategorical(unsigned V) { ... }<br>

+<br>

+private:<br>

+  ...<br>

+};<br>

+<br>

+float Evaluate(const Example&);<br>

+} // namespace test<br>

+} // namespace ns2<br>

+} // namespace ns1<br>

+```<br>

+<br>

+### CMake Invocation<br>

+In order to use the inference runtime, one can use the `gen_decision_forest` function <br>

+described in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.<br>

+<br>

+For example, the following invocation reads the model present in `path/to/model` and creates <br>

+`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp` <br>

+describing a `class` named `MyClass` in namespace `fully::qualified`.<br>

+<br>

+<br>

+<br>

+```<br>

+gen_decision_forest(path/to/model<br>

+  myfilename<br>

+  ::fully::qualified::MyClass)<br>

+```<br>

\ No newline at end of file<br>

<br>
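The `if_member` check and the 32-value limit on enums described in the README follow from storing each enum value as a single bit in a 32-bit word. The following is a minimal sketch of that idea, not the generated C++; `TestEnum` mirrors the test enum from the commit, while `bit`, `encode_enum`, and `is_member` are hypothetical helper names.<br>

```python
from enum import IntEnum


class TestEnum(IntEnum):
    """Mirrors `enum TestEnum { A, B, C, D }` from the unit test header."""
    A = 0
    B = 1
    C = 2
    D = 3


def bit(member: TestEnum) -> int:
    """One-hot bit for an enum value, like the generated BIT(X) macro."""
    return 1 << int(member)


def encode_enum(value: TestEnum) -> int:
    """What an ENUM setter stores: a word with only the value's bit set."""
    return bit(value)


def is_member(encoded: int, members) -> bool:
    """Membership test: AND the stored word with the OR of the set's bits."""
    mask = 0
    for m in members:
        mask |= bit(m)
    return (encoded & mask) != 0
```

Because the stored word is 32 bits wide in the generated class, only enum values 0 through 31 can be represented this way.<br>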

diff  --git a/clang-tools-extra/clangd/quality/model/features.json b/clang-tools-extra/clangd/quality/model/features.json<br>

new file mode 100644<br>

index 000000000000..e91eccd1ce20<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/quality/model/features.json<br>

@@ -0,0 +1,8 @@<br>

+[<br>

+    {<br>

+        "name": "ContextKind",<br>

+        "kind": "ENUM",<br>

+        "type": "clang::CodeCompletionContext::Kind",<br>

+        "header": "clang/Sema/CodeCompleteConsumer.h"<br>

+    }<br>

+]<br>

\ No newline at end of file<br>

<br>

diff  --git a/clang-tools-extra/clangd/quality/model/forest.json b/clang-tools-extra/clangd/quality/model/forest.json<br>

new file mode 100644<br>

index 000000000000..78a1524e2d81<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/quality/model/forest.json<br>

@@ -0,0 +1,18 @@<br>

+[<br>

+    {<br>

+        "operation": "if_member",<br>

+        "feature": "ContextKind",<br>

+        "set": [<br>

+            "CCC_DotMemberAccess",<br>

+            "CCC_ArrowMemberAccess"<br>

+        ],<br>

+        "then": {<br>

+            "operation": "boost",<br>

+            "score": 3.0<br>

+        },<br>

+        "else": {<br>

+            "operation": "boost",<br>

+            "score": 1.0<br>

+        }<br>

+    }<br>

+]<br>

\ No newline at end of file<br>

<br>

diff  --git a/clang-tools-extra/clangd/unittests/CMakeLists.txt b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>

index 2167b5e210e2..a84fd0b71ca5 100644<br>

--- a/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>

+++ b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>

@@ -28,6 +28,9 @@ if (CLANGD_ENABLE_REMOTE)<br>

   set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)<br>

 endif()<br>

<br>

+include(${CMAKE_CURRENT_SOURCE_DIR}/../quality/CompletionModel.cmake)<br>

+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/decision_forest_model DecisionForestRuntimeTest ::ns1::ns2::test::Example)<br>

+<br>

 add_custom_target(ClangdUnitTests)<br>

 add_unittest(ClangdUnitTests ClangdTests<br>

   Annotations.cpp<br>

@@ -44,6 +47,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>

   ConfigCompileTests.cpp<br>

   ConfigProviderTests.cpp<br>

   ConfigYAMLTests.cpp<br>

+  DecisionForestTests.cpp<br>

   DexTests.cpp<br>

   DiagnosticsTests.cpp<br>

   DraftStoreTests.cpp<br>

@@ -89,6 +93,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>

   TweakTesting.cpp<br>

   URITests.cpp<br>

   XRefsTests.cpp<br>

+  ${CMAKE_CURRENT_BINARY_DIR}/DecisionForestRuntimeTest.cpp<br>

<br>

   support/CancellationTests.cpp<br>

   support/ContextTests.cpp<br>

@@ -103,6 +108,11 @@ add_unittest(ClangdUnitTests ClangdTests<br>

   $<TARGET_OBJECTS:obj.clangDaemonTweaks><br>

   )<br>

<br>

+# Include generated CompletionModel headers.<br>

+target_include_directories(ClangdTests PUBLIC<br>

+  $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>

+)<br>

+<br>

 clang_target_link_libraries(ClangdTests<br>

   PRIVATE<br>

   clangAST<br>

<br>

diff  --git a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>

index 635e036039a0..460976d64f9f 100644<br>

--- a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>

+++ b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>

@@ -10,6 +10,7 @@<br>

 #include "ClangdServer.h"<br>

 #include "CodeComplete.h"<br>

 #include "Compiler.h"<br>

+#include "CompletionModel.h"<br>

 #include "Matchers.h"<br>

 #include "Protocol.h"<br>

 #include "Quality.h"<br>

@@ -47,6 +48,7 @@ using ::testing::HasSubstr;<br>

 using ::testing::IsEmpty;<br>

 using ::testing::Not;<br>

 using ::testing::UnorderedElementsAre;<br>

+using ContextKind = CodeCompletionContext::Kind;<br>

<br>

 // GMock helpers for matching completion items.<br>

 MATCHER_P(Named, Name, "") { return arg.Name == Name; }<br>

@@ -161,6 +163,16 @@ Symbol withReferences(int N, Symbol S) {<br>

   return S;<br>

 }<br>

<br>

+TEST(DecisionForestRuntime, SanityTest) {<br>

+  using Example = clangd::Example;<br>

+  using clangd::Evaluate;<br>

+  Example E1;<br>

+  E1.setContextKind(ContextKind::CCC_ArrowMemberAccess);<br>

+  Example E2;<br>

+  E2.setContextKind(ContextKind::CCC_SymbolOrNewName);<br>

+  EXPECT_GT(Evaluate(E1), Evaluate(E2));<br>

+}<br>

+<br>

 TEST(CompletionTest, Limit) {<br>

   clangd::CodeCompleteOptions Opts;<br>

   Opts.Limit = 2;<br>

<br>

diff  --git a/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>

new file mode 100644<br>

index 000000000000..d29c8a4a0358<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>

@@ -0,0 +1,29 @@<br>

+#include "DecisionForestRuntimeTest.h"<br>

+#include "decision_forest_model/CategoricalFeature.h"<br>

+#include "gtest/gtest.h"<br>

+<br>

+namespace clang {<br>

+namespace clangd {<br>

+<br>

+TEST(DecisionForestRuntime, Evaluate) {<br>

+  using Example = ::ns1::ns2::test::Example;<br>

+  using Cat = ::ns1::ns2::TestEnum;<br>

+  using ::ns1::ns2::test::Evaluate;<br>

+<br>

+  Example E;<br>

+  E.setANumber(200);         // True<br>

+  E.setAFloat(0);            // True: +10.0<br>

+  E.setACategorical(Cat::A); // True: +5.0<br>

+  EXPECT_EQ(Evaluate(E), 15.0);<br>

+<br>

+  E.setANumber(200);         // True<br>

+  E.setAFloat(-2.5);         // False: -20.0<br>

+  E.setACategorical(Cat::B); // True: +5.0<br>

+  EXPECT_EQ(Evaluate(E), -15.0);<br>

+<br>

+  E.setANumber(100);         // False<br>

+  E.setACategorical(Cat::C); // True: +3.0, False: -6.0<br>

+  EXPECT_EQ(Evaluate(E), -3.0);<br>

+}<br>

+} // namespace clangd<br>

+} // namespace clang<br>

<br>
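The expected scores in the `Evaluate` test above can be cross-checked with a small interpreter over the test model. This is a sketch that walks the JSON directly rather than running the generated goto-based code; `FOREST` is a compact copy of the test `forest.json`, and `score_tree` and `evaluate` are hypothetical names.<br>

```python
# Compact copy of unittests/decision_forest_model/forest.json.
FOREST = [
    {
        "operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
        "then": {
            "operation": "if_greater", "feature": "AFloat", "threshold": -1,
            "then": {"operation": "boost", "score": 10.0},
            "else": {"operation": "boost", "score": -20.0},
        },
        "else": {
            "operation": "if_member", "feature": "ACategorical",
            "set": ["A", "C"],
            "then": {"operation": "boost", "score": 3.0},
            "else": {"operation": "boost", "score": -4.0},
        },
    },
    {
        "operation": "if_member", "feature": "ACategorical",
        "set": ["A", "B"],
        "then": {"operation": "boost", "score": 5.0},
        "else": {"operation": "boost", "score": -6.0},
    },
]


def score_tree(node, example):
    """Walk one tree from its root to a leaf and return the leaf's score."""
    while node["operation"] != "boost":
        value = example[node["feature"]]
        if node["operation"] == "if_greater":
            taken = value >= node["threshold"]  # the runtime uses >=
        elif node["operation"] == "if_member":
            taken = value in node["set"]
        else:
            raise ValueError("Unhandled operation: " + node["operation"])
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate(forest, example):
    """Sum the leaf scores reached in every tree of the forest."""
    return sum(score_tree(tree, example) for tree in forest)
```

For instance, `evaluate(FOREST, {"ANumber": 200, "AFloat": 0, "ACategorical": "A"})` takes the `+10.0` leaf in the first tree and the `+5.0` leaf in the second.<br>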

diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>

new file mode 100644<br>

index 000000000000..dfb6ab3b199d<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>

@@ -0,0 +1,5 @@<br>

+namespace ns1 {<br>

+namespace ns2 {<br>

+enum TestEnum { A, B, C, D };<br>

+} // namespace ns2<br>

+} // namespace ns1<br>

<br>

diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/features.json b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>

new file mode 100644<br>

index 000000000000..7f159f192e19<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>

@@ -0,0 +1,16 @@<br>

+[<br>

+    {<br>

+        "name": "ANumber",<br>

+        "kind": "NUMBER"<br>

+    },<br>

+    {<br>

+        "name": "AFloat",<br>

+        "kind": "NUMBER"<br>

+    },<br>

+    {<br>

+        "name": "ACategorical",<br>

+        "kind": "ENUM",<br>

+        "type": "ns1::ns2::TestEnum",<br>

+        "header": "decision_forest_model/CategoricalFeature.h"<br>

+    }<br>

+]<br>

\ No newline at end of file<br>

<br>

diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>

new file mode 100644<br>

index 000000000000..26f071da485d<br>

--- /dev/null<br>

+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>

@@ -0,0 +1,52 @@<br>

+[<br>

+    {<br>

+        "operation": "if_greater",<br>

+        "feature": "ANumber",<br>

+        "threshold": 200.0,<br>

+        "then": {<br>

+            "operation": "if_greater",<br>

+            "feature": "AFloat",<br>

+            "threshold": -1,<br>

+            "then": {<br>

+                "operation": "boost",<br>

+                "score": 10.0<br>

+            },<br>

+            "else": {<br>

+                "operation": "boost",<br>

+                "score": -20.0<br>

+            }<br>

+        },<br>

+        "else": {<br>

+            "operation": "if_member",<br>

+            "feature": "ACategorical",<br>

+            "set": [<br>

+                "A",<br>

+                "C"<br>

+            ],<br>

+            "then": {<br>

+                "operation": "boost",<br>

+                "score": 3.0<br>

+            },<br>

+            "else": {<br>

+                "operation": "boost",<br>

+                "score": -4.0<br>

+            }<br>

+        }<br>

+    },<br>

+    {<br>

+        "operation": "if_member",<br>

+        "feature": "ACategorical",<br>

+        "set": [<br>

+            "A",<br>

+            "B"<br>

+        ],<br>

+        "then": {<br>

+            "operation": "boost",<br>

+            "score": 5.0<br>

+        },<br>

+        "else": {<br>

+            "operation": "boost",<br>

+            "score": -6.0<br>

+        }<br>

+    }<br>

+]<br>

\ No newline at end of file<br>

<br>

<br>

<br>


</blockquote></div>