<div dir="ltr">Hi Utkarsh,<div><br></div><div>I've temporarily reverted this here:</div><div><br></div><div>echristo@athyra ~/s/llvm-project (master)> git push<br>To github.com:llvm/llvm-project.git<br>   1f0b43638ed..549e55b3d56  master -> master<br></div><div><br></div><div>the decision forest header file referenced in the unittest doesn't appear to have made it into the commit?</div><div><br></div><div>Thanks and feel free to follow up if I've missed something.</div><div><br></div><div>-eric</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 18, 2020 at 12:38 PM Utkarsh Saxena via llvm-branch-commits <<a href="mailto:llvm-branch-commits@lists.llvm.org">llvm-branch-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Author: Utkarsh Saxena<br>
Date: 2020-09-18T18:27:42+02:00<br>
New Revision: 85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a<br>
<br>
URL: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a</a><br>
DIFF: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff</a><br>
<br>
LOG: [clangd] Add Random Forest runtime for code completion.<br>
<br>
Summary:<br>
[WIP]<br>
- Proposes a json format for representing Random Forest model.<br>
- Proposes a way to test the generated runtime using a test model.<br>
<br>
TODO:<br>
- Add generated source code snippet for easier review.<br>
- Fix unused label warning.<br>
- Figure out required using declarations for CATEGORICAL columns from Features.json.<br>
- Necessary Google3 internal modifications for blaze before landing.<br>
- Add documentation for format of the model.<br>
- Document more.<br>
<br>
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits<br>
<br>
Tags: #clang<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D83814" rel="noreferrer" target="_blank">https://reviews.llvm.org/D83814</a><br>
<br>
Added: <br>
    clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
    clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
    clang-tools-extra/clangd/quality/README.md<br>
    clang-tools-extra/clangd/quality/model/features.json<br>
    clang-tools-extra/clangd/quality/model/forest.json<br>
    clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
    clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
    clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
    clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
<br>
Modified: <br>
    clang-tools-extra/clangd/CMakeLists.txt<br>
    clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
    clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
<br>
Removed: <br>
<br>
<br>
<br>
################################################################################<br>
diff  --git a/clang-tools-extra/clangd/CMakeLists.txt b/clang-tools-extra/clangd/CMakeLists.txt<br>
index 3a1a034ed17b..9d2ab5be222a 100644<br>
--- a/clang-tools-extra/clangd/CMakeLists.txt<br>
+++ b/clang-tools-extra/clangd/CMakeLists.txt<br>
@@ -28,6 +28,9 @@ set(LLVM_LINK_COMPONENTS<br>
   FrontendOpenMP<br>
   Option<br>
   )<br>
+  <br>
+include(${CMAKE_CURRENT_SOURCE_DIR}/quality/CompletionModel.cmake)<br>
+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/quality/model CompletionModel clang::clangd::Example)<br>
<br>
 if(MSVC AND NOT CLANG_CL)<br>
  set_source_files_properties(CompileCommands.cpp PROPERTIES COMPILE_FLAGS -wd4130) # disables C4130: logical operation on address of string constant<br>
@@ -77,6 +80,7 @@ add_clang_library(clangDaemon<br>
   TUScheduler.cpp<br>
   URI.cpp<br>
   XRefs.cpp<br>
+  ${CMAKE_CURRENT_BINARY_DIR}/CompletionModel.cpp<br>
<br>
   index/Background.cpp<br>
   index/BackgroundIndexLoader.cpp<br>
@@ -117,6 +121,11 @@ add_clang_library(clangDaemon<br>
   omp_gen<br>
   )<br>
<br>
+# Include generated CompletionModel headers.<br>
+target_include_directories(clangDaemon PUBLIC<br>
+  $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>
+)<br>
+<br>
 clang_target_link_libraries(clangDaemon<br>
   PRIVATE<br>
   clangAST<br>
<br>
diff  --git a/clang-tools-extra/clangd/quality/CompletionModel.cmake b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
new file mode 100644<br>
index 000000000000..60c6d2aa8433<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
@@ -0,0 +1,37 @@<br>
+# Run the Completion Model code generator on the model present in the<br>
+# ${model} directory.<br>
+# Produces a pair of files called ${filename}.h and ${filename}.cpp in<br>
+# ${CMAKE_CURRENT_BINARY_DIR}. The generated header<br>
+# will define a C++ class called ${cpp_class}, which may be a<br>
+# namespace-qualified class name.<br>
+function(gen_decision_forest model filename cpp_class)<br>
+  set(model_compiler ${CMAKE_SOURCE_DIR}/../clang-tools-extra/clangd/quality/CompletionModelCodegen.py)<br>
+  <br>
+  set(output_dir ${CMAKE_CURRENT_BINARY_DIR})<br>
+  set(header_file ${output_dir}/${filename}.h)<br>
+  set(cpp_file ${output_dir}/${filename}.cpp)<br>
+<br>
+  add_custom_command(OUTPUT ${header_file} ${cpp_file}<br>
+    COMMAND "${Python3_EXECUTABLE}" ${model_compiler}<br>
+      --model ${model}<br>
+      --output_dir ${output_dir}<br>
+      --filename ${filename}<br>
+      --cpp_class ${cpp_class}<br>
+    COMMENT "Generating code completion model runtime..."<br>
+    DEPENDS ${model_compiler} ${model}/forest.json ${model}/features.json<br>
+    VERBATIM )<br>
+<br>
+  set_source_files_properties(${header_file} PROPERTIES<br>
+    GENERATED 1)<br>
+  set_source_files_properties(${cpp_file} PROPERTIES<br>
+    GENERATED 1)<br>
+<br>
+  # Disable unused label warning for generated files.<br>
+  if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")<br>
+    set_source_files_properties(${cpp_file} PROPERTIES<br>
+      COMPILE_FLAGS /wd4102)<br>
+  else()<br>
+    set_source_files_properties(${cpp_file} PROPERTIES<br>
+      COMPILE_FLAGS -Wno-unused)<br>
+  endif()<br>
+endfunction()<br>
<br>
diff  --git a/clang-tools-extra/clangd/quality/CompletionModelCodegen.py b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
new file mode 100644<br>
index 000000000000..8f8234f6ebbc<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
@@ -0,0 +1,283 @@<br>
+"""Code generator for Code Completion Model Inference.<br>
+<br>
+The tool runs on the Decision Forest model defined in the {model} directory.<br>
+It generates two files: {output_dir}/{filename}.h and {output_dir}/{filename}.cpp.<br>
+The generated files define the Example class named {cpp_class}, with all the features as class members.<br>
+The generated runtime provides an `Evaluate` function, which can be used to score a code completion candidate.<br>
+"""<br>
+<br>
+import argparse<br>
+import json<br>
+import struct<br>
+from enum import Enum<br>
+<br>
+<br>
+class CppClass:<br>
+    """Holds class name and names of the enclosing namespaces."""<br>
+<br>
+    def __init__(self, cpp_class):<br>
+        ns_and_class = cpp_class.split("::")<br>
+        self.ns = [ns for ns in ns_and_class[0:-1] if len(ns) > 0]<br>
+        self.name = ns_and_class[-1]<br>
+        if len(self.name) == 0:<br>
+            raise ValueError("Empty class name.")<br>
+<br>
+    def ns_begin(self):<br>
+        """Returns snippet for opening namespace declarations."""<br>
+        open_ns = [f"namespace {ns} {{" for ns in self.ns]<br>
+        return "\n".join(open_ns)<br>
+<br>
+    def ns_end(self):<br>
+        """Returns snippet for closing namespace declarations."""<br>
+        close_ns = [<br>
+            f"}} // namespace {ns}" for ns in reversed(self.ns)]<br>
+        return "\n".join(close_ns)<br>
+<br>
+<br>
+def header_guard(filename):<br>
+    '''Returns the header guard for the generated header.'''<br>
+    return f"GENERATED_DECISION_FOREST_MODEL_{filename.upper()}_H"<br>
+<br>
+<br>
+def boost_node(n, label, next_label):<br>
+    """Returns code snippet for a leaf/boost node.<br>
+    Adds value of leaf to the score and jumps to the root of the next tree."""<br>
+    return f"{label}: Score += {n['score']}; goto {next_label};"<br>
+<br>
+<br>
+def if_greater_node(n, label, next_label):<br>
+    """Returns code snippet for an if_greater node.<br>
+    Jumps to next_label (the true child) if the Example feature (NUMBER) is >= the threshold.<br>
+    Comparing integers is typically faster than comparing floats. Assuming floats are<br>
+    IEEE 754, the threshold is order-encoded to an integer, matching the encoded feature value.<br>
+    Control falls through if the condition evaluates to false."""<br>
+    threshold = n["threshold"]<br>
+    return f"{label}: if (E.{n['feature']} >= {order_encode(threshold)} /*{threshold}*/) goto {next_label};"<br>
+<br>
+<br>
+def if_member_node(n, label, next_label):<br>
+    """Returns code snippet for an if_member node.<br>
+    Jumps to next_label (the true child) if the Example feature (ENUM) is in the set of enum values<br>
+    described in the node.<br>
+    Control falls through if the condition evaluates to false."""<br>
+    members = '|'.join([<br>
+        f"BIT({n['feature']}_type::{member})"<br>
+        for member in n["set"]<br>
+    ])<br>
+    return f"{label}: if (E.{n['feature']} & ({members})) goto {next_label};"<br>
+<br>
+<br>
+def node(n, label, next_label):<br>
+    """Returns code snippet for the node."""<br>
+    return {<br>
+        'boost': boost_node,<br>
+        'if_greater': if_greater_node,<br>
+        'if_member': if_member_node,<br>
+    }[n['operation']](n, label, next_label)<br>
+<br>
+<br>
+def tree(t, tree_num: int, node_num: int):<br>
+    """Returns code for evaluating a Decision Tree.<br>
+    Also returns the size of the decision tree.<br>
+<br>
+    A tree starts with its label `t{tree#}`.<br>
+    A node of the tree starts with label `t{tree#}_n{node#}`.<br>
+<br>
+    The tree contains two types of nodes: conditional nodes and leaf nodes.<br>
+    -   A conditional node evaluates a condition. If true, it jumps to the true node/child.<br>
+        Code is generated using a pre-order traversal of the tree that visits<br>
+        the false node as the first child. Therefore the false node is always the<br>
+        immediately next label.<br>
+    -   A leaf node adds its value to the score and jumps to the next tree.<br>
+    """<br>
+    label = f"t{tree_num}_n{node_num}"<br>
+    code = []<br>
+    if node_num == 0:<br>
+        code.append(f"t{tree_num}:")<br>
+<br>
+    if t["operation"] == "boost":<br>
+        code.append(node(t, label=label, next_label=f"t{tree_num+1}"))<br>
+        return code, 1<br>
+<br>
+    false_code, false_size = tree(<br>
+        t['else'], tree_num=tree_num, node_num=node_num+1)<br>
+<br>
+    true_node_num = node_num+false_size+1<br>
+    true_label = f"t{tree_num}_n{true_node_num}"<br>
+<br>
+    true_code, true_size = tree(<br>
+        t['then'], tree_num=tree_num, node_num=true_node_num)<br>
+<br>
+    code.append(node(t, label=label, next_label=true_label))<br>
+<br>
+    return code+false_code+true_code, 1+false_size+true_size<br>
+<br>
+<br>
+def gen_header_code(features_json: list, cpp_class, filename: str):<br>
+    """Returns code for header declaring the inference runtime.<br>
+<br>
+    Declares the Example class named {cpp_class} inside relevant namespaces.<br>
+    The Example class contains all the features as class members. This <br>
+    class can be used to represent a code completion candidate.<br>
+    Provides `float Evaluate()` function which can be used to score the Example.<br>
+    """<br>
+    setters = []<br>
+    for f in features_json:<br>
+        feature = f["name"]<br>
+        if f["kind"] == "NUMBER":<br>
+            # Floats are order-encoded to integers for faster comparison.<br>
+            setters.append(<br>
+                f"void set{feature}(float V) {{ {feature} = OrderEncode(V); }}")<br>
+        elif f["kind"] == "ENUM":<br>
+            setters.append(<br>
+                f"void set{feature}(unsigned V) {{ {feature} = 1u << V; }}")<br>
+        else:<br>
+            raise ValueError("Unhandled feature type.", f["kind"])<br>
+<br>
+    # Class members represent all the features of the Example.<br>
+    class_members = [f"uint32_t {f['name']} = 0;" for f in features_json]<br>
+<br>
+    nline = "\n  "<br>
+    guard = header_guard(filename)<br>
+    return f"""#ifndef {guard}<br>
+#define {guard}<br>
+#include <cstdint><br>
+<br>
+{cpp_class.ns_begin()}<br>
+class {cpp_class.name} {{<br>
+public:<br>
+  {nline.join(setters)}<br>
+<br>
+private:<br>
+  {nline.join(class_members)}<br>
+<br>
+  // Produces an integer that sorts in the same order as F.<br>
+  // That is: a < b <==> OrderEncode(a) < OrderEncode(b).<br>
+  static uint32_t OrderEncode(float F);<br>
+  friend float Evaluate(const {cpp_class.name}&);<br>
+}};<br>
+<br>
+float Evaluate(const {cpp_class.name}&);<br>
+{cpp_class.ns_end()}<br>
+#endif // {guard}<br>
+"""<br>
+<br>
+<br>
+def order_encode(v: float):<br>
+    i = struct.unpack('<I', struct.pack('<f', v))[0]<br>
+    TopBit = 1 << 31<br>
+    # IEEE 754 floats compare like sign-magnitude integers.<br>
+    if (i & TopBit):  # Negative float<br>
+        return (1 << 32) - i  # low half of integers, order reversed.<br>
+    return TopBit + i  # top half of integers<br>
+<br>
+<br>
+def evaluate_func(forest_json: list, cpp_class: CppClass):<br>
+    """Generates code for `float Evaluate(const {Example}&)` function.<br>
+    The generated function can be used to score an Example."""<br>
+    code = f"float Evaluate(const {cpp_class.name}& E) {{\n"<br>
+    lines = []<br>
+    lines.append("float Score = 0;")<br>
+    tree_num = 0<br>
+    for tree_json in forest_json:<br>
+        lines.extend(tree(tree_json, tree_num=tree_num, node_num=0)[0])<br>
+        lines.append("")<br>
+        tree_num += 1<br>
+<br>
+    lines.append(f"t{len(forest_json)}: // No such tree.")<br>
+    lines.append("return Score;")<br>
+    code += "  " + "\n  ".join(lines)<br>
+    code += "\n}"<br>
+    return code<br>
+<br>
+<br>
+def gen_cpp_code(forest_json: list, features_json: list, filename: str,<br>
+                 cpp_class: CppClass):<br>
+    """Generates code for the .cpp file."""<br>
+    # Headers<br>
+    # Required by OrderEncode(float F).<br>
+    angled_include = [<br>
+        f'#include <{h}>'<br>
+        for h in ["cstring", "limits"]<br>
+    ]<br>
+<br>
+    # Include generated header.<br>
+    quoted_headers = {f"{filename}.h", "llvm/ADT/bit.h"}<br>
+    # Headers required by ENUM features used by the model.<br>
+    quoted_headers |= {f["header"]<br>
+                       for f in features_json if f["kind"] == "ENUM"}<br>
+    quoted_include = [f'#include "{h}"' for h in sorted(quoted_headers)]<br>
+<br>
+    # using-decl for ENUM features.<br>
+    using_decls = "\n".join(f"using {feature['name']}_type = {feature['type']};"<br>
+                            for feature in features_json<br>
+                            if feature["kind"] == "ENUM")<br>
+    nl = "\n"<br>
+    return f"""{nl.join(angled_include)}<br>
+<br>
+{nl.join(quoted_include)}<br>
+<br>
+#define BIT(X) (1u << (X))<br>
+<br>
+{cpp_class.ns_begin()}<br>
+<br>
+{using_decls}<br>
+<br>
+uint32_t {cpp_class.name}::OrderEncode(float F) {{<br>
+  static_assert(std::numeric_limits<float>::is_iec559, "");<br>
+  constexpr uint32_t TopBit = ~(~uint32_t{{0}} >> 1);<br>
+<br>
+  // Get the bits of the float. Endianness is the same as for integers.<br>
+  uint32_t U = llvm::bit_cast<uint32_t>(F);<br>
+  std::memcpy(&U, &F, sizeof(U));<br>
+  // IEEE 754 floats compare like sign-magnitude integers.<br>
+  if (U & TopBit)    // Negative float.<br>
+    return 0 - U;    // Map onto the low half of integers, order reversed.<br>
+  return U + TopBit; // Positive floats map onto the high half of integers.<br>
+}}<br>
+<br>
+{evaluate_func(forest_json, cpp_class)}<br>
+{cpp_class.ns_end()}<br>
+"""<br>
+<br>
+<br>
+def main():<br>
+    parser = argparse.ArgumentParser('DecisionForestCodegen')<br>
+    parser.add_argument('--filename', help='output file name.')<br>
+    parser.add_argument('--output_dir', help='output directory.')<br>
+    parser.add_argument('--model', help='path to model directory.')<br>
+    parser.add_argument(<br>
+        '--cpp_class',<br>
+        help='The name of the class (which may be namespace-qualified) created in the generated header.'<br>
+    )<br>
+    ns = parser.parse_args()<br>
+<br>
+    output_dir = ns.output_dir<br>
+    filename = ns.filename<br>
+    header_file = f"{output_dir}/{filename}.h"<br>
+    cpp_file = f"{output_dir}/{filename}.cpp"<br>
+    cpp_class = CppClass(cpp_class=ns.cpp_class)<br>
+<br>
+    model_file = f"{ns.model}/forest.json"<br>
+    features_file = f"{ns.model}/features.json"<br>
+<br>
+    with open(features_file) as f:<br>
+        features_json = json.load(f)<br>
+<br>
+    with open(model_file) as m:<br>
+        forest_json = json.load(m)<br>
+<br>
+    with open(cpp_file, 'w+t') as output_cc:<br>
+        output_cc.write(<br>
+            gen_cpp_code(forest_json=forest_json,<br>
+                         features_json=features_json,<br>
+                         filename=filename,<br>
+                         cpp_class=cpp_class))<br>
+<br>
+    with open(header_file, 'w+t') as output_h:<br>
+        output_h.write(gen_header_code(<br>
+            features_json=features_json, cpp_class=cpp_class, filename=filename))<br>
+<br>
+<br>
+if __name__ == '__main__':<br>
+    main()<br>
<br>
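The generated header comments that order encoding preserves float ordering (a < b exactly when the encoded integers compare the same way), which is what lets `if_greater` nodes compare integers. One way to convince oneself is to re-implement the patch's Python-side `order_encode` and check that property pairwise over a few float32 values. This is a minimal sketch using only the standard library; `preserves_order` and the sample list are illustrative additions, not part of the commit.

```python
import itertools
import struct


def order_encode(v: float) -> int:
    """Map a float32 value to a uint32 that sorts in the same order.

    Same mapping as order_encode() in CompletionModelCodegen.py: floats with
    the sign bit set land in the low half of the uint32 range (order
    reversed); all others land in the high half.
    """
    i = struct.unpack('<I', struct.pack('<f', v))[0]
    top_bit = 1 << 31
    if i & top_bit:  # Negative float (including -0.0).
        return (1 << 32) - i
    return top_bit + i


def preserves_order(values) -> bool:
    """Check a < b <==> order_encode(a) < order_encode(b) for every pair."""
    return all(
        (a < b) == (order_encode(a) < order_encode(b))
        for a, b in itertools.product(values, repeat=2)
    )


# Every sample is exactly representable as a float32, so packing with '<f'
# does not round and Python's float comparison matches float32 comparison.
# NaN is left out on purpose: it is unordered, so the property cannot hold.
SAMPLES = [float('-inf'), -200.0, -2.5, -1.0, -0.0,
           0.0, 0.5, 1.0, 2.5, 200.0, float('inf')]

print(preserves_order(SAMPLES))                 # True
print(order_encode(-0.0) == order_encode(0.0))  # True: both encode to 2**31
```

Note that -0.0 and 0.0 share one encoding, which is consistent with them comparing equal as floats.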
diff  --git a/clang-tools-extra/clangd/quality/README.md b/clang-tools-extra/clangd/quality/README.md<br>
new file mode 100644<br>
index 000000000000..36fa37320e54<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/README.md<br>
@@ -0,0 +1,220 @@<br>
+# Decision Forest Code Completion Model<br>
+<br>
+## Decision Forest<br>
+A **decision forest** is a collection of many decision trees. A **decision tree** is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a **binary decision** based on the input data, and leaf nodes represent a prediction.<br>
+<br>
+In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf. <br>
+<br>
+An input (code completion candidate) is characterized as a set of **features**, such as the *type of symbol* or the *number of existing references*.<br>
+<br>
+At every non-leaf node, we evaluate the condition to decide whether to follow the `then` or the `else` branch. The condition compares one **feature** of the input against a constant. The condition can be of two types:<br>
+- **if_greater**: Checks whether a numerical feature is **>=** a **threshold**.<br>
+- **if_member**: Checks whether the **enum** feature is contained in the **set** defined in the node.<br>
+<br>
+A leaf node contains a **score**.<br>
+To compute an overall **quality** score, we traverse each tree in this way and add up the scores.<br>
+<br>
+## Model Input Format<br>
+The input model is represented in json format.<br>
+<br>
+### Features<br>
+The file **features.json** defines the features available to the model. <br>
+It is a json list of features. The features can be of the following two kinds.<br>
+<br>
+#### Number<br>
+```<br>
+{<br>
+  "name": "a_numerical_feature",<br>
+  "kind": "NUMBER"<br>
+}<br>
+```<br>
+#### Enum<br>
+```<br>
+{<br>
+  "name": "an_enum_feature",<br>
+  "kind": "ENUM",<br>
+  "type": "fully::qualified::enum",<br>
+  "header": "path/to/HeaderDeclaringEnum.h"<br>
+}<br>
+```<br>
+The field `type` specifies the fully qualified name of the enum.<br>
+The enum can have at most **32** values, since the feature is stored as a bit mask in a `uint32_t`.<br>
+<br>
+The field `header` specifies the header containing the declaration of the enum.<br>
+This header is included by the inference runtime.<br>
+<br>
+<br>
+### Decision Forest<br>
+The file `forest.json` defines the decision forest. It is a json list of **DecisionTree**.<br>
+<br>
+**DecisionTree** is one of **IfGreaterNode**, **IfMemberNode**, **LeafNode**.<br>
+#### IfGreaterNode<br>
+```<br>
+{<br>
+  "operation": "if_greater",<br>
+  "feature": "a_numerical_feature",<br>
+  "threshold": A real number,<br>
+  "then": {A DecisionTree},<br>
+  "else": {A DecisionTree}<br>
+}<br>
+```<br>
+#### IfMemberNode<br>
+```<br>
+{<br>
+  "operation": "if_member",<br>
+  "feature": "an_enum_feature",<br>
+  "set": ["enum_value1", "enum_value2", ...],<br>
+  "then": {A DecisionTree},<br>
+  "else": {A DecisionTree}<br>
+}<br>
+```<br>
+#### LeafNode<br>
+```<br>
+{<br>
+  "operation": "boost",<br>
+  "score": A real number<br>
+}<br>
+```<br>
+<br>
+## Code Generator for Inference<br>
+The implementation of the inference runtime is split across:<br>
+<br>
+### Code generator<br>
+The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library:<br>
+- `${output_dir}/${filename}.h`<br>
+- `${output_dir}/${filename}.cpp`<br>
+<br>
+Invocation<br>
+```<br>
+python3 CompletionModelCodegen.py \<br>
+        --model path/to/model/dir \<br>
+        --output_dir path/to/output/dir \<br>
+        --filename OutputFileName \<br>
+        --cpp_class clang::clangd::YourExampleClass<br>
+```<br>
+### Build System<br>
+`CompletionModel.cmake` provides the `gen_decision_forest` function.<br>
+A client intending to use the CompletionModel for inference can call it to run the code generator and generate the inference library.<br>
+It can then use the generated API by including the generated header and compiling the generated `.cpp` file into its target.<br>
+<br>
+### Generated API for inference<br>
+The code generator defines the Example `class` inside the namespaces specified by the `${cpp_class}` option.<br>
+<br>
+The members of this generated class comprise all the features listed in `features.json`.<br>
+Thus this class can represent a code completion candidate that needs to be scored.<br>
+<br>
+The API also provides `float Evaluate(const MyClass&)` which can be used to score the completion candidate.<br>
+<br>
+<br>
+## Example<br>
+### model/features.json<br>
+```<br>
+[<br>
+  {<br>
+    "name": "ANumber",<br>
+    "kind": "NUMBER"<br>
+  },<br>
+  {<br>
+    "name": "AFloat",<br>
+    "kind": "NUMBER"<br>
+  },<br>
+  {<br>
+    "name": "ACategorical",<br>
+    "kind": "ENUM",<br>
+    "type": "ns1::ns2::TestEnum",<br>
+    "header": "model/CategoricalFeature.h"<br>
+  }<br>
+]<br>
+```<br>
+### model/forest.json<br>
+```<br>
+[<br>
+  {<br>
+    "operation": "if_greater",<br>
+    "feature": "ANumber",<br>
+    "threshold": 200.0,<br>
+    "then": {<br>
+      "operation": "if_greater",<br>
+      "feature": "AFloat",<br>
+      "threshold": -1,<br>
+      "then": {<br>
+        "operation": "boost",<br>
+        "score": 10.0<br>
+      },<br>
+      "else": {<br>
+        "operation": "boost",<br>
+        "score": -20.0<br>
+      }<br>
+    },<br>
+    "else": {<br>
+      "operation": "if_member",<br>
+      "feature": "ACategorical",<br>
+      "set": [<br>
+        "A",<br>
+        "C"<br>
+      ],<br>
+      "then": {<br>
+        "operation": "boost",<br>
+        "score": 3.0<br>
+      },<br>
+      "else": {<br>
+        "operation": "boost",<br>
+        "score": -4.0<br>
+      }<br>
+    }<br>
+  },<br>
+  {<br>
+    "operation": "if_member",<br>
+    "feature": "ACategorical",<br>
+    "set": [<br>
+      "A",<br>
+      "B"<br>
+    ],<br>
+    "then": {<br>
+      "operation": "boost",<br>
+      "score": 5.0<br>
+    },<br>
+    "else": {<br>
+      "operation": "boost",<br>
+      "score": -6.0<br>
+    }<br>
+  }<br>
+]<br>
+```<br>
+### DecisionForestRuntime.h<br>
+```<br>
+...<br>
+namespace ns1 {<br>
+namespace ns2 {<br>
+namespace test {<br>
+class Example {<br>
+public:<br>
+  void setANumber(float V) { ... }<br>
+  void setAFloat(float V) { ... }<br>
+  void setACategorical(unsigned V) { ... }<br>
+<br>
+private:<br>
+  ...<br>
+};<br>
+<br>
+float Evaluate(const Example&);<br>
+} // namespace test<br>
+} // namespace ns2<br>
+} // namespace ns1<br>
+```<br>
+<br>
+### CMake Invocation<br>
+In order to use the inference runtime, one can use the `gen_decision_forest` function<br>
+described in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.<br>
+<br>
+For example, the following invocation reads the model present in `path/to/model` and creates <br>
+`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp` <br>
+describing a `class` named `MyClass` in namespace `fully::qualified`.<br>
+<br>
+<br>
+<br>
+```<br>
+gen_decision_forest(path/to/model<br>
+  myfilename<br>
+  ::fully::qualified::MyClass)<br>
+```<br>
\ No newline at end of file<br>
<br>
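The code generator does not itself check that nodes and features agree; for instance, nothing in `if_greater_node` stops an `if_greater` node from testing an `ENUM` feature. The format described in the README can be checked mechanically before running the generator. Below is a hypothetical validator (`validate_model` is not part of the commit) that walks `forest.json` and checks each node against `features.json`, assuming the field names the generator actually reads (`kind` for the feature kind, `type` for the enum type).

```python
def validate_model(features, forest):
    """Return a list of problems found; an empty list means the model looks valid.

    features: parsed features.json (a list of dicts)
    forest:   parsed forest.json (a list of DecisionTree dicts)
    """
    errors = []
    kinds = {}
    for f in features:
        name, kind = f.get("name"), f.get("kind")
        if kind not in ("NUMBER", "ENUM"):
            errors.append(f"feature {name!r}: unknown kind {kind!r}")
        elif kind == "ENUM" and not ("type" in f and "header" in f):
            errors.append(f"feature {name!r}: ENUM needs 'type' and 'header'")
        kinds[name] = kind

    # Which feature kind each conditional operation may test.
    expected_kind = {"if_greater": "NUMBER", "if_member": "ENUM"}

    def check(node, path):
        op = node.get("operation")
        if op == "boost":
            if not isinstance(node.get("score"), (int, float)):
                errors.append(f"{path}: boost needs a numeric 'score'")
            return
        if op not in expected_kind:
            errors.append(f"{path}: unknown operation {op!r}")
            return
        feature = node.get("feature")
        if kinds.get(feature) != expected_kind[op]:
            errors.append(f"{path}: {op} needs a {expected_kind[op]} feature, "
                          f"got {feature!r}")
        if op == "if_greater" and not isinstance(node.get("threshold"), (int, float)):
            errors.append(f"{path}: if_greater needs a numeric 'threshold'")
        if op == "if_member" and not isinstance(node.get("set"), list):
            errors.append(f"{path}: if_member needs a 'set' list")
        for branch in ("then", "else"):
            if branch in node:
                check(node[branch], f"{path}.{branch}")
            else:
                errors.append(f"{path}: missing '{branch}'")

    for i, tree in enumerate(forest):
        check(tree, f"tree[{i}]")
    return errors


# The model from quality/model/ in this patch.
FEATURES = [{"name": "ContextKind", "kind": "ENUM",
             "type": "clang::CodeCompletionContext::Kind",
             "header": "clang/Sema/CodeCompleteConsumer.h"}]
FOREST = [{"operation": "if_member", "feature": "ContextKind",
           "set": ["CCC_DotMemberAccess", "CCC_ArrowMemberAccess"],
           "then": {"operation": "boost", "score": 3.0},
           "else": {"operation": "boost", "score": 1.0}}]

print(validate_model(FEATURES, FOREST))  # []
```

It only checks structure; it cannot check that the names in an `if_member` set are real enumerators, since that information lives in the C++ header.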
diff  --git a/clang-tools-extra/clangd/quality/model/features.json b/clang-tools-extra/clangd/quality/model/features.json<br>
new file mode 100644<br>
index 000000000000..e91eccd1ce20<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/model/features.json<br>
@@ -0,0 +1,8 @@<br>
+[<br>
+    {<br>
+        "name": "ContextKind",<br>
+        "kind": "ENUM",<br>
+        "type": "clang::CodeCompletionContext::Kind",<br>
+        "header": "clang/Sema/CodeCompleteConsumer.h"<br>
+    }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff  --git a/clang-tools-extra/clangd/quality/model/forest.json b/clang-tools-extra/clangd/quality/model/forest.json<br>
new file mode 100644<br>
index 000000000000..78a1524e2d81<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/model/forest.json<br>
@@ -0,0 +1,18 @@<br>
+[<br>
+    {<br>
+        "operation": "if_member",<br>
+        "feature": "ContextKind",<br>
+        "set": [<br>
+            "CCC_DotMemberAccess",<br>
+            "CCC_ArrowMemberAccess"<br>
+        ],<br>
+        "then": {<br>
+            "operation": "boost",<br>
+            "score": 3.0<br>
+        },<br>
+        "else": {<br>
+            "operation": "boost",<br>
+            "score": 1.0<br>
+        }<br>
+    }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff  --git a/clang-tools-extra/clangd/unittests/CMakeLists.txt b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
index 2167b5e210e2..a84fd0b71ca5 100644<br>
--- a/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
+++ b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
@@ -28,6 +28,9 @@ if (CLANGD_ENABLE_REMOTE)<br>
   set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)<br>
 endif()<br>
<br>
+include(${CMAKE_CURRENT_SOURCE_DIR}/../quality/CompletionModel.cmake)<br>
+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/decision_forest_model DecisionForestRuntimeTest ::ns1::ns2::test::Example)<br>
+<br>
 add_custom_target(ClangdUnitTests)<br>
 add_unittest(ClangdUnitTests ClangdTests<br>
   Annotations.cpp<br>
@@ -44,6 +47,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>
   ConfigCompileTests.cpp<br>
   ConfigProviderTests.cpp<br>
   ConfigYAMLTests.cpp<br>
+  DecisionForestTests.cpp<br>
   DexTests.cpp<br>
   DiagnosticsTests.cpp<br>
   DraftStoreTests.cpp<br>
@@ -89,6 +93,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>
   TweakTesting.cpp<br>
   URITests.cpp<br>
   XRefsTests.cpp<br>
+  ${CMAKE_CURRENT_BINARY_DIR}/DecisionForestRuntimeTest.cpp<br>
<br>
   support/CancellationTests.cpp<br>
   support/ContextTests.cpp<br>
@@ -103,6 +108,11 @@ add_unittest(ClangdUnitTests ClangdTests<br>
   $<TARGET_OBJECTS:obj.clangDaemonTweaks><br>
   )<br>
<br>
+# Include generated CompletionModel headers.<br>
+target_include_directories(ClangdTests PUBLIC<br>
+  $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>
+)<br>
+<br>
 clang_target_link_libraries(ClangdTests<br>
   PRIVATE<br>
   clangAST<br>
<br>
diff  --git a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
index 635e036039a0..460976d64f9f 100644<br>
--- a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
+++ b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
@@ -10,6 +10,7 @@<br>
 #include "ClangdServer.h"<br>
 #include "CodeComplete.h"<br>
 #include "Compiler.h"<br>
+#include "CompletionModel.h"<br>
 #include "Matchers.h"<br>
 #include "Protocol.h"<br>
 #include "Quality.h"<br>
@@ -47,6 +48,7 @@ using ::testing::HasSubstr;<br>
 using ::testing::IsEmpty;<br>
 using ::testing::Not;<br>
 using ::testing::UnorderedElementsAre;<br>
+using ContextKind = CodeCompletionContext::Kind;<br>
<br>
 // GMock helpers for matching completion items.<br>
 MATCHER_P(Named, Name, "") { return arg.Name == Name; }<br>
@@ -161,6 +163,16 @@ Symbol withReferences(int N, Symbol S) {<br>
   return S;<br>
 }<br>
<br>
+TEST(DecisionForestRuntime, SanityTest) {<br>
+  using Example = clangd::Example;<br>
+  using clangd::Evaluate;<br>
+  Example E1;<br>
+  E1.setContextKind(ContextKind::CCC_ArrowMemberAccess);<br>
+  Example E2;<br>
+  E2.setContextKind(ContextKind::CCC_SymbolOrNewName);<br>
+  EXPECT_GT(Evaluate(E1), Evaluate(E2));<br>
+}<br>
+<br>
 TEST(CompletionTest, Limit) {<br>
   clangd::CodeCompleteOptions Opts;<br>
   Opts.Limit = 2;<br>
<br>
diff  --git a/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
new file mode 100644<br>
index 000000000000..d29c8a4a0358<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
@@ -0,0 +1,29 @@<br>
+#include "DecisionForestRuntimeTest.h"<br>
+#include "decision_forest_model/CategoricalFeature.h"<br>
+#include "gtest/gtest.h"<br>
+<br>
+namespace clang {<br>
+namespace clangd {<br>
+<br>
+TEST(DecisionForestRuntime, Evaluate) {<br>
+  using Example = ::ns1::ns2::test::Example;<br>
+  using Cat = ::ns1::ns2::TestEnum;<br>
+  using ::ns1::ns2::test::Evaluate;<br>
+<br>
+  Example E;<br>
+  E.setANumber(200);         // True<br>
+  E.setAFloat(0);            // True: +10.0<br>
+  E.setACategorical(Cat::A); // True: +5.0<br>
+  EXPECT_EQ(Evaluate(E), 15.0);<br>
+<br>
+  E.setANumber(200);         // True<br>
+  E.setAFloat(-2.5);         // False: -20.0<br>
+  E.setACategorical(Cat::B); // True: +5.0<br>
+  EXPECT_EQ(Evaluate(E), -15.0);<br>
+<br>
+  E.setANumber(100);         // False<br>
+  E.setACategorical(Cat::C); // True: +3.0, False: -6.0<br>
+  EXPECT_EQ(Evaluate(E), -3.0);<br>
+}<br>
+} // namespace clangd<br>
+} // namespace clang<br>
<br>
diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
new file mode 100644<br>
index 000000000000..dfb6ab3b199d<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
@@ -0,0 +1,5 @@<br>
+namespace ns1 {<br>
+namespace ns2 {<br>
+enum TestEnum { A, B, C, D };<br>
+} // namespace ns2<br>
+} // namespace ns1<br>
<br>
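`CategoricalFeature.h` gives the enumerators `A` through `D` the values 0 through 3. The generated setter stores an `ENUM` feature as a one-hot bit mask (one bit shifted left by the enumerator value), and an `if_member` node ANDs it with the OR of `BIT(...)` over the node's set, so a membership test costs a single AND. This is also why an enum is limited to 32 values: the stored mask is a `uint32_t`. The Python sketch below mirrors those two pieces of generated code; `set_enum_feature` and `if_member` are illustrative names, not functions from the patch.

```python
# Enumerator values of ns1::ns2::TestEnum as declared in CategoricalFeature.h.
TEST_ENUM = {"A": 0, "B": 1, "C": 2, "D": 3}


def set_enum_feature(value: int) -> int:
    """Mirror of the generated setter: store a one-hot mask for the value."""
    return 1 << value


def if_member(stored: int, names) -> bool:
    """Mirror of `E.Feature & (BIT(...)|BIT(...))` in an if_member node."""
    mask = 0
    for name in names:
        mask |= 1 << TEST_ENUM[name]  # BIT(X) shifts 1 left by X
    return (stored & mask) != 0


# The bit test agrees with plain set membership for every enumerator.
for name, value in TEST_ENUM.items():
    print(name, if_member(set_enum_feature(value), ["A", "C"]))
# A True
# B False
# C True
# D False
```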
diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/features.json b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
new file mode 100644<br>
index 000000000000..7f159f192e19<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
@@ -0,0 +1,16 @@<br>
+[<br>
+    {<br>
+        "name": "ANumber",<br>
+        "kind": "NUMBER"<br>
+    },<br>
+    {<br>
+        "name": "AFloat",<br>
+        "kind": "NUMBER"<br>
+    },<br>
+    {<br>
+        "name": "ACategorical",<br>
+        "kind": "ENUM",<br>
+        "type": "ns1::ns2::TestEnum",<br>
+        "header": "decision_forest_model/CategoricalFeature.h"<br>
+    }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff  --git a/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
new file mode 100644<br>
index 000000000000..26f071da485d<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
@@ -0,0 +1,52 @@<br>
+[<br>
+    {<br>
+        "operation": "if_greater",<br>
+        "feature": "ANumber",<br>
+        "threshold": 200.0,<br>
+        "then": {<br>
+            "operation": "if_greater",<br>
+            "feature": "AFloat",<br>
+            "threshold": -1,<br>
+            "then": {<br>
+                "operation": "boost",<br>
+                "score": 10.0<br>
+            },<br>
+            "else": {<br>
+                "operation": "boost",<br>
+                "score": -20.0<br>
+            }<br>
+        },<br>
+        "else": {<br>
+            "operation": "if_member",<br>
+            "feature": "ACategorical",<br>
+            "set": [<br>
+                "A",<br>
+                "C"<br>
+            ],<br>
+            "then": {<br>
+                "operation": "boost",<br>
+                "score": 3.0<br>
+            },<br>
+            "else": {<br>
+                "operation": "boost",<br>
+                "score": -4.0<br>
+            }<br>
+        }<br>
+    },<br>
+    {<br>
+        "operation": "if_member",<br>
+        "feature": "ACategorical",<br>
+        "set": [<br>
+            "A",<br>
+            "B"<br>
+        ],<br>
+        "then": {<br>
+            "operation": "boost",<br>
+            "score": 5.0<br>
+        },<br>
+        "else": {<br>
+            "operation": "boost",<br>
+            "score": -6.0<br>
+        }<br>
+    }<br>
+]<br>
\ No newline at end of file<br>
<br>
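The expected scores in `DecisionForestTests.cpp` can be cross-checked without building the generated runtime. Below is a hypothetical tree-walking interpreter over the same `forest.json` (not part of the commit). It follows the generated code's `>=` comparison for `if_greater`, but compares plain floats instead of order-encoded integers and tests `if_member` by enumerator name instead of with bit masks, so it is a reference model rather than a copy of the runtime. The third case keeps `AFloat` at -2.5 because the C++ test reuses the same `Example` without resetting it.

```python
def evaluate_tree(node, example):
    """Walk one decision tree from its root and return the leaf's score."""
    while node["operation"] != "boost":
        value = example[node["feature"]]
        if node["operation"] == "if_greater":
            taken = value >= node["threshold"]  # generated code uses >=
        elif node["operation"] == "if_member":
            taken = value in node["set"]  # enum compared by enumerator name
        else:
            raise ValueError(f"unknown operation {node['operation']!r}")
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate(forest, example):
    """Sum the leaf scores of all trees, like the generated Evaluate()."""
    return sum(evaluate_tree(tree, example) for tree in forest)


def leaf(score):
    return {"operation": "boost", "score": score}


# unittests/decision_forest_model/forest.json, written compactly.
TEST_FOREST = [
    {"operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
     "then": {"operation": "if_greater", "feature": "AFloat", "threshold": -1,
              "then": leaf(10.0), "else": leaf(-20.0)},
     "else": {"operation": "if_member", "feature": "ACategorical",
              "set": ["A", "C"], "then": leaf(3.0), "else": leaf(-4.0)}},
    {"operation": "if_member", "feature": "ACategorical", "set": ["A", "B"],
     "then": leaf(5.0), "else": leaf(-6.0)},
]

# The three cases from DecisionForestTests.cpp.
print(evaluate(TEST_FOREST, {"ANumber": 200, "AFloat": 0, "ACategorical": "A"}))     # 15.0
print(evaluate(TEST_FOREST, {"ANumber": 200, "AFloat": -2.5, "ACategorical": "B"}))  # -15.0
print(evaluate(TEST_FOREST, {"ANumber": 100, "AFloat": -2.5, "ACategorical": "C"}))  # -3.0
```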
<br>
<br>
_______________________________________________<br>
llvm-branch-commits mailing list<br>
<a href="mailto:llvm-branch-commits@lists.llvm.org" target="_blank">llvm-branch-commits@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits</a><br>
</blockquote></div>