<div dir="ltr">Hi Utkarsh,<div><br></div><div>I've temporarily reverted this here:</div><div><br></div><div>echristo@athyra ~/s/llvm-project (master)> git push<br>To github.com:llvm/llvm-project.git<br> 1f0b43638ed..549e55b3d56 master -> master<br></div><div><br></div><div>The decision forest header file referenced in the unit test doesn't appear to have made it into the commit.</div><div><br></div><div>Thanks, and feel free to follow up if I've missed something.</div><div><br></div><div>-eric</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 18, 2020 at 12:38 PM Utkarsh Saxena via llvm-branch-commits <<a href="mailto:llvm-branch-commits@lists.llvm.org">llvm-branch-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Author: Utkarsh Saxena<br>
Date: 2020-09-18T18:27:42+02:00<br>
New Revision: 85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a<br>
<br>
URL: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a</a><br>
DIFF: <a href="https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/85c1c6a4ba4eebbd3f5cefb1512498b9f8a5bb7a.diff</a><br>
<br>
LOG: [clangd] Add Random Forest runtime for code completion.<br>
<br>
Summary:<br>
[WIP]<br>
- Proposes a json format for representing Random Forest model.<br>
- Proposes a way to test the generated runtime using a test model.<br>
<br>
TODO:<br>
- Add generated source code snippet for easier review.<br>
- Fix unused label warning.<br>
- Figure out required using declarations for CATEGORICAL columns from Features.json.<br>
- Necessary Google3 internal modifications for blaze before landing.<br>
- Add documentation for format of the model.<br>
- Document more.<br>
<br>
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits<br>
<br>
Tags: #clang<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D83814" rel="noreferrer" target="_blank">https://reviews.llvm.org/D83814</a><br>
<br>
Added: <br>
clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
clang-tools-extra/clangd/quality/README.md<br>
clang-tools-extra/clangd/quality/model/features.json<br>
clang-tools-extra/clangd/quality/model/forest.json<br>
clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
<br>
Modified: <br>
clang-tools-extra/clangd/CMakeLists.txt<br>
clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
<br>
Removed: <br>
<br>
<br>
<br>
################################################################################<br>
diff --git a/clang-tools-extra/clangd/CMakeLists.txt b/clang-tools-extra/clangd/CMakeLists.txt<br>
index 3a1a034ed17b..9d2ab5be222a 100644<br>
--- a/clang-tools-extra/clangd/CMakeLists.txt<br>
+++ b/clang-tools-extra/clangd/CMakeLists.txt<br>
@@ -28,6 +28,9 @@ set(LLVM_LINK_COMPONENTS<br>
FrontendOpenMP<br>
Option<br>
)<br>
+ <br>
+include(${CMAKE_CURRENT_SOURCE_DIR}/quality/CompletionModel.cmake)<br>
+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/quality/model CompletionModel clang::clangd::Example)<br>
<br>
if(MSVC AND NOT CLANG_CL)<br>
set_source_files_properties(CompileCommands.cpp PROPERTIES COMPILE_FLAGS -wd4130) # disables C4130: logical operation on address of string constant<br>
@@ -77,6 +80,7 @@ add_clang_library(clangDaemon<br>
TUScheduler.cpp<br>
URI.cpp<br>
XRefs.cpp<br>
+ ${CMAKE_CURRENT_BINARY_DIR}/CompletionModel.cpp<br>
<br>
index/Background.cpp<br>
index/BackgroundIndexLoader.cpp<br>
@@ -117,6 +121,11 @@ add_clang_library(clangDaemon<br>
omp_gen<br>
)<br>
<br>
+# Include generated CompletionModel headers.<br>
+target_include_directories(clangDaemon PUBLIC<br>
+ $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>
+)<br>
+<br>
clang_target_link_libraries(clangDaemon<br>
PRIVATE<br>
clangAST<br>
<br>
diff --git a/clang-tools-extra/clangd/quality/CompletionModel.cmake b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
new file mode 100644<br>
index 000000000000..60c6d2aa8433<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/CompletionModel.cmake<br>
@@ -0,0 +1,37 @@<br>
+# Run the Completion Model code generator on the model present in the <br>
+# ${model} directory.<br>
+# Produces a pair of files called ${filename}.h and ${filename}.cpp in the <br>
+# ${CMAKE_CURRENT_BINARY_DIR}. The generated header<br>
+# will define a C++ class called ${cpp_class} - which may be a<br>
+# namespace-qualified class name.<br>
+function(gen_decision_forest model filename cpp_class)<br>
+ set(model_compiler ${CMAKE_SOURCE_DIR}/../clang-tools-extra/clangd/quality/CompletionModelCodegen.py)<br>
+ <br>
+ set(output_dir ${CMAKE_CURRENT_BINARY_DIR})<br>
+ set(header_file ${output_dir}/${filename}.h)<br>
+ set(cpp_file ${output_dir}/${filename}.cpp)<br>
+<br>
+ add_custom_command(OUTPUT ${header_file} ${cpp_file}<br>
+ COMMAND "${Python3_EXECUTABLE}" ${model_compiler}<br>
+ --model ${model}<br>
+ --output_dir ${output_dir}<br>
+ --filename ${filename}<br>
+ --cpp_class ${cpp_class}<br>
+ COMMENT "Generating code completion model runtime..."<br>
+ DEPENDS ${model_compiler} ${model}/forest.json ${model}/features.json<br>
+ VERBATIM )<br>
+<br>
+ set_source_files_properties(${header_file} PROPERTIES<br>
+ GENERATED 1)<br>
+ set_source_files_properties(${cpp_file} PROPERTIES<br>
+ GENERATED 1)<br>
+<br>
+ # Disable unused label warning for generated files.<br>
+ if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")<br>
+ set_source_files_properties(${cpp_file} PROPERTIES<br>
+ COMPILE_FLAGS /wd4102)<br>
+ else()<br>
+ set_source_files_properties(${cpp_file} PROPERTIES<br>
+ COMPILE_FLAGS -Wno-unused)<br>
+ endif()<br>
+endfunction()<br>
<br>
diff --git a/clang-tools-extra/clangd/quality/CompletionModelCodegen.py b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
new file mode 100644<br>
index 000000000000..8f8234f6ebbc<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py<br>
@@ -0,0 +1,283 @@<br>
+"""Code generator for Code Completion Model Inference.<br>
+<br>
+Tool runs on the Decision Forest model defined in {model} directory.<br>
+It generates two files: {output_dir}/{filename}.h and {output_dir}/{filename}.cpp <br>
+The generated files define the Example class named {cpp_class}, which has all the features as class members.<br>
+The generated runtime provides an `Evaluate` function which can be used to score a code completion candidate.<br>
+"""<br>
+<br>
+import argparse<br>
+import json<br>
+import struct<br>
+from enum import Enum<br>
+<br>
+<br>
+class CppClass:<br>
+ """Holds class name and names of the enclosing namespaces."""<br>
+<br>
+ def __init__(self, cpp_class):<br>
+ ns_and_class = cpp_class.split("::")<br>
+ self.ns = [ns for ns in ns_and_class[0:-1] if len(ns) > 0]<br>
+ self.name = ns_and_class[-1]<br>
+ if len(self.name) == 0:<br>
+ raise ValueError("Empty class name.")<br>
+<br>
+ def ns_begin(self):<br>
+ """Returns snippet for opening namespace declarations."""<br>
+ open_ns = [f"namespace {ns} {{" for ns in self.ns]<br>
+ return "\n".join(open_ns)<br>
+<br>
+ def ns_end(self):<br>
+ """Returns snippet for closing namespace declarations."""<br>
+ close_ns = [<br>
+ f"}} // namespace {ns}" for ns in reversed(self.ns)]<br>
+ return "\n".join(close_ns)<br>
+<br>
+<br>
+def header_guard(filename):<br>
+ '''Returns the header guard for the generated header.'''<br>
+ return f"GENERATED_DECISION_FOREST_MODEL_{filename.upper()}_H"<br>
+<br>
+<br>
+def boost_node(n, label, next_label):<br>
+ """Returns code snippet for a leaf/boost node.<br>
+ Adds value of leaf to the score and jumps to the root of the next tree."""<br>
+ return f"{label}: Score += {n['score']}; goto {next_label};"<br>
+<br>
+<br>
+def if_greater_node(n, label, next_label):<br>
+ """Returns code snippet for an if_greater node.<br>
+ Jumps to next_label (the true branch) if the Example feature (NUMBER) is greater than or equal to the threshold. <br>
+ Comparing integers is much faster than comparing floats. Assuming floating points <br>
+ are represented as IEEE 754, it order-encodes the floats to integers before comparing them.<br>
+ Control falls through if condition is evaluated to false."""<br>
+ threshold = n["threshold"]<br>
+ return f"{label}: if (E.{n['feature']} >= {order_encode(threshold)} /*{threshold}*/) goto {next_label};"<br>
+<br>
+<br>
+def if_member_node(n, label, next_label):<br>
+ """Returns code snippet for an if_member node.<br>
+ Jumps to next_label (the true branch) if the Example feature (ENUM) is present in the set of enum values <br>
+ described in the node.<br>
+ Control falls through if condition is evaluated to false."""<br>
+ members = '|'.join([<br>
+ f"BIT({n['feature']}_type::{member})"<br>
+ for member in n["set"]<br>
+ ])<br>
+ return f"{label}: if (E.{n['feature']} & ({members})) goto {next_label};"<br>
+<br>
+<br>
+def node(n, label, next_label):<br>
+ """Returns code snippet for the node."""<br>
+ return {<br>
+ 'boost': boost_node,<br>
+ 'if_greater': if_greater_node,<br>
+ 'if_member': if_member_node,<br>
+ }[n['operation']](n, label, next_label)<br>
+<br>
+<br>
+def tree(t, tree_num: int, node_num: int):<br>
+ """Returns code for inferencing a Decision Tree.<br>
+ Also returns the size of the decision tree.<br>
+<br>
+ A tree starts with its label `t{tree#}`.<br>
+ A node of the tree starts with label `t{tree#}_n{node#}`.<br>
+<br>
+ The tree contains two types of node: Conditional node and Leaf node.<br>
+ - Conditional node evaluates a condition. If true, it jumps to the true node/child.<br>
+ Code is generated using pre-order traversal of the tree considering<br>
+ false node as the first child. Therefore the false node is always the<br>
+ immediately next label.<br>
+ - Leaf node adds the value to the score and jumps to the next tree.<br>
+ """<br>
+ label = f"t{tree_num}_n{node_num}"<br>
+ code = []<br>
+ if node_num == 0:<br>
+ code.append(f"t{tree_num}:")<br>
+<br>
+ if t["operation"] == "boost":<br>
+ code.append(node(t, label=label, next_label=f"t{tree_num+1}"))<br>
+ return code, 1<br>
+<br>
+ false_code, false_size = tree(<br>
+ t['else'], tree_num=tree_num, node_num=node_num+1)<br>
+<br>
+ true_node_num = node_num+false_size+1<br>
+ true_label = f"t{tree_num}_n{true_node_num}"<br>
+<br>
+ true_code, true_size = tree(<br>
+ t['then'], tree_num=tree_num, node_num=true_node_num)<br>
+<br>
+ code.append(node(t, label=label, next_label=true_label))<br>
+<br>
+ return code+false_code+true_code, 1+false_size+true_size<br>
+<br>
+<br>
+def gen_header_code(features_json: list, cpp_class, filename: str):<br>
+ """Returns code for header declaring the inference runtime.<br>
+<br>
+ Declares the Example class named {cpp_class} inside relevant namespaces.<br>
+ The Example class contains all the features as class members. This <br>
+ class can be used to represent a code completion candidate.<br>
+ Provides `float Evaluate()` function which can be used to score the Example.<br>
+ """<br>
+ setters = []<br>
+ for f in features_json:<br>
+ feature = f["name"]<br>
+ if f["kind"] == "NUMBER":<br>
+ # Floats are order-encoded to integers for faster comparison.<br>
+ setters.append(<br>
+ f"void set{feature}(float V) {{ {feature} = OrderEncode(V); }}")<br>
+ elif f["kind"] == "ENUM":<br>
+ setters.append(<br>
+ f"void set{feature}(unsigned V) {{ {feature} = 1 << V; }}")<br>
+ else:<br>
+ raise ValueError("Unhandled feature type.", f["kind"])<br>
+<br>
+ # Class members represent all the features of the Example.<br>
+ class_members = [f"uint32_t {f['name']} = 0;" for f in features_json]<br>
+<br>
+ nline = "\n "<br>
+ guard = header_guard(filename)<br>
+ return f"""#ifndef {guard}<br>
+#define {guard}<br>
+#include <cstdint><br>
+<br>
+{cpp_class.ns_begin()}<br>
+class {cpp_class.name} {{<br>
+public:<br>
+ {nline.join(setters)}<br>
+<br>
+private:<br>
+ {nline.join(class_members)}<br>
+<br>
+ // Produces an integer that sorts in the same order as F.<br>
+ // That is: a < b <==> OrderEncode(a) < OrderEncode(b).<br>
+ static uint32_t OrderEncode(float F);<br>
+ friend float Evaluate(const {cpp_class.name}&);<br>
+}};<br>
+<br>
+float Evaluate(const {cpp_class.name}&);<br>
+{cpp_class.ns_end()}<br>
+#endif // {guard}<br>
+"""<br>
+<br>
+<br>
+def order_encode(v: float):<br>
+ i = struct.unpack('<I', struct.pack('<f', v))[0]<br>
+ TopBit = 1 << 31<br>
+ # IEEE 754 floats compare like sign-magnitude integers.<br>
+ if (i & TopBit): # Negative float<br>
+ return (1 << 32) - i # low half of integers, order reversed.<br>
+ return TopBit + i # top half of integers<br>
+<br>
+<br>
+def evaluate_func(forest_json: list, cpp_class: CppClass):<br>
+ """Generates code for `float Evaluate(const {Example}&)` function.<br>
+ The generated function can be used to score an Example."""<br>
+ code = f"float Evaluate(const {cpp_class.name}& E) {{\n"<br>
+ lines = []<br>
+ lines.append("float Score = 0;")<br>
+ tree_num = 0<br>
+ for tree_json in forest_json:<br>
+ lines.extend(tree(tree_json, tree_num=tree_num, node_num=0)[0])<br>
+ lines.append("")<br>
+ tree_num += 1<br>
+<br>
+ lines.append(f"t{len(forest_json)}: // No such tree.")<br>
+ lines.append("return Score;")<br>
+ code += " " + "\n ".join(lines)<br>
+ code += "\n}"<br>
+ return code<br>
+<br>
+<br>
+def gen_cpp_code(forest_json: list, features_json: list, filename: str,<br>
+ cpp_class: CppClass):<br>
+ """Generates code for the .cpp file."""<br>
+ # Headers<br>
+ # Required by OrderEncode(float F).<br>
+ angled_include = [<br>
+ f'#include <{h}>'<br>
+ for h in ["cstring", "limits"]<br>
+ ]<br>
+<br>
+ # Include generated header.<br>
+ quoted_headers = {f"{filename}.h", "llvm/ADT/bit.h"}<br>
+ # Headers required by ENUM features used by the model.<br>
+ quoted_headers |= {f["header"]<br>
+ for f in features_json if f["kind"] == "ENUM"}<br>
+ quoted_include = [f'#include "{h}"' for h in sorted(quoted_headers)]<br>
+<br>
+ # using-decl for ENUM features.<br>
+ using_decls = "\n".join(f"using {feature['name']}_type = {feature['type']};"<br>
+ for feature in features_json<br>
+ if feature["kind"] == "ENUM")<br>
+ nl = "\n"<br>
+ return f"""{nl.join(angled_include)}<br>
+<br>
+{nl.join(quoted_include)}<br>
+<br>
+#define BIT(X) (1 << X)<br>
+<br>
+{cpp_class.ns_begin()}<br>
+<br>
+{using_decls}<br>
+<br>
+uint32_t {cpp_class.name}::OrderEncode(float F) {{<br>
+ static_assert(std::numeric_limits<float>::is_iec559, "");<br>
+ constexpr uint32_t TopBit = ~(~uint32_t{{0}} >> 1);<br>
+<br>
+ // Get the bits of the float. Endianness is the same as for integers.<br>
+ uint32_t U = llvm::bit_cast<uint32_t>(F);<br>
+ std::memcpy(&U, &F, sizeof(U));<br>
+ // IEEE 754 floats compare like sign-magnitude integers.<br>
+ if (U & TopBit) // Negative float.<br>
+ return 0 - U; // Map onto the low half of integers, order reversed.<br>
+ return U + TopBit; // Positive floats map onto the high half of integers.<br>
+}}<br>
+<br>
+{evaluate_func(forest_json, cpp_class)}<br>
+{cpp_class.ns_end()}<br>
+"""<br>
+<br>
+<br>
+def main():<br>
+ parser = argparse.ArgumentParser('DecisionForestCodegen')<br>
+ parser.add_argument('--filename', help='output file name.')<br>
+ parser.add_argument('--output_dir', help='output directory.')<br>
+ parser.add_argument('--model', help='path to model directory.')<br>
+ parser.add_argument(<br>
+ '--cpp_class',<br>
+ help='The name of the class (which may be namespace-qualified) created in the generated header.'<br>
+ )<br>
+ ns = parser.parse_args()<br>
+<br>
+ output_dir = ns.output_dir<br>
+ filename = ns.filename<br>
+ header_file = f"{output_dir}/{filename}.h"<br>
+ cpp_file = f"{output_dir}/{filename}.cpp"<br>
+ cpp_class = CppClass(cpp_class=ns.cpp_class)<br>
+<br>
+ model_file = f"{ns.model}/forest.json"<br>
+ features_file = f"{ns.model}/features.json"<br>
+<br>
+ with open(features_file) as f:<br>
+ features_json = json.load(f)<br>
+<br>
+ with open(model_file) as m:<br>
+ forest_json = json.load(m)<br>
+<br>
+ with open(cpp_file, 'w+t') as output_cc:<br>
+ output_cc.write(<br>
+ gen_cpp_code(forest_json=forest_json,<br>
+ features_json=features_json,<br>
+ filename=filename,<br>
+ cpp_class=cpp_class))<br>
+<br>
+ with open(header_file, 'w+t') as output_h:<br>
+ output_h.write(gen_header_code(<br>
+ features_json=features_json, cpp_class=cpp_class, filename=filename))<br>
+<br>
+<br>
+if __name__ == '__main__':<br>
+ main()<br>
<br>
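A minimal sketch of the order-encoding idea behind `order_encode` and `OrderEncode` in the file above, assuming IEEE 754 single-precision floats.<br>
It is illustrative only and not part of the commit. The `!` byte-order code and the sample values are choices made for this sketch; packing and unpacking with the same byte order gives the same integer as the original `'<'` codes.<br>
<pre>
```python
import struct


def order_encode(v: float) -> int:
    """Map a float32 to an unsigned 32-bit integer that sorts the same way."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("!I", struct.pack("!f", v))[0]
    top_bit = 1 << 31
    if bits & top_bit:
        # Negative float: a larger magnitude means a smaller value,
        # so map onto the low half of the range with the order reversed.
        return (1 << 32) - bits
    # Non-negative float: map onto the high half of the range.
    return top_bit + bits


# Sample values in increasing order (hypothetical, for illustration).
samples = [-2.5, -1.0, 0.0, 0.5, 100.0, 200.0]
encoded = [order_encode(v) for v in samples]

# The encoded integers are expected to be in the same order as the floats.
print(encoded == sorted(encoded))
```
</pre>
With this encoding, a generated comparison such as `E.feature >= order_encode(threshold)` can be done on integers while giving the same answer as the float comparison for ordinary (non-NaN) inputs.<br>
<br>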
diff --git a/clang-tools-extra/clangd/quality/README.md b/clang-tools-extra/clangd/quality/README.md<br>
new file mode 100644<br>
index 000000000000..36fa37320e54<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/README.md<br>
@@ -0,0 +1,220 @@<br>
+# Decision Forest Code Completion Model<br>
+<br>
+## Decision Forest<br>
+A **decision forest** is a collection of many decision trees. A **decision tree** is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a **binary decision** based on the input data, and leaf nodes represent a prediction.<br>
+<br>
+In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf. <br>
+<br>
+An input (code completion candidate) is characterized as a set of **features**, such as the *type of symbol* or the *number of existing references*.<br>
+<br>
+At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one **feature** of the input against a constant. The condition can be of two types:<br>
+- **if_greater**: Checks whether a numerical feature is **>=** a **threshold**.<br>
+- **if_member**: Checks whether the **enum** feature is contained in the **set** defined in the node.<br>
+<br>
+A leaf node contains the value **score**.<br>
+To compute an overall **quality** score, we traverse each tree in this way and add up the scores.<br>
+<br>
+## Model Input Format<br>
+The input model is represented in json format.<br>
+<br>
+### Features<br>
+The file **features.json** defines the features available to the model. <br>
+It is a json list of features. Each feature is one of the following two kinds.<br>
+<br>
+#### Number<br>
+```<br>
+{<br>
+ "name": "a_numerical_feature",<br>
+ "kind": "NUMBER"<br>
+}<br>
+```<br>
+#### Enum<br>
+```<br>
+{<br>
+ "name": "an_enum_feature",<br>
+ "kind": "ENUM",<br>
+ "type": "fully::qualified::enum",<br>
+ "header": "path/to/HeaderDeclaringEnum.h"<br>
+}<br>
+```<br>
+The field `type` specifies the fully qualified name of the enum.<br>
+The enum can have at most **32** values.<br>
+<br>
+The field `header` specifies the header containing the declaration of the enum.<br>
+This header is included by the inference runtime.<br>
+<br>
+<br>
+### Decision Forest<br>
+The file `forest.json` defines the decision forest. It is a json list of **DecisionTree**.<br>
+<br>
+**DecisionTree** is one of **IfGreaterNode**, **IfMemberNode**, **LeafNode**.<br>
+#### IfGreaterNode<br>
+```<br>
+{<br>
+ "operation": "if_greater",<br>
+ "feature": "a_numerical_feature",<br>
+ "threshold": A real number,<br>
+ "then": {A DecisionTree},<br>
+ "else": {A DecisionTree}<br>
+}<br>
+```<br>
+#### IfMemberNode<br>
+```<br>
+{<br>
+ "operation": "if_member",<br>
+ "feature": "an_enum_feature",<br>
+ "set": ["enum_value1", "enum_value2", ...],<br>
+ "then": {A DecisionTree},<br>
+ "else": {A DecisionTree}<br>
+}<br>
+```<br>
+#### LeafNode<br>
+```<br>
+{<br>
+ "operation": "boost",<br>
+ "score": A real number<br>
+}<br>
+```<br>
+<br>
+## Code Generator for Inference<br>
+The implementation of the inference runtime is split across the following components:<br>
+<br>
+### Code generator<br>
+The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library: <br>
+- `${output_dir}/${filename}.h`<br>
+- `${output_dir}/${filename}.cpp`<br>
+<br>
+Invocation<br>
+```<br>
+python3 CompletionModelCodegen.py \<br>
+ --model path/to/model/dir \<br>
+ --output_dir path/to/output/dir \<br>
+ --filename OutputFileName \<br>
+ --cpp_class clang::clangd::YourExampleClass<br>
+```<br>
+### Build System<br>
+`CompletionModel.cmake` provides the `gen_decision_forest` function. <br>
+A client that intends to use the CompletionModel for inference can call it to trigger the code generator and generate the inference library.<br>
+The client can then use the generated API by including and depending on this library.<br>
+<br>
+### Generated API for inference<br>
+The code generator defines the Example `class` inside relevant namespaces as specified in option `${cpp_class}`.<br>
+<br>
+The members of this generated class comprise all the features mentioned in `features.json`. <br>
+Thus this class can represent a code completion candidate that needs to be scored.<br>
+<br>
+The API also provides `float Evaluate(const MyClass&)` which can be used to score the completion candidate.<br>
+<br>
+<br>
+## Example<br>
+### model/features.json<br>
+```<br>
+[<br>
+ {<br>
+ "name": "ANumber",<br>
+ "kind": "NUMBER"<br>
+ },<br>
+ {<br>
+ "name": "AFloat",<br>
+ "kind": "NUMBER"<br>
+ },<br>
+ {<br>
+ "name": "ACategorical",<br>
+ "kind": "ENUM",<br>
+ "type": "ns1::ns2::TestEnum",<br>
+ "header": "model/CategoricalFeature.h"<br>
+ }<br>
+]<br>
+```<br>
+### model/forest.json<br>
+```<br>
+[<br>
+ {<br>
+ "operation": "if_greater",<br>
+ "feature": "ANumber",<br>
+ "threshold": 200.0,<br>
+ "then": {<br>
+ "operation": "if_greater",<br>
+ "feature": "AFloat",<br>
+ "threshold": -1,<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 10.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -20.0<br>
+ }<br>
+ },<br>
+ "else": {<br>
+ "operation": "if_member",<br>
+ "feature": "ACategorical",<br>
+ "set": [<br>
+ "A",<br>
+ "C"<br>
+ ],<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 3.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -4.0<br>
+ }<br>
+ }<br>
+ },<br>
+ {<br>
+ "operation": "if_member",<br>
+ "feature": "ACategorical",<br>
+ "set": [<br>
+ "A",<br>
+ "B"<br>
+ ],<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 5.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -6.0<br>
+ }<br>
+ }<br>
+]<br>
+```<br>
+### DecisionForestRuntime.h<br>
+```<br>
+...<br>
+namespace ns1 {<br>
+namespace ns2 {<br>
+namespace test {<br>
+class Example {<br>
+public:<br>
+ void setANumber(float V) { ... }<br>
+ void setAFloat(float V) { ... }<br>
+ void setACategorical(unsigned V) { ... }<br>
+<br>
+private:<br>
+ ...<br>
+};<br>
+<br>
+float Evaluate(const Example&);<br>
+} // namespace test<br>
+} // namespace ns2<br>
+} // namespace ns1<br>
+```<br>
+<br>
+### CMake Invocation<br>
+In order to use the inference runtime, one can use the `gen_decision_forest` function <br>
+described in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.<br>
+<br>
+For example, the following invocation reads the model present in `path/to/model` and creates <br>
+`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp` <br>
+describing a `class` named `MyClass` in namespace `fully::qualified`.<br>
+<br>
+<br>
+<br>
+```<br>
+gen_decision_forest(path/to/model<br>
+ myfilename<br>
+ ::fully::qualified::MyClass)<br>
+```<br>
\ No newline at end of file<br>
<br>
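A minimal sketch of the scoring rule the README above describes: walk each tree from its root to a leaf, then add up the leaf scores.<br>
It interprets the example forest directly instead of generating C++, so it is not the commit's runtime. The helper names `evaluate_tree` and `evaluate`, and the use of plain dicts and strings for examples and enum values, are assumptions made for illustration.<br>
<pre>
```python
import json

# The example forest from the README, inlined so the sketch is self-contained.
FOREST_JSON = """
[
  {"operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
   "then": {"operation": "if_greater", "feature": "AFloat", "threshold": -1,
            "then": {"operation": "boost", "score": 10.0},
            "else": {"operation": "boost", "score": -20.0}},
   "else": {"operation": "if_member", "feature": "ACategorical",
            "set": ["A", "C"],
            "then": {"operation": "boost", "score": 3.0},
            "else": {"operation": "boost", "score": -4.0}}},
  {"operation": "if_member", "feature": "ACategorical", "set": ["A", "B"],
   "then": {"operation": "boost", "score": 5.0},
   "else": {"operation": "boost", "score": -6.0}}
]
"""


def evaluate_tree(node, example):
    """Walk one tree from the root to a leaf and return the leaf score."""
    while node["operation"] != "boost":
        value = example[node["feature"]]
        if node["operation"] == "if_greater":
            # The README defines if_greater as ">=" against the threshold.
            taken = value >= node["threshold"]
        elif node["operation"] == "if_member":
            taken = value in node["set"]
        else:
            raise ValueError("Unknown operation: " + node["operation"])
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate(forest, example):
    """Sum the leaf scores of all trees in the forest."""
    return sum(evaluate_tree(tree, example) for tree in forest)


forest = json.loads(FOREST_JSON)
score_1 = evaluate(forest, {"ANumber": 200, "AFloat": 0, "ACategorical": "A"})
score_2 = evaluate(forest, {"ANumber": 200, "AFloat": -2.5, "ACategorical": "B"})
score_3 = evaluate(forest, {"ANumber": 100, "AFloat": -2.5, "ACategorical": "C"})
print(score_1, score_2, score_3)
```
</pre>
The three inputs mirror the cases in `DecisionForestTests.cpp`, which expects 15.0, -15.0 and -3.0 from the generated runtime.<br>
<br>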
diff --git a/clang-tools-extra/clangd/quality/model/features.json b/clang-tools-extra/clangd/quality/model/features.json<br>
new file mode 100644<br>
index 000000000000..e91eccd1ce20<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/model/features.json<br>
@@ -0,0 +1,8 @@<br>
+[<br>
+ {<br>
+ "name": "ContextKind",<br>
+ "kind": "ENUM",<br>
+ "type": "clang::CodeCompletionContext::Kind",<br>
+ "header": "clang/Sema/CodeCompleteConsumer.h"<br>
+ }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff --git a/clang-tools-extra/clangd/quality/model/forest.json b/clang-tools-extra/clangd/quality/model/forest.json<br>
new file mode 100644<br>
index 000000000000..78a1524e2d81<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/quality/model/forest.json<br>
@@ -0,0 +1,18 @@<br>
+[<br>
+ {<br>
+ "operation": "if_member",<br>
+ "feature": "ContextKind",<br>
+ "set": [<br>
+ "CCC_DotMemberAccess",<br>
+ "CCC_ArrowMemberAccess"<br>
+ ],<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 3.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": 1.0<br>
+ }<br>
+ }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff --git a/clang-tools-extra/clangd/unittests/CMakeLists.txt b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
index 2167b5e210e2..a84fd0b71ca5 100644<br>
--- a/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
+++ b/clang-tools-extra/clangd/unittests/CMakeLists.txt<br>
@@ -28,6 +28,9 @@ if (CLANGD_ENABLE_REMOTE)<br>
set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)<br>
endif()<br>
<br>
+include(${CMAKE_CURRENT_SOURCE_DIR}/../quality/CompletionModel.cmake)<br>
+gen_decision_forest(${CMAKE_CURRENT_SOURCE_DIR}/decision_forest_model DecisionForestRuntimeTest ::ns1::ns2::test::Example)<br>
+<br>
add_custom_target(ClangdUnitTests)<br>
add_unittest(ClangdUnitTests ClangdTests<br>
Annotations.cpp<br>
@@ -44,6 +47,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>
ConfigCompileTests.cpp<br>
ConfigProviderTests.cpp<br>
ConfigYAMLTests.cpp<br>
+ DecisionForestTests.cpp<br>
DexTests.cpp<br>
DiagnosticsTests.cpp<br>
DraftStoreTests.cpp<br>
@@ -89,6 +93,7 @@ add_unittest(ClangdUnitTests ClangdTests<br>
TweakTesting.cpp<br>
URITests.cpp<br>
XRefsTests.cpp<br>
+ ${CMAKE_CURRENT_BINARY_DIR}/DecisionForestRuntimeTest.cpp<br>
<br>
support/CancellationTests.cpp<br>
support/ContextTests.cpp<br>
@@ -103,6 +108,11 @@ add_unittest(ClangdUnitTests ClangdTests<br>
$<TARGET_OBJECTS:obj.clangDaemonTweaks><br>
)<br>
<br>
+# Include generated CompletionModel headers.<br>
+target_include_directories(ClangdTests PUBLIC<br>
+ $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}><br>
+)<br>
+<br>
clang_target_link_libraries(ClangdTests<br>
PRIVATE<br>
clangAST<br>
<br>
diff --git a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
index 635e036039a0..460976d64f9f 100644<br>
--- a/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
+++ b/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp<br>
@@ -10,6 +10,7 @@<br>
#include "ClangdServer.h"<br>
#include "CodeComplete.h"<br>
#include "Compiler.h"<br>
+#include "CompletionModel.h"<br>
#include "Matchers.h"<br>
#include "Protocol.h"<br>
#include "Quality.h"<br>
@@ -47,6 +48,7 @@ using ::testing::HasSubstr;<br>
using ::testing::IsEmpty;<br>
using ::testing::Not;<br>
using ::testing::UnorderedElementsAre;<br>
+using ContextKind = CodeCompletionContext::Kind;<br>
<br>
// GMock helpers for matching completion items.<br>
MATCHER_P(Named, Name, "") { return arg.Name == Name; }<br>
@@ -161,6 +163,16 @@ Symbol withReferences(int N, Symbol S) {<br>
return S;<br>
}<br>
<br>
+TEST(DecisionForestRuntime, SanityTest) {<br>
+ using Example = clangd::Example;<br>
+ using clangd::Evaluate;<br>
+ Example E1;<br>
+ E1.setContextKind(ContextKind::CCC_ArrowMemberAccess);<br>
+ Example E2;<br>
+ E2.setContextKind(ContextKind::CCC_SymbolOrNewName);<br>
+ EXPECT_GT(Evaluate(E1), Evaluate(E2));<br>
+}<br>
+<br>
TEST(CompletionTest, Limit) {<br>
clangd::CodeCompleteOptions Opts;<br>
Opts.Limit = 2;<br>
<br>
diff --git a/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
new file mode 100644<br>
index 000000000000..d29c8a4a0358<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/DecisionForestTests.cpp<br>
@@ -0,0 +1,29 @@<br>
+#include "DecisionForestRuntimeTest.h"<br>
+#include "decision_forest_model/CategoricalFeature.h"<br>
+#include "gtest/gtest.h"<br>
+<br>
+namespace clang {<br>
+namespace clangd {<br>
+<br>
+TEST(DecisionForestRuntime, Evaluate) {<br>
+ using Example = ::ns1::ns2::test::Example;<br>
+ using Cat = ::ns1::ns2::TestEnum;<br>
+ using ::ns1::ns2::test::Evaluate;<br>
+<br>
+ Example E;<br>
+ E.setANumber(200); // True<br>
+ E.setAFloat(0); // True: +10.0<br>
+ E.setACategorical(Cat::A); // True: +5.0<br>
+ EXPECT_EQ(Evaluate(E), 15.0);<br>
+<br>
+ E.setANumber(200); // True<br>
+ E.setAFloat(-2.5); // False: -20.0<br>
+ E.setACategorical(Cat::B); // True: +5.0<br>
+ EXPECT_EQ(Evaluate(E), -15.0);<br>
+<br>
+ E.setANumber(100); // False<br>
+ E.setACategorical(Cat::C); // True: +3.0, False: -6.0<br>
+ EXPECT_EQ(Evaluate(E), -3.0);<br>
+}<br>
+} // namespace clangd<br>
+} // namespace clang<br>
<br>
diff --git a/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
new file mode 100644<br>
index 000000000000..dfb6ab3b199d<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/CategoricalFeature.h<br>
@@ -0,0 +1,5 @@<br>
+namespace ns1 {<br>
+namespace ns2 {<br>
+enum TestEnum { A, B, C, D };<br>
+} // namespace ns2<br>
+} // namespace ns1<br>
<br>
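A minimal sketch of how an ENUM feature such as `TestEnum` above can be tested for set membership with a bitmask, following the generated setter (`feature = 1 << V`) and the `BIT(...)` masks emitted for `if_member` nodes.<br>
The Python `TestEnum` class and the helpers `bit`, `encode_enum` and `is_member` are stand-ins invented for this sketch; they are not part of the commit.<br>
<pre>
```python
from enum import IntEnum


class TestEnum(IntEnum):
    """Python stand-in for ns1::ns2::TestEnum { A, B, C, D }."""
    A = 0
    B = 1
    C = 2
    D = 3


def bit(value):
    """One-hot bit for an enum value, like the generated BIT(X) macro."""
    return 1 << int(value)


def encode_enum(value):
    """What the generated setter stores: a single set bit per enum value."""
    return bit(value)


def is_member(encoded, members):
    """True if the encoded feature is in the set, via one bitwise AND."""
    mask = 0
    for member in members:
        mask |= bit(member)
    return (encoded & mask) != 0


feature = encode_enum(TestEnum.C)
in_first_set = is_member(feature, [TestEnum.A, TestEnum.C])
in_second_set = is_member(feature, [TestEnum.A, TestEnum.B])
print(in_first_set, in_second_set)
```
</pre>
Because each enum value occupies one bit of a `uint32_t` in the generated class, this representation is consistent with the README's limit of 32 values per enum.<br>
<br>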
diff --git a/clang-tools-extra/clangd/unittests/decision_forest_model/features.json b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
new file mode 100644<br>
index 000000000000..7f159f192e19<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/features.json<br>
@@ -0,0 +1,16 @@<br>
+[<br>
+ {<br>
+ "name": "ANumber",<br>
+ "kind": "NUMBER"<br>
+ },<br>
+ {<br>
+ "name": "AFloat",<br>
+ "kind": "NUMBER"<br>
+ },<br>
+ {<br>
+ "name": "ACategorical",<br>
+ "kind": "ENUM",<br>
+ "type": "ns1::ns2::TestEnum",<br>
+ "header": "decision_forest_model/CategoricalFeature.h"<br>
+ }<br>
+]<br>
\ No newline at end of file<br>
<br>
diff --git a/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
new file mode 100644<br>
index 000000000000..26f071da485d<br>
--- /dev/null<br>
+++ b/clang-tools-extra/clangd/unittests/decision_forest_model/forest.json<br>
@@ -0,0 +1,52 @@<br>
+[<br>
+ {<br>
+ "operation": "if_greater",<br>
+ "feature": "ANumber",<br>
+ "threshold": 200.0,<br>
+ "then": {<br>
+ "operation": "if_greater",<br>
+ "feature": "AFloat",<br>
+ "threshold": -1,<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 10.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -20.0<br>
+ }<br>
+ },<br>
+ "else": {<br>
+ "operation": "if_member",<br>
+ "feature": "ACategorical",<br>
+ "set": [<br>
+ "A",<br>
+ "C"<br>
+ ],<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 3.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -4.0<br>
+ }<br>
+ }<br>
+ },<br>
+ {<br>
+ "operation": "if_member",<br>
+ "feature": "ACategorical",<br>
+ "set": [<br>
+ "A",<br>
+ "B"<br>
+ ],<br>
+ "then": {<br>
+ "operation": "boost",<br>
+ "score": 5.0<br>
+ },<br>
+ "else": {<br>
+ "operation": "boost",<br>
+ "score": -6.0<br>
+ }<br>
+ }<br>
+]<br>
\ No newline at end of file<br>
<br>
<br>
<br>
</blockquote></div>