[clang-tools-extra] bf935a0 - [clangd] Make categorical features 64 bit in DecisionForest Model.

Tue Mar 2 07:26:54 PST 2021

Author: Utkarsh Saxena
Date: 2021-03-02T16:22:30+01:00
New Revision: bf935a034b345e905907c80030c67ef8f737d56a

URL: https://github.com/llvm/llvm-project/commit/bf935a034b345e905907c80030c67ef8f737d56a
DIFF: https://github.com/llvm/llvm-project/commit/bf935a034b345e905907c80030c67ef8f737d56a.diff

LOG: [clangd] Make categorical features 64 bit in DecisionForest Model.

CodeCompletionContext::Kind has 36 kinds, but the completion model only
supported categorical features with a cardinality of up to 32.
As a result, clangd tests were failing under asan due to overflow.

This patch makes the completion model support categorical features with
a cardinality of up to 64 by storing ENUM features as uint64_t instead
of uint32_t.
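
For illustration, here is a minimal sketch of the one-hot encoding this
change relies on. It is not the generated code: oneHot64 and inSet are
hypothetical names, and the membership test is an assumption about how the
generated model combines BIT(X) masks. With 36 context kinds, values 32..35
need a shift of at least 32 bits, which a 32-bit `1 << V` cannot represent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a generated ENUM setter: one-hot encode the
// enum value V into a 64-bit mask. Only values 0..63 are representable.
inline uint64_t oneHot64(unsigned V) {
  assert(V < 64 && "enum value does not fit in a 64-bit one-hot mask");
  return uint64_t{1} << V;
}

// Hypothetical membership test (assumed shape of a model condition):
// true if the encoded feature is one of the values OR-ed into Mask.
inline bool inSet(uint64_t Feature, uint64_t Mask) {
  return (Feature & Mask) != 0;
}
```

Under these assumptions, oneHot64(35) sets bit 35, and
inSet(oneHot64(35), oneHot64(3) | oneHot64(35)) is true, while the same
value encoded with a 32-bit shift would overflow.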

Verified that this fixes the asan failures.

Latency: 6.7ms (old) vs. 6.8ms (new) per 1000 predictions.

Differential Revision: https://reviews.llvm.org/D97770

Added: 
    

Modified: 
    clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp
    clang-tools-extra/clangd/quality/CompletionModelCodegen.py

Removed: 
    


################################################################################
diff  --git a/clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp b/clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp
index b146def7b36f..47d4a76c4dd8 100644
--- a/clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp
+++ b/clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp
@@ -51,7 +51,7 @@ std::vector<Example> generateRandomDataset(int NumExamples) {
                       : RandInt(20));
     E.setSemaSaysInScope(FlipCoin(0.5));      // Boolean.
     E.setScope(RandInt(4));                   // 4 Scopes.
-    E.setContextKind(RandInt(32));            // 32 Context kinds.
+    E.setContextKind(RandInt(36));            // 36 Context kinds.
     E.setIsInstanceMember(FlipCoin(0.5));     // Boolean.
     E.setHadContextType(FlipCoin(0.6));       // Boolean.
     E.setHadSymbolType(FlipCoin(0.6));        // Boolean.

diff  --git a/clang-tools-extra/clangd/quality/CompletionModelCodegen.py b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py
index f27e8f6a28a3..4f2356fceacb 100644
--- a/clang-tools-extra/clangd/quality/CompletionModelCodegen.py
+++ b/clang-tools-extra/clangd/quality/CompletionModelCodegen.py
@@ -131,18 +131,19 @@ class can be used to represent a code completion candidate.
                     feature, feature))
         elif f["kind"] == "ENUM":
             setters.append(
-                "void set%s(unsigned V) { %s = 1 << V; }" % (feature, feature))
+                "void set%s(unsigned V) { %s = 1LL << V; }" % (feature, feature))
         else:
             raise ValueError("Unhandled feature type.", f["kind"])
 
     # Class members represent all the features of the Example.
     class_members = [
-        "uint32_t %s = 0;" % f['name']
+        "uint%d_t %s = 0;"
+        % (64 if f["kind"] == "ENUM" else 32, f['name'])
         for f in features_json
     ]
     getters = [
-        "LLVM_ATTRIBUTE_ALWAYS_INLINE uint32_t get%s() const { return %s; }"
-        % (f['name'], f['name'])
+        "LLVM_ATTRIBUTE_ALWAYS_INLINE uint%d_t get%s() const { return %s; }"
+        % (64 if f["kind"] == "ENUM" else 32, f['name'], f['name'])
         for f in features_json
     ]
     nline = "\n  "
@@ -245,7 +246,7 @@ def gen_cpp_code(forest_json, features_json, filename, cpp_class):
 
 %s
 
-#define BIT(X) (1 << X)
+#define BIT(X) (1LL << X)
 
 %s