[llvm-branch-commits] [llvm] [IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (PR #143986)

Mon Jun 23 09:49:37 PDT 2025

================
@@ -448,7 +448,10 @@ downstream tasks, including ML-guided compiler optimizations.
 
 The core components are:
   - **Vocabulary**: A mapping from IR entities (opcodes, types, etc.) to their
-    vector representations. This is managed by ``IR2VecVocabAnalysis``.
+    vector representations. This is managed by ``IR2VecVocabAnalysis``. The 
+    vocabulary (.json file) contains three sections -- Opcodes, Types, and 
+    Arguments, each containing the representations of the corresponding 
+    entities.
----------------
mtrofin wrote:

Please document that all three sections are mandatory, but that the order in which they appear in the file is not.
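
To make the requested wording concrete, here is a minimal sketch of the rule it describes: every section must be present, in any order. It is not the `IR2VecVocabAnalysis` implementation. The top-level key names (`Opcodes`, `Types`, `Arguments`) are assumed to match the section names in the patch, and the entity names, vector values, and the helper `check_vocab_sections` are hypothetical, used only for illustration.

```python
import json

# Assumed top-level keys, taken from the section names in the patch text.
REQUIRED_SECTIONS = ("Opcodes", "Types", "Arguments")


def check_vocab_sections(vocab_text):
    """Return the names of required sections missing from a vocabulary.

    Hypothetical helper: presence is checked by key lookup, so the order
    in which the sections appear in the file does not matter.
    """
    vocab = json.loads(vocab_text)
    if not isinstance(vocab, dict):
        raise ValueError("vocabulary must be a JSON object")
    return [name for name in REQUIRED_SECTIONS if name not in vocab]


# Illustrative vocabulary with the sections deliberately out of order.
example = """
{
  "Types":     {"FloatTy":  [0.1, 0.2]},
  "Arguments": {"Constant": [0.3, 0.4]},
  "Opcodes":   {"Add":      [0.5, 0.6]}
}
"""

missing = check_vocab_sections(example)
print("missing sections:", missing)
```

With all three sections present the helper returns an empty list regardless of their order; a file with only `Opcodes` would report `Types` and `Arguments` as missing.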

https://github.com/llvm/llvm-project/pull/143986