[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use canonical type IDs (PR #155323)

Wed Aug 27 06:59:29 PDT 2025

================
@@ -162,8 +162,8 @@ class IR2VecTool {
 
     for (const BasicBlock &BB : F) {
       for (const auto &I : BB.instructionsWithoutDebug()) {
----------------
mtrofin wrote:

nit for later: should the iteration over BBs be a utility in ir2vec somehow? Basically, the reflex is to write:

```
for (const auto &BB : F)
  for (const auto &I : BB) {
    // do stuff to I
  }
```

meaning that `BB.instructionsWithoutDebug()` is not immediately discoverable, nor is it the first place one goes when coding.
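A rough sketch of what such an ir2vec-side helper could look like follows. It is not existing LLVM code: `forEachNonDebugInstruction` is a hypothetical name, and `Instruction`/`BasicBlock`/`Function` are toy stand-ins (with an `IsDebug` flag modelling debug-info instructions) so the snippet compiles without LLVM headers. In-tree, the inner loop would presumably just wrap `BB.instructionsWithoutDebug()`.

```cpp
#include <cassert> // used by the usage checks
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction / BasicBlock / Function,
// just enough to show the shape of the helper.
struct Instruction {
  std::string Opcode;
  bool IsDebug = false; // models debug-info instructions such as dbg.value
};

struct BasicBlock {
  std::vector<Instruction> Insts;
};

struct Function {
  std::vector<BasicBlock> Blocks;
  auto begin() const { return Blocks.begin(); }
  auto end() const { return Blocks.end(); }
};

// Hypothetical ir2vec utility: visit every non-debug instruction of F, so
// callers can keep the "for each BB, for each I" reflex without having to
// remember instructionsWithoutDebug().
template <typename Callback>
void forEachNonDebugInstruction(const Function &F, Callback &&CB) {
  for (const BasicBlock &BB : F)
    for (const Instruction &I : BB.Insts) {
      if (I.IsDebug)
        continue; // skip debug info, as instructionsWithoutDebug() would
      CB(I);
    }
}
```

A caller would then write `forEachNonDebugInstruction(F, [&](const Instruction &I) { /* do stuff to I */ });`, and the debug-skipping policy lives in one place.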

One idea (again, for a later patch): what if ir2vec APIs like `getSlotIndex` returned a "null" value for debug info? This could be a configurable option of the vocab, or something along those lines. When used through ir2vec, that value would have no effect (for instance, its embedding would be the zero tensor).
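To make the idea concrete, here is a toy model, assuming a vocab option (hypothetically named `NullifyDebugInfo`) that makes the slot lookup return "no slot" for debug info, and a lookup that maps "no slot" to the zero vector. `ToyVocab`, `Inst`, and `embedInstructions` are illustrative names only; the real `ir2vec::Vocabulary::getSlotIndex` has a different signature. Because the zero vector is the identity for summation, a plain loop over all instructions yields the same embedding as one that skips debug info.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Embedding = std::vector<double>;

// Hypothetical vocabulary; not the real ir2vec::Vocabulary API.
class ToyVocab {
public:
  ToyVocab(std::size_t Dim, bool NullifyDebugInfo)
      : Dim(Dim), NullifyDebugInfo(NullifyDebugInfo) {
    assert(Dim > 0 && "embedding dimension must be positive");
  }

  std::size_t dim() const { return Dim; }

  void add(const std::string &Opcode, Embedding E) {
    assert(E.size() == Dim && "embedding has the wrong dimension");
    Slots[Opcode] = Table.size();
    Table.push_back(std::move(E));
  }

  // Returns "null" (no slot) for debug info when the option is enabled.
  // Non-debug opcodes are assumed to be present; at() throws otherwise.
  std::optional<std::size_t> getSlotIndex(const std::string &Opcode,
                                          bool IsDebug) const {
    if (IsDebug && NullifyDebugInfo)
      return std::nullopt;
    return Slots.at(Opcode);
  }

  // "No slot" maps to the zero vector, so it never changes a sum.
  Embedding lookup(std::optional<std::size_t> Slot) const {
    if (!Slot)
      return Embedding(Dim, 0.0);
    return Table[*Slot];
  }

private:
  std::size_t Dim;
  bool NullifyDebugInfo;
  std::map<std::string, std::size_t> Slots;
  std::vector<Embedding> Table;
};

struct Inst {
  std::string Opcode;
  bool IsDebug;
};

// Sums per-instruction embeddings over *all* instructions; debug info needs
// no special-casing here because it contributes the zero vector.
Embedding embedInstructions(const ToyVocab &V, const std::vector<Inst> &Insts) {
  Embedding Sum(V.dim(), 0.0);
  for (const Inst &I : Insts) {
    Embedding E = V.lookup(V.getSlotIndex(I.Opcode, I.IsDebug));
    for (std::size_t K = 0; K < Sum.size(); ++K)
      Sum[K] += E[K];
  }
  return Sum;
}
```

With the option on, `embedInstructions` over `{add, dbg.value, ret}` equals the result over `{add, ret}`; with it off, the debug entry's embedding is added like any other.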

Just a thought; no-op for this PR.

https://github.com/llvm/llvm-project/pull/155323