[all-commits] [llvm/llvm-project] 45c549: [IR2Vec] Refactor vocabulary to use canonical type...

S. VenkataKeerthy via All-commits all-commits at lists.llvm.org
Fri Aug 29 14:57:18 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 45c549857383eade0c47db75951fb9441260653a
      https://github.com/llvm/llvm-project/commit/45c549857383eade0c47db75951fb9441260653a
  Author: S. VenkataKeerthy <31350914+svkeerthy at users.noreply.github.com>
  Date:   2025-08-29 (Fri, 29 Aug 2025)

  Changed paths:
    M llvm/include/llvm/Analysis/IR2Vec.h
    M llvm/lib/Analysis/IR2Vec.cpp
    M llvm/test/Analysis/IR2Vec/Inputs/reference_default_vocab_print.txt
    M llvm/test/Analysis/IR2Vec/Inputs/reference_wtd1_vocab_print.txt
    M llvm/test/Analysis/IR2Vec/Inputs/reference_wtd2_vocab_print.txt
    M llvm/test/tools/llvm-ir2vec/entities.ll
    M llvm/test/tools/llvm-ir2vec/triplets.ll
    M llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
    M llvm/unittests/Analysis/IR2VecTest.cpp

  Log Message:
  -----------
  [IR2Vec] Refactor vocabulary to use canonical type IDs (#155323)

Refactor IR2Vec vocabulary to use canonical type IDs, improving the embedding representation for LLVM IR types.

The previous implementation indexed the vocabulary directly by raw Type::TypeID values, which produced redundant entries (e.g., every float variant mapped to "FloatTy" yet occupied its own slot). This change improves the vocabulary by:

1. Making the type representation more consistent by properly canonicalizing types
2. Reducing vocabulary size by eliminating redundant entries
3. Improving the embedding quality by ensuring similar types share the same representation
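
The idea behind the three points above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: the enums RawTypeID and CanonicalTypeID, the helpers canonicalize, slotFor and canonicalName, and every entry name other than "FloatTy" are hypothetical stand-ins chosen for illustration, and the sketch deliberately avoids LLVM headers so it compiles on its own (C++17).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a raw type ID enum (NOT llvm::Type::TypeID).
enum class RawTypeID { Half, BFloat, Float, Double, Integer, Pointer, Void };
constexpr std::size_t NumRawTypes = 7;

// Hypothetical canonical IDs: one per distinct vocabulary entry.
enum class CanonicalTypeID { FloatTy, IntegerTy, PointerTy, VoidTy };
constexpr std::size_t NumCanonicalTypes = 4;

// Many-to-one mapping: every float variant collapses to FloatTy.
constexpr CanonicalTypeID canonicalize(RawTypeID Raw) {
  switch (Raw) {
  case RawTypeID::Half:
  case RawTypeID::BFloat:
  case RawTypeID::Float:
  case RawTypeID::Double:
    return CanonicalTypeID::FloatTy;
  case RawTypeID::Integer:
    return CanonicalTypeID::IntegerTy;
  case RawTypeID::Pointer:
    return CanonicalTypeID::PointerTy;
  case RawTypeID::Void:
    return CanonicalTypeID::VoidTy;
  }
  return CanonicalTypeID::VoidTy; // Not reached for valid enumerators.
}

// The vocabulary slot is derived from the canonical ID, not the raw ID,
// so types that share a name also share a slot.
constexpr std::size_t slotFor(RawTypeID Raw) {
  return static_cast<std::size_t>(canonicalize(Raw));
}

// Printable name of a canonical entry (names are illustrative).
constexpr std::string_view canonicalName(CanonicalTypeID ID) {
  switch (ID) {
  case CanonicalTypeID::FloatTy:
    return "FloatTy";
  case CanonicalTypeID::IntegerTy:
    return "IntegerTy";
  case CanonicalTypeID::PointerTy:
    return "PointerTy";
  case CanonicalTypeID::VoidTy:
    return "VoidTy";
  }
  return "VoidTy"; // Not reached for valid enumerators.
}

// Compile-time checks of the properties described in the list above.
static_assert(slotFor(RawTypeID::Half) == slotFor(RawTypeID::Double),
              "float variants share one slot");
static_assert(NumCanonicalTypes < NumRawTypes,
              "canonical vocabulary is smaller than the raw one");
```

In this sketch, indexing by slotFor(...) instead of by the raw ID is what removes the duplicate "FloatTy" slots: the table needs only NumCanonicalTypes entries, and all float variants read the same one.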

(Tracking issue - #141817)


