[llvm] a29f0dd - [llubi] Add initial support for llubi (#180022)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 9 09:54:44 PST 2026
Author: Yingwei Zheng
Date: 2026-02-10T01:54:34+08:00
New Revision: a29f0dd09680fbf7c24aa182c87f51cf4b93e21d
URL: https://github.com/llvm/llvm-project/commit/a29f0dd09680fbf7c24aa182c87f51cf4b93e21d
DIFF: https://github.com/llvm/llvm-project/commit/a29f0dd09680fbf7c24aa182c87f51cf4b93e21d.diff
LOG: [llubi] Add initial support for llubi (#180022)
This patch implements the initial support for upstreaming
[llubi](https://github.com/dtcxzyw/llvm-ub-aware-interpreter). It only
provides the minimal functionality to run a simple main function. I hope
we can focus on the interface design in this PR, rather than trivial
implementations for each instruction.
RFC link:
https://discourse.llvm.org/t/rfc-upstreaming-llvm-ub-aware-interpreter/89645
Excluding the driver `llubi.cpp`, this patch contains three components
for better decoupling:
+ `Value.h/cpp`: Value representation
+ `Context.h/cpp`: Global state management (e.g., memory) and
interpreter configuration
+ `Interpreter.cpp`: The main interpreter loop
Compared to the out-of-tree version, the major differences are listed
below:
+ The interpreter logic always returns the control to its caller, i.e.,
it never calls `exit/abort` when immediate UBs are triggered.
+ `EventHandler` provides an interface to dump the trace. It also allows
callers to inspect the actual value and verify the correctness of
analysis passes (e.g., KnownBits/SCEV).
+ The context is designed to be reentrant. That is, you can call
`runFunction` multiple times. But its usefulness remains in doubt due to
side effects made by previous calls.
+ `runFunction` handles function calls with a loop, instead of calling
itself recursively. This makes it no longer bounded by the stack depth.
+ Uninitialized memory is planned to be approximated by returning random
values each time an uninitialized byte is loaded.
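
The loop-based call handling described in the list above can be illustrated with a small standalone sketch. It is not taken from the patch: the toy instruction set (`Op`, `Inst`), the `run` and `makeChain` helpers, and the `RunResult` struct are hypothetical names invented for this example, and the depth check is only an assumption about how a limit like `-max-stack-depth` could be applied. It shows the general technique: frames live in a heap-allocated container, a call pushes a frame and marks the caller as pending, and a return pops the frame, so the depth of interpreted calls is not tied to the native stack.

```cpp
// Hypothetical sketch: these types are illustrative and are not llubi's API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A toy instruction set: call another function by index, or return.
enum class Op { Call, Ret };
struct Inst {
  Op Kind;
  std::size_t Callee; // Only meaningful for Op::Call.
};
// Assumption: every function ends with Op::Ret and callee indices are valid.
using Function = std::vector<Inst>;

enum class FrameState { Entry, Running, Pending, Exit };

struct Frame {
  const Function *Func;
  std::size_t PC = 0;
  FrameState State = FrameState::Entry;
  explicit Frame(const Function *F) : Func(F) {}
};

struct RunResult {
  bool Ok;              // False if the depth limit was hit.
  std::uint64_t Steps;  // Number of instructions executed.
  std::size_t MaxDepth; // Deepest call stack observed.
};

// Runs Funcs[Entry] using an explicit frame stack instead of native
// recursion. MaxStackDepth == 0 means "no limit".
RunResult run(const std::vector<Function> &Funcs, std::size_t Entry,
              std::size_t MaxStackDepth) {
  assert(Entry < Funcs.size() && "Entry index out of range");
  std::vector<Frame> CallStack;
  CallStack.emplace_back(&Funcs[Entry]);
  RunResult R{true, 0, 1};
  while (!CallStack.empty()) {
    // Use an index, not a reference: emplace_back below may reallocate.
    const std::size_t Top = CallStack.size() - 1;
    CallStack[Top].State = FrameState::Running;
    while (CallStack[Top].State == FrameState::Running) {
      const Inst I = (*CallStack[Top].Func)[CallStack[Top].PC];
      ++R.Steps;
      if (I.Kind == Op::Ret) {
        CallStack[Top].State = FrameState::Exit;
        continue;
      }
      if (MaxStackDepth != 0 && CallStack.size() >= MaxStackDepth) {
        R.Ok = false; // Report the failure to the caller instead of aborting.
        return R;
      }
      ++CallStack[Top].PC; // Resume after the call once the callee returns.
      CallStack[Top].State = FrameState::Pending;
      CallStack.emplace_back(&Funcs[I.Callee]);
      R.MaxDepth = std::max(R.MaxDepth, CallStack.size());
    }
    if (CallStack[Top].State == FrameState::Exit)
      CallStack.pop_back();
  }
  return R;
}

// Builds N functions where function I calls function I + 1 and the last
// one just returns.
std::vector<Function> makeChain(std::size_t N) {
  std::vector<Function> Funcs;
  for (std::size_t I = 0; I + 1 < N; ++I)
    Funcs.push_back(Function{Inst{Op::Call, I + 1}, Inst{Op::Ret, 0}});
  Funcs.push_back(Function{Inst{Op::Ret, 0}});
  return Funcs;
}
```

With this shape, `run(makeChain(100000), 0, 0)` walks a call chain 100000 frames deep without recursing natively, while a self-recursive function run with a limit of 256 stops with `Ok == false` rather than overflowing the stack.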
Added:
llvm/docs/CommandGuide/llubi.rst
llvm/test/tools/llubi/main.ll
llvm/test/tools/llubi/main2.ll
llvm/test/tools/llubi/poison.ll
llvm/tools/llubi/CMakeLists.txt
llvm/tools/llubi/lib/CMakeLists.txt
llvm/tools/llubi/lib/Context.cpp
llvm/tools/llubi/lib/Context.h
llvm/tools/llubi/lib/Interpreter.cpp
llvm/tools/llubi/lib/Value.cpp
llvm/tools/llubi/lib/Value.h
llvm/tools/llubi/llubi.cpp
Modified:
llvm/docs/CommandGuide/index.rst
llvm/test/CMakeLists.txt
llvm/test/lit.cfg.py
Removed:
################################################################################
diff --git a/llvm/docs/CommandGuide/index.rst b/llvm/docs/CommandGuide/index.rst
index 8f080ded81c69..c6427d1245b9a 100644
--- a/llvm/docs/CommandGuide/index.rst
+++ b/llvm/docs/CommandGuide/index.rst
@@ -17,6 +17,7 @@ Basic Commands
dsymutil
llc
lli
+ llubi
llvm-as
llvm-cgdata
llvm-config
diff --git a/llvm/docs/CommandGuide/llubi.rst b/llvm/docs/CommandGuide/llubi.rst
new file mode 100644
index 0000000000000..f652af83d810a
--- /dev/null
+++ b/llvm/docs/CommandGuide/llubi.rst
@@ -0,0 +1,79 @@
+llubi - LLVM UB-aware Interpreter
+=================================
+
+.. program:: llubi
+
+SYNOPSIS
+--------
+
+:program:`llubi` [*options*] [*filename*] [*program args*]
+
+DESCRIPTION
+-----------
+
+:program:`llubi` directly executes programs in LLVM bitcode format and tracks values in LLVM IR semantics.
+Unlike :program:`lli`, :program:`llubi` is designed to be aware of undefined behaviors during execution.
+It detects immediate undefined behaviors such as integer division by zero, and respects poison generating flags
+like `nsw` and `nuw`. As it captures most of the guardable undefined behaviors, it is highly suitable for
+constructing an interesting-ness test for miscompilation bugs.
+
+If `filename` is not specified, then :program:`llubi` reads the LLVM bitcode for the
+program from standard input.
+
+The optional *args* specified on the command line are passed to the program as
+arguments.
+
+GENERAL OPTIONS
+---------------
+
+.. option:: -fake-argv0=executable
+
+ Override the ``argv[0]`` value passed into the executing program.
+
+.. option:: -entry-function=function
+
+ Specify the name of the function to execute as the program's entry point.
+ By default, :program:`llubi` uses the function named ``main``.
+
+.. option:: -help
+
+ Print a summary of command line options.
+
+.. option:: -verbose
+
+ Print results for each instruction executed.
+
+.. option:: -version
+
+ Print out the version of :program:`llubi` and exit without doing anything else.
+
+INTERPRETER OPTIONS
+-------------------
+
+.. option:: -max-mem=N
+
+ Limit the amount of memory (in bytes) that can be allocated by the program, including
+ stack, heap, and global variables. If the limit is exceeded, execution will be terminated.
+ By default, there is no limit (N = 0).
+
+.. option:: -max-stack-depth=N
+
+ Limit the maximum stack depth to N. If the limit is exceeded, execution will be terminated.
+ The default limit is 256. Set N to 0 to disable the limit.
+
+.. option:: -max-steps=N
+
+ Limit the number of instructions executed to N. If the limit is reached, execution will
+ be terminated. By default, there is no limit (N = 0).
+
+.. option:: -vscale=N
+
+ Set the value of `llvm.vscale` to N. The default value is 4.
+
+EXIT STATUS
+-----------
+
+If :program:`llubi` fails to load the program, or an error occurs during execution (e.g., an immediate undefined
+behavior is triggered), it will exit with an exit code of 1.
+If the return type of the entry function is not an integer type, it will return 0.
+Otherwise, it will return the exit code of the program.
diff --git a/llvm/test/CMakeLists.txt b/llvm/test/CMakeLists.txt
index 77fbbe28ca56d..388ce613ad1d0 100644
--- a/llvm/test/CMakeLists.txt
+++ b/llvm/test/CMakeLists.txt
@@ -76,6 +76,7 @@ set(LLVM_TEST_DEPENDS
llc
lli
lli-child-target
+ llubi
llvm-addr2line
llvm-ar
llvm-as
diff --git a/llvm/test/lit.cfg.py b/llvm/test/lit.cfg.py
index 8463e667d9f71..79b78ffeb2dab 100644
--- a/llvm/test/lit.cfg.py
+++ b/llvm/test/lit.cfg.py
@@ -235,6 +235,7 @@ def get_asan_rtlib():
"dsymutil",
"lli",
"lli-child-target",
+ "llubi",
"llvm-ar",
"llvm-as",
"llvm-addr2line",
diff --git a/llvm/test/tools/llubi/main.ll b/llvm/test/tools/llubi/main.ll
new file mode 100644
index 0000000000000..c10824621018e
--- /dev/null
+++ b/llvm/test/tools/llubi/main.ll
@@ -0,0 +1,11 @@
+; RUN: llubi --verbose < %s 2>&1 | FileCheck %s
+
+define i32 @main(i32 %argc, ptr %argv) {
+ ret i32 0
+}
+
+; CHECK: Entering function: main
+; CHECK: i32 %argc = i32 1
+; CHECK: ptr %argv = ptr 0x8 [argv]
+; CHECK: ret i32 0
+; CHECK: Exiting function: main
diff --git a/llvm/test/tools/llubi/main2.ll b/llvm/test/tools/llubi/main2.ll
new file mode 100644
index 0000000000000..58c5744bb0909
--- /dev/null
+++ b/llvm/test/tools/llubi/main2.ll
@@ -0,0 +1,9 @@
+; RUN: llubi --verbose < %s 2>&1 | FileCheck %s
+
+define i32 @main() {
+ ret i32 0
+}
+
+; CHECK: Entering function: main
+; CHECK: ret i32 0
+; CHECK: Exiting function: main
diff --git a/llvm/test/tools/llubi/poison.ll b/llvm/test/tools/llubi/poison.ll
new file mode 100644
index 0000000000000..cf3b69d1aeb77
--- /dev/null
+++ b/llvm/test/tools/llubi/poison.ll
@@ -0,0 +1,11 @@
+; RUN: not llubi --verbose < %s 2>&1 | FileCheck %s
+
+define i32 @main(i32 %argc, ptr %argv) {
+ ret i32 poison
+}
+; CHECK: Entering function: main
+; CHECK: i32 %argc = i32 1
+; CHECK: ptr %argv = ptr 0x8 [argv]
+; CHECK: ret i32 poison
+; CHECK: Exiting function: main
+; CHECK: error: Execution of function 'main' resulted in poison return value.
diff --git a/llvm/tools/llubi/CMakeLists.txt b/llvm/tools/llubi/CMakeLists.txt
new file mode 100644
index 0000000000000..46d06f6e5dfc2
--- /dev/null
+++ b/llvm/tools/llubi/CMakeLists.txt
@@ -0,0 +1,17 @@
+set(LLVM_LINK_COMPONENTS
+ Analysis
+ Core
+ IRPrinter
+ IRReader
+ Support
+ )
+
+add_llvm_tool(llubi
+ llubi.cpp
+
+ DEPENDS
+ intrinsics_gen
+ )
+
+add_subdirectory(lib)
+target_link_libraries(llubi PRIVATE LLVMUBAwareInterpreter)
diff --git a/llvm/tools/llubi/lib/CMakeLists.txt b/llvm/tools/llubi/lib/CMakeLists.txt
new file mode 100644
index 0000000000000..d3b54d0bd45b5
--- /dev/null
+++ b/llvm/tools/llubi/lib/CMakeLists.txt
@@ -0,0 +1,12 @@
+set(LLVM_LINK_COMPONENTS
+ Analysis
+ Core
+ Support
+ )
+
+add_llvm_library(LLVMUBAwareInterpreter
+ STATIC
+ Context.cpp
+ Interpreter.cpp
+ Value.cpp
+ )
diff --git a/llvm/tools/llubi/lib/Context.cpp b/llvm/tools/llubi/lib/Context.cpp
new file mode 100644
index 0000000000000..6b5362204cfde
--- /dev/null
+++ b/llvm/tools/llubi/lib/Context.cpp
@@ -0,0 +1,129 @@
+//===- Context.cpp - State Tracking for llubi -----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file tracks the global states (e.g., memory) of the interpreter.
+//
+//===----------------------------------------------------------------------===//
+
+#include "Context.h"
+#include "llvm/Support/MathExtras.h"
+
+namespace llvm::ubi {
+
+Context::Context(Module &M)
+ : Ctx(M.getContext()), M(M), DL(M.getDataLayout()),
+ TLIImpl(M.getTargetTriple()) {}
+
+Context::~Context() = default;
+
+AnyValue Context::getConstantValueImpl(Constant *C) {
+ if (isa<PoisonValue>(C))
+ return AnyValue::getPoisonValue(*this, C->getType());
+
+ // TODO: Handle ConstantInt vector.
+ if (auto *CI = dyn_cast<ConstantInt>(C))
+ return CI->getValue();
+
+ llvm_unreachable("Unrecognized constant");
+}
+
+const AnyValue &Context::getConstantValue(Constant *C) {
+ auto It = ConstCache.find(C);
+ if (It != ConstCache.end())
+ return It->second;
+
+ return ConstCache.emplace(C, getConstantValueImpl(C)).first->second;
+}
+
+MemoryObject::~MemoryObject() = default;
+MemoryObject::MemoryObject(uint64_t Addr, uint64_t Size, StringRef Name,
+ unsigned AS, MemInitKind InitKind)
+ : Address(Addr), Size(Size), Name(Name), AS(AS),
+ State(InitKind != MemInitKind::Poisoned ? MemoryObjectState::Alive
+ : MemoryObjectState::Dead) {
+ switch (InitKind) {
+ case MemInitKind::Zeroed:
+ Bytes.resize(Size, Byte{0, ByteKind::Concrete});
+ break;
+ case MemInitKind::Uninitialized:
+ Bytes.resize(Size, Byte{0, ByteKind::Undef});
+ break;
+ case MemInitKind::Poisoned:
+ Bytes.resize(Size, Byte{0, ByteKind::Poison});
+ break;
+ }
+}
+
+IntrusiveRefCntPtr<MemoryObject> Context::allocate(uint64_t Size,
+ uint64_t Align,
+ StringRef Name, unsigned AS,
+ MemInitKind InitKind) {
+ if (MaxMem != 0 && SaturatingAdd(UsedMem, Size) >= MaxMem)
+ return nullptr;
+ uint64_t AlignedAddr = alignTo(AllocationBase, Align);
+ auto MemObj =
+ makeIntrusiveRefCnt<MemoryObject>(AlignedAddr, Size, Name, AS, InitKind);
+ MemoryObjects[AlignedAddr] = MemObj;
+ AllocationBase = AlignedAddr + Size;
+ UsedMem += Size;
+ return MemObj;
+}
+
+bool Context::free(uint64_t Address) {
+ auto It = MemoryObjects.find(Address);
+ if (It == MemoryObjects.end())
+ return false;
+ UsedMem -= It->second->getSize();
+ It->second->markAsFreed();
+ MemoryObjects.erase(It);
+ return true;
+}
+
+Pointer Context::deriveFromMemoryObject(IntrusiveRefCntPtr<MemoryObject> Obj) {
+ assert(Obj && "Cannot determine the address space of a null memory object");
+ return Pointer(
+ Obj,
+ APInt(DL.getPointerSizeInBits(Obj->getAddressSpace()), Obj->getAddress()),
+ /*Offset=*/0);
+}
+
+void MemoryObject::markAsFreed() {
+ State = MemoryObjectState::Freed;
+ Bytes.clear();
+}
+
+void MemoryObject::writeRawBytes(uint64_t Offset, const void *Data,
+ uint64_t Length) {
+ assert(SaturatingAdd(Offset, Length) <= Size && "Write out of bounds");
+ const uint8_t *ByteData = static_cast<const uint8_t *>(Data);
+ for (uint64_t I = 0; I < Length; ++I)
+ Bytes[Offset + I].set(ByteData[I]);
+}
+
+void MemoryObject::writeInteger(uint64_t Offset, const APInt &Int,
+ const DataLayout &DL) {
+ uint64_t BitWidth = Int.getBitWidth();
+ uint64_t IntSize = divideCeil(BitWidth, 8);
+ assert(SaturatingAdd(Offset, IntSize) <= Size && "Write out of bounds");
+ for (uint64_t I = 0; I < IntSize; ++I) {
+ uint64_t ByteIndex = DL.isLittleEndian() ? I : (IntSize - 1 - I);
+ uint64_t Bits = std::min(BitWidth - ByteIndex * 8, uint64_t(8));
+ Bytes[Offset + I].set(Int.extractBitsAsZExtValue(Bits, ByteIndex * 8));
+ }
+}
+void MemoryObject::writeFloat(uint64_t Offset, const APFloat &Float,
+ const DataLayout &DL) {
+ writeInteger(Offset, Float.bitcastToAPInt(), DL);
+}
+void MemoryObject::writePointer(uint64_t Offset, const Pointer &Ptr,
+ const DataLayout &DL) {
+ writeInteger(Offset, Ptr.address(), DL);
+ // TODO: provenance
+}
+
+} // namespace llvm::ubi
diff --git a/llvm/tools/llubi/lib/Context.h b/llvm/tools/llubi/lib/Context.h
new file mode 100644
index 0000000000000..a0153752b4404
--- /dev/null
+++ b/llvm/tools/llubi/lib/Context.h
@@ -0,0 +1,185 @@
+//===--- Context.h - State Tracking for llubi -------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TOOLS_LLUBI_CONTEXT_H
+#define LLVM_TOOLS_LLUBI_CONTEXT_H
+
+#include "Value.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Analysis/TargetLibraryInfo.h"
+#include "llvm/IR/Module.h"
+#include <map>
+
+namespace llvm::ubi {
+
+enum class MemInitKind {
+ Zeroed,
+ Uninitialized,
+ Poisoned,
+};
+
+enum class MemoryObjectState {
+ // This memory object is accessible.
+ // Valid transitions:
+ // -> Dead (after the end of lifetime of an alloca)
+ // -> Freed (after free is called on a heap object)
+ Alive,
+ // This memory object is out of lifetime. It is OK to perform
+ // operations that do not access its content, e.g., getelementptr.
+ // Otherwise, an immediate UB occurs.
+ // Valid transition:
+ // -> Alive (after the start of lifetime of an alloca)
+ Dead,
+ // This heap memory object has been freed. Any access to it
+ // causes immediate UB. Like dead objects, it is still possible to
+ // perform operations that do not access its content.
+ Freed,
+};
+
+class MemoryObject : public RefCountedBase<MemoryObject> {
+ uint64_t Address;
+ uint64_t Size;
+ SmallVector<Byte, 8> Bytes;
+ StringRef Name;
+ unsigned AS;
+
+ MemoryObjectState State;
+ bool IsConstant = false;
+
+public:
+ MemoryObject(uint64_t Addr, uint64_t Size, StringRef Name, unsigned AS,
+ MemInitKind InitKind);
+ MemoryObject(const MemoryObject &) = delete;
+ MemoryObject(MemoryObject &&) = delete;
+ MemoryObject &operator=(const MemoryObject &) = delete;
+ MemoryObject &operator=(MemoryObject &&) = delete;
+ ~MemoryObject();
+
+ uint64_t getAddress() const { return Address; }
+ uint64_t getSize() const { return Size; }
+ StringRef getName() const { return Name; }
+ unsigned getAddressSpace() const { return AS; }
+ MemoryObjectState getState() const { return State; }
+ bool isConstant() const { return IsConstant; }
+ void setIsConstant(bool C) { IsConstant = C; }
+
+ Byte &operator[](uint64_t Offset) {
+ assert(Offset < Size && "Offset out of bounds");
+ return Bytes[Offset];
+ }
+ void writeRawBytes(uint64_t Offset, const void *Data, uint64_t Length);
+ void writeInteger(uint64_t Offset, const APInt &Int, const DataLayout &DL);
+ void writeFloat(uint64_t Offset, const APFloat &Float, const DataLayout &DL);
+ void writePointer(uint64_t Offset, const Pointer &Ptr, const DataLayout &DL);
+
+ void markAsFreed();
+};
+
+/// An interface for handling events and managing outputs during interpretation.
+/// If the handler returns false from any of the methods, the interpreter will
+/// stop execution immediately.
+class EventHandler {
+public:
+ virtual ~EventHandler() = default;
+
+ virtual bool onInstructionExecuted(Instruction &I, const AnyValue &Result) {
+ return true;
+ }
+ virtual void onUnrecognizedInstruction(Instruction &I) {}
+ virtual void onImmediateUB(StringRef Msg) {}
+ virtual bool onBBJump(Instruction &I, BasicBlock &To) { return true; }
+ virtual bool onFunctionEntry(Function &F, ArrayRef<AnyValue> Args,
+ CallBase *CallSite) {
+ return true;
+ }
+ virtual bool onFunctionExit(Function &F, const AnyValue &RetVal) {
+ return true;
+ }
+ virtual bool onPrint(StringRef Msg) {
+ outs() << Msg;
+ return true;
+ }
+};
+
+/// The global context for the interpreter.
+/// It tracks global state such as heap memory objects and floating point
+/// environment.
+class Context {
+ // Module
+ LLVMContext &Ctx;
+ Module &M;
+ const DataLayout &DL;
+ const TargetLibraryInfoImpl TLIImpl;
+
+ // Configuration
+ uint64_t MaxMem = 0;
+ uint32_t VScale = 4;
+ uint32_t MaxSteps = 0;
+ uint32_t MaxStackDepth = 256;
+
+ // Memory
+ uint64_t UsedMem = 0;
+ // The addresses of memory objects are monotonically increasing.
+ // For now we don't model the behavior of address reuse, which is common
+ // with stack coloring.
+ uint64_t AllocationBase = 8;
+ std::map<uint64_t, IntrusiveRefCntPtr<MemoryObject>> MemoryObjects;
+
+ // Constants
+ // Use std::map to avoid iterator/reference invalidation.
+ std::map<Constant *, AnyValue> ConstCache;
+ AnyValue getConstantValueImpl(Constant *C);
+
+ // TODO: errno and fpenv
+
+public:
+ explicit Context(Module &M);
+ Context(const Context &) = delete;
+ Context(Context &&) = delete;
+ Context &operator=(const Context &) = delete;
+ Context &operator=(Context &&) = delete;
+ ~Context();
+
+ void setMemoryLimit(uint64_t Max) { MaxMem = Max; }
+ void setVScale(uint32_t VS) { VScale = VS; }
+ void setMaxSteps(uint32_t MS) { MaxSteps = MS; }
+ void setMaxStackDepth(uint32_t Depth) { MaxStackDepth = Depth; }
+ uint64_t getMemoryLimit() const { return MaxMem; }
+ uint32_t getVScale() const { return VScale; }
+ uint32_t getMaxSteps() const { return MaxSteps; }
+ uint32_t getMaxStackDepth() const { return MaxStackDepth; }
+
+ LLVMContext &getContext() const { return Ctx; }
+ const DataLayout &getDataLayout() const { return DL; }
+ const TargetLibraryInfoImpl &getTLIImpl() const { return TLIImpl; }
+ uint32_t getEVL(ElementCount EC) const {
+ if (EC.isScalable())
+ return VScale * EC.getKnownMinValue();
+ return EC.getFixedValue();
+ }
+
+ const AnyValue &getConstantValue(Constant *C);
+ IntrusiveRefCntPtr<MemoryObject> allocate(uint64_t Size, uint64_t Align,
+ StringRef Name, unsigned AS,
+ MemInitKind InitKind);
+ bool free(uint64_t Address);
+ // Derive a pointer from a memory object with offset 0.
+ // Please use Pointer's interface for further manipulations.
+ Pointer deriveFromMemoryObject(IntrusiveRefCntPtr<MemoryObject> Obj);
+
+ /// Execute the function \p F with arguments \p Args, and store the return
+ /// value in \p RetVal if the function is not void.
+ /// Returns true if the function executed successfully. False indicates an
+ /// error occurred during execution.
+ bool runFunction(Function &F, ArrayRef<AnyValue> Args, AnyValue &RetVal,
+ EventHandler &Handler);
+};
+
+} // namespace llvm::ubi
+
+#endif
diff --git a/llvm/tools/llubi/lib/Interpreter.cpp b/llvm/tools/llubi/lib/Interpreter.cpp
new file mode 100644
index 0000000000000..aaad8fb15262e
--- /dev/null
+++ b/llvm/tools/llubi/lib/Interpreter.cpp
@@ -0,0 +1,202 @@
+//===- Interpreter.cpp - Interpreter Loop for llubi -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the evaluation loop for each kind of instruction.
+//
+//===----------------------------------------------------------------------===//
+
+#include "Context.h"
+#include "Value.h"
+#include "llvm/IR/InstVisitor.h"
+#include "llvm/Support/Allocator.h"
+
+namespace llvm::ubi {
+
+enum class FrameState {
+ // It is about to enter the function.
+ // Valid transition:
+ // -> Running
+ Entry,
+ // It is executing instructions inside the function.
+ // Valid transitions:
+ // -> Pending (on call)
+ // -> Exit (on return)
+ Running,
+ // It is about to enter a callee or handle return value from the callee.
+ // Valid transitions:
+ // -> Running (after returning from callee)
+ Pending,
+ // It is about to return the control to the caller.
+ Exit,
+};
+
+/// Context for a function call.
+/// This struct maintains the state during the execution of a function,
+/// including the control flow, values of executed instructions, and stack
+/// objects.
+struct Frame {
+ Function &Func;
+ Frame *LastFrame;
+ CallBase *CallSite;
+ ArrayRef<AnyValue> Args;
+ AnyValue &RetVal;
+
+ TargetLibraryInfo TLI;
+ BasicBlock *BB;
+ BasicBlock::iterator PC;
+ FrameState State = FrameState::Entry;
+ // Stack objects allocated in this frame. They will be automatically freed
+ // when the function returns.
+ SmallVector<IntrusiveRefCntPtr<MemoryObject>> Allocas;
+ // Values of arguments and executed instructions in this function.
+ DenseMap<Value *, AnyValue> ValueMap;
+
+ // Reserved for in-flight subroutines.
+ SmallVector<AnyValue> CalleeArgs;
+ AnyValue CalleeRetVal;
+
+ Frame(Function &F, CallBase *CallSite, Frame *LastFrame,
+ ArrayRef<AnyValue> Args, AnyValue &RetVal,
+ const TargetLibraryInfoImpl &TLIImpl)
+ : Func(F), LastFrame(LastFrame), CallSite(CallSite), Args(Args),
+ RetVal(RetVal), TLI(TLIImpl, &F) {
+ assert((Args.size() == F.arg_size() ||
+ (F.isVarArg() && Args.size() >= F.arg_size())) &&
+ "Expected enough arguments to call the function.");
+ BB = &Func.getEntryBlock();
+ PC = BB->begin();
+ for (Argument &Arg : F.args())
+ ValueMap[&Arg] = Args[Arg.getArgNo()];
+ }
+};
+
+/// Instruction executor using the visitor pattern.
+/// visit* methods return true on success, false on error.
+/// Unlike the Context class that manages the global state,
+/// InstExecutor only maintains the state for call frames.
+class InstExecutor : public InstVisitor<InstExecutor, bool> {
+ Context &Ctx;
+ EventHandler &Handler;
+ std::list<Frame> CallStack;
+ // Used to indicate whether the interpreter should continue execution.
+ bool Status;
+ Frame *CurrentFrame = nullptr;
+ AnyValue None;
+
+ void reportImmediateUB(StringRef Msg) {
+ // Check if we have already reported an immediate UB.
+ if (!Status)
+ return;
+ Status = false;
+ // TODO: Provide stack trace information.
+ Handler.onImmediateUB(Msg);
+ }
+
+ const AnyValue &getValue(Value *V) {
+ if (auto *C = dyn_cast<Constant>(V))
+ return Ctx.getConstantValue(C);
+ return CurrentFrame->ValueMap.at(V);
+ }
+
+public:
+ InstExecutor(Context &C, EventHandler &H, Function &F,
+ ArrayRef<AnyValue> Args, AnyValue &RetVal)
+ : Ctx(C), Handler(H), Status(true) {
+ CallStack.emplace_back(F, /*CallSite=*/nullptr, /*LastFrame=*/nullptr, Args,
+ RetVal, Ctx.getTLIImpl());
+ }
+ bool visitReturnInst(ReturnInst &RI) {
+ if (auto *RV = RI.getReturnValue())
+ CurrentFrame->RetVal = getValue(RV);
+ CurrentFrame->State = FrameState::Exit;
+ return Handler.onInstructionExecuted(RI, None);
+ }
+ bool visitInstruction(Instruction &I) {
+ Handler.onUnrecognizedInstruction(I);
+ return false;
+ }
+
+ /// This function implements the main interpreter loop.
+ /// It handles function calls in a non-recursive manner to avoid stack
+ /// overflows.
+ bool runMainLoop() {
+ uint32_t MaxSteps = Ctx.getMaxSteps();
+ uint32_t Steps = 0;
+ while (Status && !CallStack.empty()) {
+ Frame &Top = CallStack.back();
+ CurrentFrame = &Top;
+ if (Top.State == FrameState::Entry) {
+ Handler.onFunctionEntry(Top.Func, Top.Args, Top.CallSite);
+ // TODO: Handle arg attributes
+ } else {
+ assert(Top.State == FrameState::Pending &&
+ "Expected to return from a callee.");
+ }
+
+ Top.State = FrameState::Running;
+ // Interpreter loop inside a function
+ while (Status) {
+ assert(Top.State == FrameState::Running &&
+ "Expected to be in running state.");
+ if (MaxSteps != 0 && Steps >= MaxSteps) {
+ reportImmediateUB("Exceeded maximum number of execution steps.");
+ break;
+ }
+ ++Steps;
+
+ Instruction &I = *Top.PC;
+ if (!visit(&I)) {
+ Status = false;
+ break;
+ }
+ if (!Status)
+ break;
+
+ if (Top.State != FrameState::Pending && !I.isTerminator()) {
+ if (I.getType()->isVoidTy())
+ Handler.onInstructionExecuted(I, None);
+ else
+ Handler.onInstructionExecuted(I, Top.ValueMap.at(&I));
+ }
+
+ // A function call or return has occurred.
+ // We need to exit the inner loop and switch to a different frame.
+ if (Top.State != FrameState::Running)
+ break;
+
+ // Otherwise, move to the next instruction if it is not a terminator.
+ // For terminators, the PC is updated in the visit* method.
+ if (!I.isTerminator())
+ ++Top.PC;
+ }
+
+ if (!Status)
+ break;
+
+ if (Top.State == FrameState::Exit) {
+ assert((Top.Func.getReturnType()->isVoidTy() || !Top.RetVal.isNone()) &&
+ "Expected return value to be set on function exit.");
+ // TODO: Handle retval attributes
+ Handler.onFunctionExit(Top.Func, Top.RetVal);
+ CallStack.pop_back();
+ } else {
+ assert(Top.State == FrameState::Pending &&
+ "Expected to enter a callee.");
+ }
+ }
+ return Status;
+ }
+};
+
+bool Context::runFunction(Function &F, ArrayRef<AnyValue> Args,
+ AnyValue &RetVal, EventHandler &Handler) {
+ InstExecutor Executor(*this, Handler, F, Args, RetVal);
+ return Executor.runMainLoop();
+}
+
+} // namespace llvm::ubi
diff --git a/llvm/tools/llubi/lib/Value.cpp b/llvm/tools/llubi/lib/Value.cpp
new file mode 100644
index 0000000000000..57cd94ef0f7bb
--- /dev/null
+++ b/llvm/tools/llubi/lib/Value.cpp
@@ -0,0 +1,230 @@
+//===- Value.cpp - Value Representation for llubi -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements utility functions for the value representation.
+//
+//===----------------------------------------------------------------------===//
+
+#include "Value.h"
+#include "Context.h"
+#include "llvm/ADT/SmallString.h"
+
+namespace llvm::ubi {
+
+void Pointer::print(raw_ostream &OS) const {
+ SmallString<32> AddrStr;
+ Address.toStringUnsigned(AddrStr, 16);
+ OS << "ptr 0x" << AddrStr << " [";
+ if (Obj) {
+ OS << Obj->getName();
+ if (Offset)
+ OS << " + " << Offset;
+ } else {
+ OS << "dangling";
+ }
+ OS << "]";
+}
+
+AnyValue Pointer::null(unsigned BitWidth) {
+ return AnyValue(Pointer(nullptr, APInt::getZero(BitWidth), 0));
+}
+
+void AnyValue::print(raw_ostream &OS) const {
+ switch (Kind) {
+ case StorageKind::Integer:
+ if (IntVal.getBitWidth() == 1) {
+ OS << (IntVal.getBoolValue() ? "T" : "F");
+ break;
+ }
+ OS << "i" << IntVal.getBitWidth() << ' ' << IntVal;
+ break;
+ case StorageKind::Float:
+ OS << FloatVal;
+ break;
+ case StorageKind::Pointer:
+ PtrVal.print(OS);
+ break;
+ case StorageKind::Poison:
+ OS << "poison";
+ break;
+ case StorageKind::None:
+ OS << "none";
+ break;
+ case StorageKind::Aggregate:
+ OS << "{ ";
+ for (size_t I = 0, E = AggVal.size(); I != E; ++I) {
+ if (I != 0)
+ OS << ", ";
+ AggVal[I].print(OS);
+ }
+ OS << " }";
+ break;
+ }
+}
+
+void AnyValue::destroy() {
+ switch (Kind) {
+ case StorageKind::Integer:
+ IntVal.~APInt();
+ break;
+ case StorageKind::Float:
+ FloatVal.~APFloat();
+ break;
+ case StorageKind::Pointer:
+ PtrVal.~Pointer();
+ break;
+ case StorageKind::Poison:
+ case StorageKind::None:
+ break;
+ case StorageKind::Aggregate:
+ AggVal.~vector();
+ break;
+ }
+}
+
+AnyValue::AnyValue(const AnyValue &Other) : Kind(Other.Kind) {
+ switch (Other.Kind) {
+ case StorageKind::Integer:
+ new (&IntVal) APInt(Other.IntVal);
+ break;
+ case StorageKind::Float:
+ new (&FloatVal) APFloat(Other.FloatVal);
+ break;
+ case StorageKind::Pointer:
+ new (&PtrVal) Pointer(Other.PtrVal);
+ break;
+ case StorageKind::Poison:
+ case StorageKind::None:
+ break;
+ case StorageKind::Aggregate:
+ new (&AggVal) std::vector<AnyValue>(Other.AggVal);
+ break;
+ }
+}
+AnyValue::AnyValue(AnyValue &&Other) : Kind(Other.Kind) {
+ switch (Other.Kind) {
+ case StorageKind::Integer:
+ new (&IntVal) APInt(std::move(Other.IntVal));
+ break;
+ case StorageKind::Float:
+ new (&FloatVal) APFloat(std::move(Other.FloatVal));
+ break;
+ case StorageKind::Pointer:
+ new (&PtrVal) Pointer(std::move(Other.PtrVal));
+ break;
+ case StorageKind::Poison:
+ case StorageKind::None:
+ break;
+ case StorageKind::Aggregate:
+ new (&AggVal) std::vector<AnyValue>(std::move(Other.AggVal));
+ break;
+ }
+}
+
+AnyValue &AnyValue::operator=(const AnyValue &Other) {
+ if (&Other == this)
+ return *this;
+
+ destroy();
+ Kind = Other.Kind;
+ switch (Other.Kind) {
+ case StorageKind::Integer:
+ new (&IntVal) APInt(Other.IntVal);
+ break;
+ case StorageKind::Float:
+ new (&FloatVal) APFloat(Other.FloatVal);
+ break;
+ case StorageKind::Pointer:
+ new (&PtrVal) Pointer(Other.PtrVal);
+ break;
+ case StorageKind::Poison:
+ case StorageKind::None:
+ break;
+ case StorageKind::Aggregate:
+ new (&AggVal) std::vector<AnyValue>(Other.AggVal);
+ break;
+ }
+
+ return *this;
+}
+AnyValue &AnyValue::operator=(AnyValue &&Other) {
+ if (&Other == this)
+ return *this;
+ destroy();
+ Kind = Other.Kind;
+ switch (Other.Kind) {
+ case StorageKind::Integer:
+ new (&IntVal) APInt(std::move(Other.IntVal));
+ break;
+ case StorageKind::Float:
+ new (&FloatVal) APFloat(std::move(Other.FloatVal));
+ break;
+ case StorageKind::Pointer:
+ new (&PtrVal) Pointer(std::move(Other.PtrVal));
+ break;
+ case StorageKind::Poison:
+ case StorageKind::None:
+ break;
+ case StorageKind::Aggregate:
+ new (&AggVal) std::vector<AnyValue>(std::move(Other.AggVal));
+ break;
+ }
+
+ return *this;
+}
+
+AnyValue AnyValue::getPoisonValue(Context &Ctx, Type *Ty) {
+ if (Ty->isFloatingPointTy() || Ty->isIntegerTy() || Ty->isPointerTy())
+ return AnyValue::poison();
+ if (auto *VecTy = dyn_cast<VectorType>(Ty)) {
+ uint32_t NumElements = Ctx.getEVL(VecTy->getElementCount());
+ return AnyValue(std::vector<AnyValue>(NumElements, AnyValue::poison()));
+ }
+ if (auto *ArrTy = dyn_cast<ArrayType>(Ty)) {
+ uint64_t NumElements = ArrTy->getNumElements();
+ return AnyValue(std::vector<AnyValue>(
+ NumElements, getPoisonValue(Ctx, ArrTy->getElementType())));
+ }
+ if (auto *StructTy = dyn_cast<StructType>(Ty)) {
+ std::vector<AnyValue> Elements;
+ Elements.reserve(StructTy->getNumElements());
+ for (uint32_t I = 0, E = StructTy->getNumElements(); I != E; ++I)
+ Elements.push_back(getPoisonValue(Ctx, StructTy->getElementType(I)));
+ return AnyValue(std::move(Elements));
+ }
+ llvm_unreachable("Unsupported type");
+}
+AnyValue AnyValue::getNullValue(Context &Ctx, Type *Ty) {
+ if (Ty->isIntegerTy())
+ return AnyValue(APInt::getZero(Ty->getIntegerBitWidth()));
+ if (Ty->isFloatingPointTy())
+ return AnyValue(APFloat::getZero(Ty->getFltSemantics()));
+ if (Ty->isPointerTy())
+ return Pointer::null(
+ Ctx.getDataLayout().getPointerSizeInBits(Ty->getPointerAddressSpace()));
+ if (auto *VecTy = dyn_cast<VectorType>(Ty)) {
+ uint32_t NumElements = Ctx.getEVL(VecTy->getElementCount());
+ return AnyValue(std::vector<AnyValue>(
+ NumElements, getNullValue(Ctx, VecTy->getElementType())));
+ }
+ if (auto *ArrTy = dyn_cast<ArrayType>(Ty)) {
+ uint64_t NumElements = ArrTy->getNumElements();
+ return AnyValue(std::vector<AnyValue>(
+ NumElements, getNullValue(Ctx, ArrTy->getElementType())));
+ }
+ if (auto *StructTy = dyn_cast<StructType>(Ty)) {
+ std::vector<AnyValue> Elements;
+ Elements.reserve(StructTy->getNumElements());
+ for (uint32_t I = 0, E = StructTy->getNumElements(); I != E; ++I)
+ Elements.push_back(getNullValue(Ctx, StructTy->getElementType(I)));
+ return AnyValue(std::move(Elements));
+ }
+ llvm_unreachable("Unsupported type");
+}
+
+} // namespace llvm::ubi
diff --git a/llvm/tools/llubi/lib/Value.h b/llvm/tools/llubi/lib/Value.h
new file mode 100644
index 0000000000000..0828941538798
--- /dev/null
+++ b/llvm/tools/llubi/lib/Value.h
@@ -0,0 +1,152 @@
+//===--- Value.h - Value Representation for llubi ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TOOLS_LLUBI_VALUE_H
+#define LLVM_TOOLS_LLUBI_VALUE_H
+
+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/IntrusiveRefCntPtr.h"
+#include "llvm/IR/Type.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm::ubi {
+
+class MemoryObject;
+class Context;
+class AnyValue;
+
+enum class ByteKind : uint8_t {
+ // A concrete byte with a known value.
+ Concrete,
+ // An uninitialized byte. Each load from an uninitialized byte yields
+ // a nondeterministic value.
+ Undef,
+ // A poisoned byte. It occurs when the program stores a poison value to
+ // memory,
+ // or when a memory object is dead.
+ Poison,
+};
+
+struct Byte {
+ uint8_t Value;
+ ByteKind Kind : 2;
+ // TODO: provenance
+
+ void set(uint8_t V) {
+ Value = V;
+ Kind = ByteKind::Concrete;
+ }
+};
+
+// TODO: Byte
+enum class StorageKind {
+ Integer,
+ Float,
+ Pointer,
+ Poison,
+ None, // Placeholder for void type
+ Aggregate, // Struct, Array or Vector
+};
+
+class Pointer {
+ // The underlying memory object. It can be null for invalid or dangling
+ // pointers.
+ IntrusiveRefCntPtr<MemoryObject> Obj;
+ // The address of the pointer. The bit width is determined by
+ // DataLayout::getPointerSizeInBits.
+ APInt Address;
+ // The offset within the memory object.
+ uint64_t Offset;
+ // TODO: modeling inrange(Start, End) attribute
+
+public:
+ explicit Pointer(IntrusiveRefCntPtr<MemoryObject> Obj, const APInt &Address,
+ uint64_t Offset)
+ : Obj(std::move(Obj)), Address(Address), Offset(Offset) {}
+ static AnyValue null(unsigned BitWidth);
+ void print(raw_ostream &OS) const;
+ const APInt &address() const { return Address; }
+ MemoryObject *getMemoryObject() const { return Obj.get(); }
+};
+
+// Representation of the runtime values that LLVM IR values evaluate to.
+// We don't model undef values here (except for byte types).
+class [[nodiscard]] AnyValue {
+ StorageKind Kind;
+ union {
+ APInt IntVal;
+ APFloat FloatVal;
+ Pointer PtrVal;
+ std::vector<AnyValue> AggVal;
+ };
+
+ struct PoisonTag {};
+ void destroy();
+
+public:
+ AnyValue() : Kind(StorageKind::None) {}
+ explicit AnyValue(PoisonTag) : Kind(StorageKind::Poison) {}
+ AnyValue(APInt Val) : Kind(StorageKind::Integer), IntVal(std::move(Val)) {}
+ AnyValue(APFloat Val) : Kind(StorageKind::Float), FloatVal(std::move(Val)) {}
+ AnyValue(Pointer Val) : Kind(StorageKind::Pointer), PtrVal(std::move(Val)) {}
+ AnyValue(std::vector<AnyValue> Val)
+ : Kind(StorageKind::Aggregate), AggVal(std::move(Val)) {}
+ AnyValue(const AnyValue &Other);
+ AnyValue(AnyValue &&Other);
+ AnyValue &operator=(const AnyValue &);
+ AnyValue &operator=(AnyValue &&);
+ ~AnyValue() { destroy(); }
+
+ void print(raw_ostream &OS) const;
+
+ static AnyValue poison() { return AnyValue(PoisonTag{}); }
+ static AnyValue getPoisonValue(Context &Ctx, Type *Ty);
+ static AnyValue getNullValue(Context &Ctx, Type *Ty);
+
+ bool isNone() const { return Kind == StorageKind::None; }
+ bool isPoison() const { return Kind == StorageKind::Poison; }
+
+ const APInt &asInteger() const {
+ assert(Kind == StorageKind::Integer && "Expect an integer value");
+ return IntVal;
+ }
+
+ const APFloat &asFloat() const {
+ assert(Kind == StorageKind::Float && "Expect a float value");
+ return FloatVal;
+ }
+
+ const Pointer &asPointer() const {
+ assert(Kind == StorageKind::Pointer && "Expect a pointer value");
+ return PtrVal;
+ }
+
+ const std::vector<AnyValue> &asAggregate() const {
+ assert(Kind == StorageKind::Aggregate &&
+ "Expect an aggregate/vector value");
+ return AggVal;
+ }
+
+ // Helper function for C++17 structured bindings.
+ template <size_t I> const AnyValue &get() const {
+ assert(Kind == StorageKind::Aggregate &&
+ "Expect an aggregate/vector value");
+ assert(I < AggVal.size() && "Index out of bounds");
+ return AggVal[I];
+ }
+};
+
+inline raw_ostream &operator<<(raw_ostream &OS, const AnyValue &V) {
+ V.print(OS);
+ return OS;
+}
+
+} // namespace llvm::ubi
+
+#endif
diff --git a/llvm/tools/llubi/llubi.cpp b/llvm/tools/llubi/llubi.cpp
new file mode 100644
index 0000000000000..67ab01eca89fe
--- /dev/null
+++ b/llvm/tools/llubi/llubi.cpp
@@ -0,0 +1,239 @@
+//===------------- llubi.cpp - LLVM UB-aware Interpreter --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This utility provides a UB-aware interpreter for programs in LLVM bitcode.
+// It is not built on top of the existing ExecutionEngine interface, but instead
+// implements its own value representation, state tracking and interpreter loop.
+//
+//===----------------------------------------------------------------------===//
+
+#include "lib/Context.h"
+#include "llvm/Config/llvm-config.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IRReader/IRReader.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Format.h"
+#include "llvm/Support/InitLLVM.h"
+#include "llvm/Support/MathExtras.h"
+#include "llvm/Support/SourceMgr.h"
+#include "llvm/Support/WithColor.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace llvm;
+
+static cl::opt<std::string> InputFile(cl::desc("<input bitcode>"),
+ cl::Positional, cl::init("-"));
+
+static cl::list<std::string> InputArgv(cl::ConsumeAfter,
+ cl::desc("<program arguments>..."));
+
+static cl::opt<std::string>
+ EntryFunc("entry-function",
+ cl::desc("Specify the entry function (default = 'main') "
+ "of the executable"),
+ cl::value_desc("function"), cl::init("main"));
+
+static cl::opt<std::string>
+ FakeArgv0("fake-argv0",
+ cl::desc("Override the 'argv[0]' value passed into the executing"
+ " program"),
+ cl::value_desc("executable"));
+
+static cl::opt<bool>
+ Verbose("verbose", cl::desc("Print results for each instruction executed."),
+ cl::init(false));
+
+cl::OptionCategory InterpreterCategory("Interpreter Options");
+
+static cl::opt<unsigned> MaxMem(
+ "max-mem",
+ cl::desc("Max amount of memory (in bytes) that can be allocated by the"
+ " program, including stack, heap, and global variables."
+ " Set to 0 to disable the limit."),
+ cl::value_desc("N"), cl::init(0), cl::cat(InterpreterCategory));
+
+static cl::opt<unsigned>
+ MaxSteps("max-steps",
+ cl::desc("Max number of instructions executed."
+ " Set to 0 to disable the limit."),
+ cl::value_desc("N"), cl::init(0), cl::cat(InterpreterCategory));
+
+static cl::opt<unsigned> MaxStackDepth(
+ "max-stack-depth",
+ cl::desc("Max stack depth (default = 256). Set to 0 to disable the limit."),
+ cl::value_desc("N"), cl::init(256), cl::cat(InterpreterCategory));
+
+static cl::opt<unsigned>
+ VScale("vscale", cl::desc("The value of llvm.vscale (default = 4)"),
+ cl::value_desc("N"), cl::init(4), cl::cat(InterpreterCategory));
+
+class VerboseEventHandler : public ubi::EventHandler {
+public:
+ bool onInstructionExecuted(Instruction &I,
+ const ubi::AnyValue &Result) override {
+ if (Result.isNone()) {
+ errs() << I << '\n';
+ } else {
+ errs() << I << " => " << Result << '\n';
+ }
+
+ return true;
+ }
+
+ void onImmediateUB(StringRef Msg) override {
+ errs() << "Immediate UB detected: " << Msg << '\n';
+ }
+
+ bool onBBJump(Instruction &I, BasicBlock &To) override {
+ errs() << I << " jump to ";
+ To.printAsOperand(errs(), /*PrintType=*/false);
+ return true;
+ }
+
+ bool onFunctionEntry(Function &F, ArrayRef<ubi::AnyValue> Args,
+ CallBase *CallSite) override {
+ errs() << "Entering function: " << F.getName() << '\n';
+ size_t ArgSize = F.arg_size();
+ for (auto &&[Idx, Arg] : enumerate(Args)) {
+ if (Idx >= ArgSize)
+ errs() << " vaarg[" << (Idx - ArgSize) << "] = " << Arg << '\n';
+ else
+ errs() << " " << *F.getArg(Idx) << " = " << Arg << '\n';
+ }
+ return true;
+ }
+
+ bool onFunctionExit(Function &F, const ubi::AnyValue &RetVal) override {
+ errs() << "Exiting function: " << F.getName() << '\n';
+ return true;
+ }
+
+ void onUnrecognizedInstruction(Instruction &I) override {
+ errs() << "Unrecognized instruction: " << I << '\n';
+ }
+};
+
+int main(int argc, char **argv) {
+ InitLLVM X(argc, argv);
+
+ cl::ParseCommandLineOptions(argc, argv, "llvm ub-aware interpreter\n");
+
+ if (EntryFunc.empty()) {
+ WithColor::error() << "--entry-function name cannot be empty\n";
+ return 1;
+ }
+
+ LLVMContext Context;
+
+ // Load the bitcode...
+ SMDiagnostic Err;
+ std::unique_ptr<Module> Owner = parseIRFile(InputFile, Err, Context);
+ Module *Mod = Owner.get();
+ if (!Mod) {
+ Err.print(argv[0], errs());
+ return 1;
+ }
+
+ // If the user specifically requested an argv[0] to pass into the program,
+ // do it now.
+ if (!FakeArgv0.empty()) {
+ InputFile = static_cast<std::string>(FakeArgv0);
+ } else {
+ // Otherwise, if there is a .bc suffix on the executable, strip it off
+ // because it might confuse the program.
+ if (StringRef(InputFile).ends_with(".bc"))
+ InputFile.erase(InputFile.length() - 3);
+ }
+
+ // Add the module's name to the start of the vector of arguments to main().
+ InputArgv.insert(InputArgv.begin(), InputFile);
+
+ // Initialize the execution context and set parameters.
+ ubi::Context Ctx(*Mod);
+ Ctx.setMemoryLimit(MaxMem);
+ Ctx.setVScale(VScale);
+ Ctx.setMaxSteps(MaxSteps);
+ Ctx.setMaxStackDepth(MaxStackDepth);
+
+ // Call the entry function from Mod as if its signature were:
+ // int main (int argc, char **argv)
+ // using the contents of InputArgv to determine argc & argv
+ Function *EntryFn = Mod->getFunction(EntryFunc);
+ if (!EntryFn) {
+ WithColor::error() << '\'' << EntryFunc
+ << "\' function not found in module.\n";
+ return 1;
+ }
+ TargetLibraryInfo TLI(Ctx.getTLIImpl());
+ Type *IntTy = IntegerType::get(Ctx.getContext(), TLI.getIntSize());
+ auto *MainFuncTy = FunctionType::get(
+ IntTy, {IntTy, PointerType::getUnqual(Ctx.getContext())}, false);
+ SmallVector<ubi::AnyValue> Args;
+ if (EntryFn->getFunctionType() == MainFuncTy) {
+ Args.push_back(
+ Ctx.getConstantValue(ConstantInt::get(IntTy, InputArgv.size())));
+
+ uint32_t PtrSize = Ctx.getDataLayout().getPointerSize();
+ uint64_t PtrsSize = PtrSize * (InputArgv.size() + 1);
+ auto ArgvPtrsMem = Ctx.allocate(PtrsSize, 8, "argv",
+ /*AS=*/0, ubi::MemInitKind::Zeroed);
+ if (!ArgvPtrsMem) {
+ WithColor::error() << "Failed to allocate memory for argv pointers.\n";
+ return 1;
+ }
+ for (const auto &[Idx, Arg] : enumerate(InputArgv)) {
+ uint64_t Size = Arg.length() + 1;
+ auto ArgvStrMem = Ctx.allocate(Size, 8, "argv_str",
+ /*AS=*/0, ubi::MemInitKind::Zeroed);
+ if (!ArgvStrMem) {
+ WithColor::error() << "Failed to allocate memory for argv strings.\n";
+ return 1;
+ }
+ ubi::Pointer ArgPtr = Ctx.deriveFromMemoryObject(ArgvStrMem);
+ ArgvStrMem->writeRawBytes(0, Arg.c_str(), Arg.length());
+ ArgvPtrsMem->writePointer(Idx * PtrSize, ArgPtr, Ctx.getDataLayout());
+ }
+ Args.push_back(Ctx.deriveFromMemoryObject(ArgvPtrsMem));
+ } else if (!EntryFn->arg_empty()) {
+ // If the signature does not match (e.g., llvm-reduce changed the signature
+ // of main), pass null values for all arguments.
+ WithColor::warning()
+ << "The signature of function '" << EntryFunc
+ << "' does not match 'int main(int, char**)', passing null values for "
+ "all arguments.\n";
+ Args.reserve(EntryFn->arg_size());
+ for (Argument &Arg : EntryFn->args())
+ Args.push_back(ubi::AnyValue::getNullValue(Ctx, Arg.getType()));
+ }
+
+ ubi::EventHandler NoopHandler;
+ VerboseEventHandler VerboseHandler;
+ ubi::AnyValue RetVal;
+ if (!Ctx.runFunction(*EntryFn, Args, RetVal,
+ Verbose ? VerboseHandler : NoopHandler)) {
+ WithColor::error() << "Execution of function '" << EntryFunc
+ << "' failed.\n";
+ return 1;
+ }
+
+ // If the function returns an integer, return that as the exit code.
+ if (EntryFn->getReturnType()->isIntegerTy()) {
+ assert(!RetVal.isNone() && "Expected a return value from entry function");
+ if (RetVal.isPoison()) {
+ WithColor::error() << "Execution of function '" << EntryFunc
+ << "' resulted in poison return value.\n";
+ return 1;
+ }
+ APInt Result = RetVal.asInteger();
+ return (int)Result.extractBitsAsZExtValue(
+ std::min(Result.getBitWidth(), 8U), 0);
+ }
+ return 0;
+}