[llvm] [Offload] Update allocations to include device (PR #154733)

Piotr Balcer via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 1 03:27:36 PDT 2025


================
@@ -124,8 +125,8 @@ struct OffloadContext {
 
   bool TracingEnabled = false;
   bool ValidationEnabled = true;
-  DenseMap<void *, AllocInfo> AllocInfoMap{};
-  std::mutex AllocInfoMapMutex{};
+  SmallVector<AllocInfo> AllocInfoList{};
----------------
pbalcer wrote:

> UR does require the destination and source device - It needs to know which backend (AMD/CUDA/OpenCL/etc) to dispatch to, which is stored in ur_queue_handle_t. 

Not sure I follow, here's the UR entry point:
```c
UR_APIEXPORT ur_result_t UR_APICALL urEnqueueUSMMemcpy(
    ur_queue_handle_t hQueue,
    bool blocking,
    void *pDst,
    const void *pSrc,
    size_t size,
    uint32_t numEventsInWaitList,
    const ur_event_handle_t *phEventWaitList,
    ur_event_handle_t *phEvent);
```

Yes, we need an hQueue to do the dispatch, but it's there because the memcpy is logically an operation on the queue.

> Since the queue is optional in olMemcpy, a device to lookup the backend (AMD or Nvidia) is required. And if we need one device, we may as well have the user specify both instead of requiring the backend to support determining whether the copy is h2d/h2h/d2d. 

Given how this lookup is implemented right now, I think we should revisit the decision to make the queue optional. With the current API, the best we can probably do is something like an interval-map lookup (twice per call, once each for src and dst), which is going to be both complex and expensive. Do we really want to pay that cost for every USM memory operation?
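
To make the cost concrete, here's a minimal sketch (not liboffload code; `AllocTracker`, `AllocRecord`, and `DeviceHandle` are invented names) of the kind of per-pointer lookup olMemcpy would need if the queue stays optional: an ordered map of allocation records keyed by base address, queried once for the source pointer and once for the destination pointer.

```cpp
// Sketch only: pointer -> owning-allocation lookup under the assumption
// that every device/USM allocation was registered with its base and size.
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

struct DeviceHandle; // stand-in for the backend device handle

struct AllocRecord {
  uintptr_t Base;
  size_t Size;
  DeviceHandle *Device; // nullptr for host allocations
};

class AllocTracker {
  std::map<uintptr_t, AllocRecord> Allocs; // keyed by base address

public:
  void insert(const AllocRecord &R) { Allocs[R.Base] = R; }

  // Returns the record containing Ptr, or std::nullopt for memory the
  // runtime never saw (e.g. plain host pointers). O(log n) per query;
  // a queue-less memcpy would need two of these on every call.
  std::optional<AllocRecord> lookup(const void *Ptr) const {
    uintptr_t P = reinterpret_cast<uintptr_t>(Ptr);
    auto It = Allocs.upper_bound(P); // first record with Base > P
    if (It == Allocs.begin())
      return std::nullopt;
    --It; // greatest Base <= P
    if (P < It->second.Base + It->second.Size)
      return It->second;
    return std::nullopt;
  }
};
```

Each query is O(log n) in the number of tracked allocations, and pointers that were never registered still need a sensible fallback, which is where much of the complexity comes from.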

In UR, we have a separate non-queued blocking copy that operates on a context:
```c
UR_APIEXPORT ur_result_t UR_APICALL urUSMContextMemcpyExp(
    /// [in] Context associated with the device(s) that own the allocations
    /// `pSrc` and `pDst`.
    ur_context_handle_t hContext,
    void *pDst,
    const void *pSrc,
    size_t size);
```

We could do something similar instead of trying to overload a single function with several distinct behaviors.
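
For illustration only, here's one hypothetical shape such an entry point could take on the offload side, mirroring urUSMContextMemcpyExp. The name olMemcpyBlocking, its signature, and the placeholder types below are invented for this sketch and are not the real liboffload declarations.

```cpp
#include <cstddef>

// Placeholder stand-ins for the liboffload handle/result types; the real
// declarations live in the Offload API headers.
struct ol_device_impl;
using ol_device_handle_t = ol_device_impl *;
enum ol_result_t { OL_SUCCESS, OL_ERROR };

// Hypothetical non-queued, blocking copy: the caller names both devices
// explicitly, so the runtime never has to map raw pointers back to
// allocations, and the queued olMemcpy keeps its current semantics.
ol_result_t olMemcpyBlocking(ol_device_handle_t DstDevice, void *DstPtr,
                             ol_device_handle_t SrcDevice, const void *SrcPtr,
                             size_t Size);
```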

https://github.com/llvm/llvm-project/pull/154733

