[llvm-branch-commits] [llvm] [libsycl] add single_task (PR #192499)
Kseniya Tikhomirova via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Apr 16 10:49:47 PDT 2026
https://github.com/KseniyaTikhomirova created https://github.com/llvm/llvm-project/pull/192499
Depends on https://github.com/llvm/llvm-project/pull/188794 (stacked PRs).
Depends on liboffload PR: https://github.com/llvm/llvm-project/pull/184343, need to uncomment launch args once it is merged.
Depends on clang driver update to pick the right location for search of libsycl: follow up for https://github.com/llvm/llvm-project/pull/188770. Without it test will fail since clang can't find libsycl in build dir.
This is part of the SYCL support upstreaming effort. The relevant RFCs can be found here:
https://discourse.llvm.org/t/rfc-add-full-support-for-the-sycl-programming-model/74080 https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479
The approach with void sycl_kernel_launch(pack of arguments) implies that
we can use or copy arguments only during that call. Since it pass only
kernel arguments as parameters and returns void - we have to split setting
of extra kernel data like event dependencies and range and getting result
event from arguments handling and direct kernel submision if it is
possible. Key stages: 1) passing to queue (or handler in future) dependency
events and range (for parallel_for), saving them in queue (copy/move). 2)
wrapping kernel arguments into typeless wrappers (pointer based, initially
no copy) and passing to the queue. Then depending on scenario (without host
tasks and accessors we should be able to submit everything directly)
collection of arguments is converted to preferred liboffload structure (no
copy of objects, copy of pointers) and passed to liboffload or RT does deep
copy of provided arguments (simple copy of pointer of USM and copy of value
for other arguments) to keep them alive till kernel enqueue outside parent
submit call. 3) getting event associated with kernel enqueue. Key notes: 1)
Having these 3 separated calls is not the best solution but the only one
allowing to avoid copy for some scenarios (otherwise we have to do deep
copy always and then do joined kernel submission outside sycl_kernel_launch
scope). 2) submit must be thread-safe. Since we have 3 calls we need to
keep kernel params and resulting event in a per queue + per thread/per
kernel way. To achieve this without copy and joined kernel submission queue
(in future - handler) stores thread_local data for kernel submission.
thread_local can't be used for non-static class members so they are static.
Given: same queue can be used from different threads but thread can't use
different queues at the same moment; that means that we actually need per
thread storag
>From 79372977bc5f059e5d7602d6ca1cccc52798b702 Mon Sep 17 00:00:00 2001
From: "Tikhomirova, Kseniya" <kseniya.tikhomirova at intel.com>
Date: Wed, 15 Apr 2026 05:55:42 -0700
Subject: [PATCH] single_task on top of getKernelInfo
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
---
libsycl/docs/index.rst | 4 +
.../__impl/detail/get_device_kernel_info.hpp | 43 +++++++
.../sycl/__impl/detail/unified_range_view.hpp | 51 ++++++++
libsycl/include/sycl/__impl/queue.hpp | 99 ++++++++++++++++
libsycl/src/detail/global_objects.cpp | 2 +
libsycl/src/detail/program_manager.cpp | 17 +++
libsycl/src/detail/program_manager.hpp | 6 +
libsycl/src/detail/queue_impl.cpp | 111 ++++++++++++++++++
libsycl/src/detail/queue_impl.hpp | 39 ++++++
libsycl/src/queue.cpp | 19 +++
10 files changed, 391 insertions(+)
create mode 100644 libsycl/include/sycl/__impl/detail/get_device_kernel_info.hpp
create mode 100644 libsycl/include/sycl/__impl/detail/unified_range_view.hpp
diff --git a/libsycl/docs/index.rst b/libsycl/docs/index.rst
index 9aa36b4a54c57..04691a96a188a 100644
--- a/libsycl/docs/index.rst
+++ b/libsycl/docs/index.rst
@@ -113,6 +113,10 @@ TODO for added SYCL classes
* to implement submit & copy with accessors (low priority)
* get_info & properties
* ctors that accepts context (blocked by lack of liboffload support)
+ * nd_range kernel submissions
+ * cross-context events wait (host tasks are needed)
+ * implement check if lambda arguments are device copyable (requires clang support of corresponding builtins) unless FE will fully cover it
+ * kernel instantiating on host (debugging purposes)
* ``property_list``: to fully implement and integrate with existing SYCL runtime classes supporting it
* usm allocations:
diff --git a/libsycl/include/sycl/__impl/detail/get_device_kernel_info.hpp b/libsycl/include/sycl/__impl/detail/get_device_kernel_info.hpp
new file mode 100644
index 0000000000000..292755037410e
--- /dev/null
+++ b/libsycl/include/sycl/__impl/detail/get_device_kernel_info.hpp
@@ -0,0 +1,43 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains helper function to query kernel info that is uniform
+/// between different submissions of the same kernel.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBSYCL_GET_DEV_INFO
+#define _LIBSYCL_GET_DEV_INFO
+
+#include <sycl/__impl/detail/config.hpp>
+
+#include <string_view>
+
+_LIBSYCL_BEGIN_NAMESPACE_SYCL
+
+namespace detail {
+
+class DeviceKernelInfo;
+// Lifetime of the underlying `DeviceKernelInfo` is tied to the availability of
+// the `sycl_device_binaries` corresponding to this kernel. In other words, once
+// user library is unloaded (see __sycl_unregister_lib), program manager
+// destroys this `DeviceKernelInfo` object and the reference returned from here
+// becomes stale.
+_LIBSYCL_EXPORT DeviceKernelInfo &getDeviceKernelInfo(std::string_view);
+
+template <class KernelName>
+DeviceKernelInfo &getDeviceKernelInfo(std::string_view KernelNameStr) {
+ static DeviceKernelInfo &Info = getDeviceKernelInfo(KernelNameStr);
+ return Info;
+}
+
+} // namespace detail
+_LIBSYCL_END_NAMESPACE_SYCL
+
+#endif // _LIBSYCL_GET_DEV_INFO
diff --git a/libsycl/include/sycl/__impl/detail/unified_range_view.hpp b/libsycl/include/sycl/__impl/detail/unified_range_view.hpp
new file mode 100644
index 0000000000000..4bcaa48eec757
--- /dev/null
+++ b/libsycl/include/sycl/__impl/detail/unified_range_view.hpp
@@ -0,0 +1,51 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains helper function class to unify ABI for different kernel
+/// ranges.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBSYCL___IMPL_DETAIL_UNIFIED_RANGE_VIEW_HPP
+#define _LIBSYCL___IMPL_DETAIL_UNIFIED_RANGE_VIEW_HPP
+
+#include <sycl/__impl/detail/config.hpp>
+
+_LIBSYCL_BEGIN_NAMESPACE_SYCL
+
+namespace detail {
+
+/// The structure to keep dimension and references to ranges unified for
+/// all dimensions.
+struct UnifiedRangeView {
+ /// Default contructed view matches the single task execution range.
+ UnifiedRangeView() = default;
+ UnifiedRangeView(const UnifiedRangeView &Desc) = default;
+ UnifiedRangeView(UnifiedRangeView &&Desc) = default;
+ UnifiedRangeView &operator=(const UnifiedRangeView &Desc) = default;
+ UnifiedRangeView &operator=(UnifiedRangeView &&Desc) = default;
+ ~UnifiedRangeView() = default;
+
+ // TODO: ctors with sycl::range and nd::range will be added later.
+
+ UnifiedRangeView(const size_t *GlobalSize, const size_t *LocalSize,
+ const size_t *Offset, size_t Dims)
+ : MGlobalSize(GlobalSize), MLocalSize(LocalSize), MOffset(Offset),
+ MDims(Dims) {}
+
+ const size_t *MGlobalSize = nullptr;
+ const size_t *MLocalSize = nullptr;
+ const size_t *MOffset = nullptr;
+ size_t MDims = 1;
+};
+} // namespace detail
+
+_LIBSYCL_END_NAMESPACE_SYCL
+
+#endif // _LIBSYCL___IMPL_DETAIL_UNIFIED_RANGE_VIEW_HPP
diff --git a/libsycl/include/sycl/__impl/queue.hpp b/libsycl/include/sycl/__impl/queue.hpp
index 41b018b681b8e..991ebe7f2a666 100644
--- a/libsycl/include/sycl/__impl/queue.hpp
+++ b/libsycl/include/sycl/__impl/queue.hpp
@@ -17,11 +17,14 @@
#include <sycl/__impl/async_handler.hpp>
#include <sycl/__impl/device.hpp>
+#include <sycl/__impl/event.hpp>
#include <sycl/__impl/property_list.hpp>
#include <sycl/__impl/detail/config.hpp>
#include <sycl/__impl/detail/default_async_handler.hpp>
+#include <sycl/__impl/detail/get_device_kernel_info.hpp>
#include <sycl/__impl/detail/obj_utils.hpp>
+#include <sycl/__impl/detail/unified_range_view.hpp>
_LIBSYCL_BEGIN_NAMESPACE_SYCL
@@ -29,6 +32,27 @@ class context;
namespace detail {
class QueueImpl;
+
+template <typename, typename T> struct CheckFunctionSignature {
+ static_assert(std::integral_constant<T, false>::value,
+ "Second template parameter is required to be of function type");
+};
+
+template <typename F, typename RetT, typename... Args>
+struct CheckFunctionSignature<F, RetT(Args...)> {
+private:
+ template <typename T>
+ static constexpr auto check(T *) -> typename std::is_same<
+ decltype(std::declval<T>().operator()(std::declval<Args>()...)),
+ RetT>::type;
+
+ template <typename> static constexpr std::false_type check(...);
+
+ using type = decltype(check<F>(0));
+
+public:
+ static constexpr bool value = type::value;
+};
} // namespace detail
// SYCL 2020 4.6.5. Queue class.
@@ -139,7 +163,82 @@ class _LIBSYCL_EXPORT queue {
/// exceptions.
void wait();
+ /// Defines and invokes a SYCL kernel function as a lambda expression or a
+ /// named function object type.
+ ///
+ /// \param kernelFunc is the kernel functor or lambda.
+ /// \return an event that represents the status of the submitted kernel.
+ template <typename KernelName, typename KernelType>
+ event single_task(const KernelType &kernelFunc) {
+ return single_task<KernelName, KernelType>({}, kernelFunc);
+ }
+
+ /// Defines and invokes a SYCL kernel function as a lambda expression or a
+ /// named function object type.
+ ///
+ /// \param depEvent is an event that specifies the kernel dependency.
+ /// \param kernelFunc is the kernel functor or lambda.
+ /// \return an event that represents the status of the submitted kernel.
+ template <typename KernelName, typename KernelType>
+ event single_task(event depEvent, const KernelType &kernelFunc) {
+ return single_task<KernelName, KernelType>({depEvent}, kernelFunc);
+ }
+
+ /// Defines and invokes a SYCL kernel function as a lambda expression or a
+ /// named function object type.
+ ///
+ /// \param depEvents is a collection of events that specify the kernel
+ /// dependencies.
+ /// \param kernelFunc is the kernel functor or lambda.
+ /// \return an event that represents the status of the submitted kernel.
+ template <typename KernelName, typename KernelType>
+ event single_task(const std::vector<event> &depEvents,
+ const KernelType &kernelFunc) {
+ static_assert(
+ detail::CheckFunctionSignature<std::remove_reference_t<KernelType>,
+ void()>::value,
+ "sycl::queue::single_task() requires a kernel instead of a command "
+ "group");
+
+ setKernelParameters(depEvents);
+ submitSingleTask<KernelName, KernelType>(kernelFunc);
+ return getLastEvent();
+ }
+
private:
+ // Name of this function is defined by compiler. It generates call to this
+ // function in the host implementation of KernelFunc in submitSingleTask.
+ template <typename KN, typename... Args>
+ void sycl_kernel_launch(const char *KernelName, Args &&...args) {
+ static_assert(
+ sizeof...(args) == 1,
+ "sycl_kernel_launch expects only 2 arguments now: name of kernel and "
+ "callable object passed to kernel invocation by the user.");
+
+ auto FirstArg = std::get<0>(std::tie(args...));
+ submitKernelImpl(detail::getDeviceKernelInfo<KN>(KernelName), &FirstArg,
+ sizeof(FirstArg));
+ }
+
+#ifdef SYCL_LANGUAGE_VERSION
+# define _LIBSYCL_ENTRY_POINT_ATTR__(KernelName) \
+ [[clang::sycl_kernel_entry_point(KernelName)]]
+#else
+# define _LIBSYCL_ENTRY_POINT_ATTR__(KernelName)
+#endif // SYCL_LANGUAGE_VERSION
+
+ template <typename KernelName, typename KernelType>
+ _LIBSYCL_ENTRY_POINT_ATTR__(KernelName)
+ void submitSingleTask(const KernelType &KernelFunc) {
+ KernelFunc();
+ }
+
+ event getLastEvent();
+ void submitKernelImpl(detail::DeviceKernelInfo &KernelInfo, void *ArgData,
+ size_t ArgSize);
+ void setKernelParameters(const std::vector<event> &Events,
+ const detail::UnifiedRangeView &Range = {});
+
queue(const std::shared_ptr<detail::QueueImpl> &Impl) : impl(Impl) {}
std::shared_ptr<detail::QueueImpl> impl;
diff --git a/libsycl/src/detail/global_objects.cpp b/libsycl/src/detail/global_objects.cpp
index 35e32985e7cbb..fd94d772337d6 100644
--- a/libsycl/src/detail/global_objects.cpp
+++ b/libsycl/src/detail/global_objects.cpp
@@ -8,6 +8,7 @@
#include <detail/global_objects.hpp>
#include <detail/platform_impl.hpp>
+#include <detail/program_manager.hpp>
#ifdef _WIN32
# include <windows.h>
@@ -31,6 +32,7 @@ struct StaticVarShutdownHandler {
StaticVarShutdownHandler &
operator=(const StaticVarShutdownHandler &) = delete;
~StaticVarShutdownHandler() {
+ ProgramAndKernelManager::getInstance().releaseResources();
// No error reporting in shutdown
std::ignore = olShutDown();
}
diff --git a/libsycl/src/detail/program_manager.cpp b/libsycl/src/detail/program_manager.cpp
index 90d7c48d3d1c7..870d55198181b 100644
--- a/libsycl/src/detail/program_manager.cpp
+++ b/libsycl/src/detail/program_manager.cpp
@@ -18,6 +18,23 @@
_LIBSYCL_BEGIN_NAMESPACE_SYCL
namespace detail {
+DeviceKernelInfo &_LIBSYCL_EXPORT
+getDeviceKernelInfo(std::string_view KernelName) {
+ return ProgramAndKernelManager::getInstance().getDeviceKernelInfo(KernelName);
+}
+
+DeviceKernelInfo &
+ProgramAndKernelManager::getDeviceKernelInfo(std::string_view KernelName) {
+ auto It = MDeviceKernelInfoMap.find(KernelName);
+ assert(It != MDeviceKernelInfoMap.end());
+ return It->second;
+}
+
+void ProgramAndKernelManager::releaseResources() {
+ MDeviceKernelInfoMap.clear();
+ MDeviceImageManagers.clear();
+}
+
static inline bool checkFatBinVersion(const __sycl_tgt_bin_desc &FatbinDesc) {
return FatbinDesc.Version == SupportedOffloadBinaryVersion;
}
diff --git a/libsycl/src/detail/program_manager.hpp b/libsycl/src/detail/program_manager.hpp
index f5d3a1c8dc6dd..da56f6ec9706d 100644
--- a/libsycl/src/detail/program_manager.hpp
+++ b/libsycl/src/detail/program_manager.hpp
@@ -81,6 +81,12 @@ class ProgramAndKernelManager {
ol_symbol_handle_t getOrCreateKernel(DeviceKernelInfo &KernelInfo,
DeviceImpl &Device);
+ /// \return kernel info for the kernel with the specified name.
+ DeviceKernelInfo &getDeviceKernelInfo(std::string_view KernelName);
+
+ /// Release device image managers and corresponding resources.
+ void releaseResources();
+
private:
ProgramAndKernelManager() = default;
~ProgramAndKernelManager() = default;
diff --git a/libsycl/src/detail/queue_impl.cpp b/libsycl/src/detail/queue_impl.cpp
index 1d378f0ab5ef9..ff194f4d4152b 100644
--- a/libsycl/src/detail/queue_impl.cpp
+++ b/libsycl/src/detail/queue_impl.cpp
@@ -10,11 +10,41 @@
#include <detail/device_impl.hpp>
#include <detail/event_impl.hpp>
+#include <detail/program_manager.hpp>
+
+#include <algorithm>
_LIBSYCL_BEGIN_NAMESPACE_SYCL
namespace detail {
+static void setKernelLaunchArgs(const detail::UnifiedRangeView &Range,
+ ol_kernel_launch_size_args_t &ArgsToSet) {
+ assert(Range.MDims < 4 && "Invalid dimensions.");
+ uint32_t GlobalSize[3] = {1, 1, 1};
+ if (Range.MGlobalSize) {
+ for (auto I = 0; I < Range.MDims; I++) {
+ GlobalSize[I] = static_cast<uint32_t>(Range.MGlobalSize[I]);
+ }
+ }
+
+ uint32_t GroupSize[3] = {1, 1, 1};
+ if (Range.MLocalSize) {
+ for (auto I = 0; I < Range.MDims; I++) {
+ GroupSize[I] = static_cast<uint32_t>(Range.MLocalSize[I]);
+ }
+ }
+
+ ArgsToSet.Dimensions = Range.MDims;
+ ArgsToSet.NumGroups.x = GlobalSize[0] / GroupSize[0];
+ ArgsToSet.NumGroups.y = GlobalSize[1] / GroupSize[1];
+ ArgsToSet.NumGroups.z = GlobalSize[2] / GroupSize[2];
+ ArgsToSet.GroupSize.x = GroupSize[0];
+ ArgsToSet.GroupSize.y = GroupSize[1];
+ ArgsToSet.GroupSize.z = GroupSize[2];
+ ArgsToSet.DynSharedMemory = 0;
+}
+
QueueImpl::QueueImpl(DeviceImpl &deviceImpl, const async_handler &asyncHandler,
const property_list &propList, PrivateTag)
: MIsInorder(false), MAsyncHandler(asyncHandler), MPropList(propList),
@@ -33,5 +63,86 @@ backend QueueImpl::getBackend() const noexcept { return MDevice.getBackend(); }
void QueueImpl::wait() { callAndThrow(olSyncQueue, MOffloadQueue); }
+static bool checkEventsPlatformMatch(std::vector<EventImplPtr> &Events,
+ const PlatformImpl &QueuePlatform) {
+ // liboffload limitation to olWaitEvents. We can't do any extra handling for
+ // cross context/platform events without host task support now.
+ // "The input events can be from any queue on any device provided by the
+ // same platform as `Queue`."
+ return std::all_of(Events.cbegin(), Events.cend(),
+ [&QueuePlatform](const EventImplPtr &Event) {
+ return &Event->getPlatformImpl() == &QueuePlatform;
+ });
+}
+
+void QueueImpl::setKernelParameters(std::vector<EventImplPtr> &&Events,
+ const detail::UnifiedRangeView &Range) {
+ if (!checkEventsPlatformMatch(Events, MDevice.getPlatformImpl()))
+ throw sycl::exception(
+ sycl::make_error_code(sycl::errc::feature_not_supported),
+ "libsycl doesn't support cross-context/platform event dependencies "
+ "now.");
+
+ // TODO: this conversion and storing of only offload events is possible only
+ // while we don't have host tasks (or features based on host tasks, like
+ // streams). With them - it is very likely we should copy EventImplPtr
+ // (shared_ptr) and keep it here. Although it may differ if host tasks will be
+ // implemented on offload level (no data now).
+ assert(MCurrentSubmitInfo.DepEvents.empty() &&
+ "Kernel submission must clean up dependencies.");
+ MCurrentSubmitInfo.DepEvents.reserve(Events.size());
+ for (auto &Event : Events) {
+ assert(Event && "Event impl object can't be nullptr");
+ MCurrentSubmitInfo.DepEvents.push_back(Event->getHandle());
+ }
+ setKernelLaunchArgs(Range, MCurrentSubmitInfo.Range);
+}
+
+void QueueImpl::submitKernelImpl(DeviceKernelInfo &KernelInfo, void *ArgData,
+ size_t ArgSize) {
+ ol_symbol_handle_t Kernel =
+ detail::ProgramAndKernelManager::getInstance().getOrCreateKernel(
+ KernelInfo, MDevice);
+ assert(Kernel);
+
+ // TODO: liboffload supports only in-order queues and no cross context waiting
+ // is available now that means that this code is excessive but correct. I
+ // don't want to skip it and rely on default liboffload behaviour that is
+ // applicable for in-order queue only. Once OOO queues are added this waiting
+ // must be disabled for in-order queues. Once host tasks are added - cross
+ // context dependencies should be enabled and checked as well.
+ if (!MCurrentSubmitInfo.DepEvents.empty()) {
+ callAndThrow(olWaitEvents, MOffloadQueue,
+ MCurrentSubmitInfo.DepEvents.data(),
+ MCurrentSubmitInfo.DepEvents.size());
+ }
+
+ assert(ArgData && "At least one argument must exist");
+ assert(ArgSize && "Arguments size must be greater than 0");
+
+ // ol_kernel_launch_prop_t Props[2];
+ // Props[0].type = OL_KERNEL_LAUNCH_PROP_TYPE_SIZE;
+ // Props[0].data = &ArgSize;
+ // Props[1] = OL_KERNEL_LAUNCH_PROP_END;
+ auto Result =
+ olLaunchKernel(MOffloadQueue, MDevice.getOLHandle(), Kernel, &ArgData,
+ ArgSize, &MCurrentSubmitInfo.Range /*, Props*/);
+ // Clean up current kernel submit data to prepare structures for next
+ // submission.
+ MCurrentSubmitInfo.DepEvents.clear();
+ MCurrentSubmitInfo.Range = {};
+ if (isFailed(Result))
+ throw sycl::exception(sycl::make_error_code(sycl::errc::runtime),
+ std::string("Kernel submission (") +
+ KernelInfo.getName().data() + ") failed with " +
+ formatCodeString(Result));
+
+ ol_event_handle_t NewEvent{};
+ callAndThrow(olCreateEvent, MOffloadQueue, &NewEvent);
+
+ MCurrentSubmitInfo.LastEvent =
+ EventImpl::createEventWithHandle(NewEvent, MDevice.getPlatformImpl());
+}
+
} // namespace detail
_LIBSYCL_END_NAMESPACE_SYCL
diff --git a/libsycl/src/detail/queue_impl.hpp b/libsycl/src/detail/queue_impl.hpp
index 047cb121150f3..cda6ae2961c19 100644
--- a/libsycl/src/detail/queue_impl.hpp
+++ b/libsycl/src/detail/queue_impl.hpp
@@ -29,6 +29,8 @@ class ContextImpl;
class DeviceImpl;
class EventImpl;
+using EventImplPtr = std::shared_ptr<EventImpl>;
+
class QueueImpl : public std::enable_shared_from_this<QueueImpl> {
struct PrivateTag {
explicit PrivateTag() = default;
@@ -69,13 +71,50 @@ class QueueImpl : public std::enable_shared_from_this<QueueImpl> {
/// Waits for completion of all commands submitted to this queue.
void wait();
+ /// Enqueues a kernel to liboffload.
+ /// Kernel parameters like dependencies and range must be passed in advance by
+ /// calling setKernelParameters.
+ /// \param KernelInfo a kernel info that is uniform between different
+ /// submissions of the same kernel.
+ /// \param TypelessArgs data about kernel arguments to be used for enqueue.
+ void submitKernelImpl(DeviceKernelInfo &KernelInfo, void *ArgData,
+ size_t ArgSize);
+
+ /// \return an event impl object that corresponds to the last kernel
+ /// submission in the calling thread.
+ EventImplPtr getLastEvent() {
+ assert(MCurrentSubmitInfo.LastEvent &&
+ "getLastEvent must be called after enqueue");
+ return MCurrentSubmitInfo.LastEvent;
+ }
+
+ /// Sets kernel parameters to be used in the next submitKernelImpl call.
+ /// Must be called prior to a submitKernelImpl call.
+ /// \param Events a collection of events that the kernal depends on.
+ /// \param Range a unified range view of the execution range.
+ void setKernelParameters(std::vector<EventImplPtr> &&Events,
+ const detail::UnifiedRangeView &Range);
+
private:
+ // Queue features.
ol_queue_handle_t MOffloadQueue = {};
const bool MIsInorder;
const async_handler MAsyncHandler;
const property_list MPropList;
DeviceImpl &MDevice;
ContextImpl &MContext;
+
+ // Submit data.
+ struct KernelSubmitInfo {
+ EventImplPtr LastEvent;
+ ol_kernel_launch_size_args_t Range;
+ // TODO: consider storing EventImplPtr here, it will work with plain handle
+ // only because submission is done within queue::submit call. Otherwise we
+ // need to ensure that event handle is still alive by keeping our own copy
+ // of EventImpl.
+ std::vector<ol_event_handle_t> DepEvents;
+ };
+ inline static thread_local KernelSubmitInfo MCurrentSubmitInfo = {};
};
} // namespace detail
diff --git a/libsycl/src/queue.cpp b/libsycl/src/queue.cpp
index 6584a6e080ec3..b57324219e46b 100644
--- a/libsycl/src/queue.cpp
+++ b/libsycl/src/queue.cpp
@@ -35,4 +35,23 @@ bool queue::is_in_order() const { return impl->isInOrder(); }
void queue::wait() { impl->wait(); }
+event queue::getLastEvent() {
+ return detail::createSyclObjFromImpl<event>(impl->getLastEvent());
+}
+
+void queue::setKernelParameters(const std::vector<event> &Events,
+ const detail::UnifiedRangeView &Range) {
+ std::vector<detail::EventImplPtr> DepEventImplRefs;
+ DepEventImplRefs.reserve(Events.size());
+ for (const auto &Event : Events) {
+ DepEventImplRefs.push_back(detail::getSyclObjImpl(Event));
+ }
+ return impl->setKernelParameters(std::move(DepEventImplRefs), Range);
+}
+
+void queue::submitKernelImpl(detail::DeviceKernelInfo &KernelInfo,
+ void *ArgData, size_t ArgSize) {
+ impl->submitKernelImpl(KernelInfo, ArgData, ArgSize);
+}
+
_LIBSYCL_END_NAMESPACE_SYCL
More information about the llvm-branch-commits
mailing list