diff --git a/thesis/content/20_state.tex b/thesis/content/20_state.tex index 5b15ee0..335e781 100644 --- a/thesis/content/20_state.tex +++ b/thesis/content/20_state.tex @@ -112,11 +112,11 @@ The \texttt{path} parameter allows the selection of the executing device, which Choosing the engine which carries out the copy might be advantageous for performance, as we can see in Subsection \ref{subsec:perf:datacopy}. With the engine directly tied to the processing node, as observed in Subsection \ref{subsection:dsa-hwarch}, the node ID is equivalent to the ID of the \gls{dsa}. \par -\gls{intel:dml} operates on data views, which we create from the given pointers to source and destination and size. This is done using \texttt{dml::make_view(uint8_t* ptr, size_t size)}, visible in Figure \ref{fig:dml-memcpy}, where these views are labelled \texttt{src_view} and \texttt{dst_view}. \cite[Sec. High-level C++ API, Make view]{intel:dmldoc} \par +\gls{intel:dml} operates on data views, which we create from the given pointers to source and destination and size. This is done using \texttt{dml::make\_view(uint8\_t* ptr, size\_t size)}, visible in Figure \ref{fig:dml-memcpy}, where these views are labelled \texttt{src\_view} and \texttt{dst\_view}. \cite[Sec. High-level C++ API, Make view]{intel:dmldoc} \par In Figure \ref{fig:dml-memcpy}, we submit a single descriptor using the asynchronous operation from \gls{intel:dml}. This uses the function \texttt{dml::submit}, which takes an operation type and parameters specific to the selected type and returns a handler to the submitted task. For the copy operation, we pass the two views created previously. The provided handler can later be queried for the completion of the operation. After submission, we poll for the task completion with \texttt{handler.get()} and check whether the operation completed successfully. \par -A noteworthy addition to the submission-call is the use of \texttt{.block_on_fault()}, enabling the \gls{dsa} to manage a page fault by coordinating with the operating system. It's essential to highlight that this functionality only operates if the device is configured to accept this flag. \cite[Sec. High-level C++ API, How to Use the Library]{intel:dmldoc} \cite[Sec. High-level C++ API, Page Fault handling]{intel:dmldoc} +A noteworthy addition to the submission-call is the use of \texttt{.block\_on\_fault()}, enabling the \gls{dsa} to manage a page fault by coordinating with the operating system. It's essential to highlight that this functionality only operates if the device is configured to accept this flag. \cite[Sec. High-level C++ API, How to Use the Library]{intel:dmldoc} \cite[Sec. High-level C++ API, Page Fault handling]{intel:dmldoc} \section{System Setup and Configuration} \label{sec:state:setup-and-config} diff --git a/thesis/content/50_implementation.tex b/thesis/content/50_implementation.tex index f625a4c..437cf01 100644 --- a/thesis/content/50_implementation.tex +++ b/thesis/content/50_implementation.tex @@ -36,11 +36,11 @@ Even with this optimization, in scenarios where the \texttt{Cache} is frequently \subsection{CacheData Atomicity} -The choice made in \ref{subsec:design:cache-entry-reuse} necessitates thread-safe shared access to the same resource. The C++ standard library provides \texttt{std::shared_ptr}, a reference-counted pointer that is thread-safe for the required operations \cite{cppreference:shared-ptr}, making it a suitable candidate for this task. Although an implementation using it was explored, it presented its own set of challenges. \par +The choice made in \ref{subsec:design:cache-entry-reuse} necessitates thread-safe shared access to the same resource. The C++ standard library provides \texttt{std::shared\_ptr}, a reference-counted pointer that is thread-safe for the required operations \cite{cppreference:shared-ptr}, making it a suitable candidate for this task. Although an implementation using it was explored, it presented its own set of challenges. \par As we aim to minimize the time spent in a locked region, only the task is added to the \gls{numa:node}'s cache state when locked, with the submission taking place outside the locked region. We assume the handlers of \gls{intel:dml} to be unsafe for access from multiple threads. To achieve the safety for \texttt{CacheData::WaitOnCompletion} outlined in \ref{subsec:design:cache-entry-reuse}, threads need to coordinate which one performs the actual waiting. To avoid queuing multiple copies of the same task, the task must be added before submission. This results in a \texttt{CacheData} instance with an invalid cache pointer and no handlers to wait for, presenting an edge case to be considered. \par -Using \texttt{std::shared_ptr} also introduces uncertainty, relying on the implementation to be performant. The standard does not specify whether a lock-free algorithm is to be used, and \cite{shared-ptr-perf} suggests abysmal performance for some implementations, although the full article is in Korean. No further research was found on this topic. \par +Using \texttt{std::shared\_ptr} also introduces uncertainty, relying on the implementation to be performant. The standard does not specify whether a lock-free algorithm is to be used, and \cite{shared-ptr-perf} suggests abysmal performance for some implementations, although the full article is in Korean. No further research was found on this topic. \par Therefore, the decision was made to implement atomic reference counting for \texttt{CacheData}. This involves providing a custom constructor and destructor wherein a shared (though a standard pointer) atomic integer is either incremented or decremented using atomic fetch sub and add operations \cite{cppreference:atomic-operations} to modify the reference count. In the case of a decrease to zero, the destructor was called for the last reference and then performs the actual destruction. \par