@ -57,6 +57,15 @@ When multiple consumers wish to access the same memory block through the \texttt
Allowing multiple references to the same entry introduces concerns regarding memory management. The allocated block should only be freed when all copies of a \texttt{CacheData} instance are destroyed, thereby tying the cache entry's lifetime to the longest living copy of the original instance. This ensures that access to the entry is legal during the lifetime of any \texttt{CacheData} instance. Therefore, deallocation only occurs when the last copy of a \texttt{CacheData} instance is destroyed. \par
\subsection{Weak Behaviour and Page Fault Handling}
\label{subsec:design:weakop-and-pf}
During our testing phase, we discovered that \gls{intel:dml} does not support interrupt-based completion signaling, as discussed in Section \ref{subsubsec:state:completion-signal}. Instead, it resorts to busy-waiting, which consumes CPU cycles and is primarily beneficial for reducing power consumption during copies \cite{intel:analysis}. To mitigate this issue, we extended the functionality of both \texttt{Cache} and \texttt{CacheData}, providing weak versions of \texttt{Cache::Access} and \texttt{CacheData::WaitOnCompletion}. The weak wait function only checks for operation completion once and then returns, relaxing the guarantee that the cache location will be valid after the call. Hence, the user must verify validity even after the wait function if they choose to utilize this option. Similarly, weak access merely returns a pre-existing instance of \texttt{CacheData}, bypassing caching. This feature proves beneficial in latency-sensitive scenarios where the overhead from cache operations and waiting for operation completion is undesirable. \par
Additionally, while optimizing for access latency, we encountered delays caused by page fault handling on the \gls{dsa}, as depicted in Figure \ref{fig:dml-memcpy}. These delays not only affect the current task but also impede the progress of other tasks on the \gls{dsa}. Consequently, the default behaviour of the cache is set to trigger an error on page faults, while still offering the option to let the \gls{dsa} handle the fault. \par
To configure runtime behaviour, we introduced a flag system to both \texttt{Cache} and \texttt{CacheData}, with the latter inheriting any flags set in the former upon creation. This design allows for global settings, such as opting for weak waits or enabling page fault handling. Weak waits can also be selected in specific situations by setting the flag on the \texttt{CacheData} instance. For \texttt{Cache::Access}, the flag must be set for each function call, defaulting to strong access, as exclusively using weak access would result in no cache utilization. \par