diff --git a/thesis/content/50_implementation.tex b/thesis/content/50_implementation.tex index 3672d2b..0d73362 100644 --- a/thesis/content/50_implementation.tex +++ b/thesis/content/50_implementation.tex @@ -40,7 +40,21 @@ The choice made in \ref{subsec:design:cache-entry-reuse} requires thread safe sh It was therefore decided to implement atomic reference counting for \texttt{CacheData} which means providing a custom constructor and destructor wherein a shared (through a standard pointer however) atomic integer is either incremented or decremented using atomic fetch sub and add operations \cite{cppreference:atomic-operations} to increase or deacrease the reference counter and, in case of decrease in the destructor signals that the destructor is called for the last reference, perform actual destruction. The invalid state of \texttt{CacheData} achievable is also avoided. To achieve this, the waiting algorithm requires the handlers to be contained in an atomic pointer and the pointer to the cache memory be atomic too. Through this we may use the atomic wait operation which is guaranteed by the standard to be more efficient than simply spinning on Compare-And-Swap \cite{cppreference:atomic-wait}. Some standard implementations achieve this by yielding after a short spin cycle \cite{atomic-wait-details}. \par -Designing the wait to work from any thread was complicated. In the first implementation, a thread would check if the handlers are available and if not atomically wait \cite{cppreference:atomic-wait} on a value change from nullptr. As the handlers are only available after submission, a situation could arise where only one copy of \texttt{CacheData} is capable of actually waiting on them. Lets assume that three threads \(T_1\), \(T_2\) and \(T_3\) wish to access the same resource. \(T_1\) now is the first to call \texttt{CacheData::Access} and therefore adds it to the cache state and will perform the work submission. Before \(T_1\) may submit the work, it is interrupted and \(T_2\) and \(T_3\) obtain access to the incomplete \texttt{CacheData} on which they wait, causing them to see a nullptr for the handlers but invalid cache pointer, leading to atomic wait on the cache pointer. Now \(T_1\) submits the work and sets the handlers, while \(T_2\) and \(T_3\) continue to wait. Now only \(T_1\) can trigger the waiting and is therefore capable of keeping \(T_2\) and \(T_3\) from progressing. This is undesirable as it can lead to deadlocking if by some reason \(T_1\) does not wait and at least may lead to unnecessary delay for \(T_2\) and \(T_3\) if \(T_1\) does not wait immediately. \par +\begin{figure}[H] + \centering + \includegraphics[width=0.9\textwidth]{images/sequenzdiagramm-waitoncompletion.png} + \caption{Sequence diagram for threading scenario in \texttt{CacheData::WaitOnCompletion}} + \label{fig:impl-cachedata-threadseq-waitoncompletion} +\end{figure} + +Designing the wait to work from any thread was complicated. In the first implementation, a thread would check if the handlers are available and if not atomically wait \cite{cppreference:atomic-wait} on a value change from nullptr. As the handlers are only available after submission, a situation could arise where only one copy of \texttt{CacheData} is capable of actually waiting on them. Lets assume that three threads \(T_1\), \(T_2\) and \(T_3\) wish to access the same resource. \(T_1\) now is the first to call \texttt{CacheData::Access} and therefore adds it to the cache state and will perform the work submission. Before \(T_1\) may submit the work, it is interrupted and \(T_2\) and \(T_3\) obtain access to the incomplete \texttt{CacheData} on which they wait, causing them to see a nullptr for the handlers but invalid cache pointer, leading to atomic wait on the cache pointer (marked blue lines in \ref{fig:impl-cachedata-threadseq-waitoncompletion}). Now \(T_1\) submits the work and sets the handlers (marked red lines in \ref{fig:impl-cachedata-threadseq-waitoncompletion}), while \(T_2\) and \(T_3\) continue to wait. Now only \(T_1\) can trigger the waiting and is therefore capable of keeping \(T_2\) and \(T_3\) from progressing. This is undesirable as it can lead to deadlocking if by some reason \(T_1\) does not wait and at the very least may lead to unnecessary delay for \(T_2\) and \(T_3\) if \(T_1\) does not wait immediately. \par + +\begin{figure}[H] + \centering + \includegraphics[width=0.9\textwidth]{images/structo-cachedata-waitoncompletion.png} + \caption{Code Flow Diagram for \texttt{CacheData::WaitOnCompletion}} + \label{fig:impl-cachedata-waitoncompletion} +\end{figure} To solve this, a different and more complicated order of waiting operations is required. When waiting, the threads now immediately check whether the cache pointer contains a valid value and return if it does, as nothing has to be waited for in this case. Let's take the same example as before to illustrate the second part of the waiting procedure. \(T_2\) and \(T_3\) now both arrive in this latter section as the cache was invalid at the point in time when waiting was called for. They now atomically wait on the handlers pointer to change, instead of doing it the other way around as before. Now when \(T_1\) supplies the handlers, it also uses \texttt{std::atomic::notify\_one} \cite{cppreference:atomic-notify-one} to wake at least one thread waiting on value change of the handlers pointer, if there are any. Through this the exclusion that was observable in the first implementation is already avoided. If nobody is waiting, then the handlers will be set to a valid pointer and a thread may pass the atomic wait instruction later on. Following this wait, the handlers pointer is atomically exchanged \cite{cppreference:atomic-exchange} with nullptr, invalidating it. Now each thread again checks whether it has received a valid local pointer to the handlers from the exchange, if it has then the atomic operation guarantees that is now in sole possession of the pointer. The owning thread is tasked with actually waiting. All other threads will now regress and call \texttt{CacheData::WaitOnCompletion} again. The solo thread may proceed to wait on the handlers and should update the cache pointer. \par diff --git a/thesis/images/sequenzdiagramm-waitoncompletion.png b/thesis/images/sequenzdiagramm-waitoncompletion.png new file mode 100644 index 0000000..6523cc6 Binary files /dev/null and b/thesis/images/sequenzdiagramm-waitoncompletion.png differ diff --git a/thesis/images/sequenzdiagramm-waitoncompletion.xml b/thesis/images/sequenzdiagramm-waitoncompletion.xml new file mode 100644 index 0000000..64dc75e --- /dev/null +++ b/thesis/images/sequenzdiagramm-waitoncompletion.xml @@ -0,0 +1,145 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/thesis/images/structo-cachedata-waitoncompletion.nsd b/thesis/images/structo-cachedata-waitoncompletion.nsd new file mode 100644 index 0000000..e799a4b --- /dev/null +++ b/thesis/images/structo-cachedata-waitoncompletion.nsd @@ -0,0 +1,44 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/thesis/images/structo-cachedata-waitoncompletion.png b/thesis/images/structo-cachedata-waitoncompletion.png new file mode 100644 index 0000000..d0f3ac0 Binary files /dev/null and b/thesis/images/structo-cachedata-waitoncompletion.png differ