diff --git a/thesis/bachelor.pdf b/thesis/bachelor.pdf index 54e0b03..b75fc71 100644 Binary files a/thesis/bachelor.pdf and b/thesis/bachelor.pdf differ diff --git a/thesis/content/60_evaluation.tex b/thesis/content/60_evaluation.tex index 6f71751..68fe388 100644 --- a/thesis/content/60_evaluation.tex +++ b/thesis/content/60_evaluation.tex @@ -22,7 +22,7 @@ The pipelines \(SCAN_a\) and \(SCAN_b\) execute concurrently, completing their t \section{Expectations} \label{sec:eval:expectations} -The simple query presents a challenging scenario for the \texttt{Cache}. The execution time for the filter operation applied to column \texttt{a} is expected to be brief. Consequently, the \texttt{Cache} has limited time for prefetching, which may exacerbate delays caused by processing overhead in the \texttt{Cache} or during accelerator offload. Furthermore, it can be assumed that the \(SCAN_a\) is memory-bound by itself. Since the prefetching of \texttt{b} in \(SCAN_b\) and the loading and subsequent filtering of \texttt{a} occur concurrently, caching directly diminishes the memory bandwidth available to \(SCAN_a\) when both columns are located on the same \gls{numa:node}. +The simple query presents a challenging scenario for the \texttt{Cache}. The execution time for the filter operation applied to column \texttt{a} is expected to be brief. Consequently, the \texttt{Cache} has limited time for prefetching, which may exacerbate delays caused by processing overhead in the \texttt{Cache} or during accelerator offload. Furthermore, it can be assumed that the \(SCAN_a\) is memory-bound by itself. Since the prefetching of \texttt{b} in \(SCAN_b\) and the loading and subsequent filtering of \texttt{a} occur concurrently, caching directly diminishes the memory bandwidth available to \(SCAN_a\) when both columns are located on the same \gls{numa:node}. \par \section{Observations} \label{sec:eval:observations} @@ -103,13 +103,12 @@ Regarding the benchmark depicted in Figure \ref{fig:timing-results:prefetch}, wh \section{Discussion} -\begin{itemize} - \item Is the test a good choice? (yes, shows worst case with memory bound application and increases effects of overhead) - \item Are there other use cases for the cache? (yes, applications that are CPU bound -> DSA frees up CPU cycles) - \item Can we assume that it is possible to distribute columns over multiple nodes? -\end{itemize} +In Section \ref{sec:eval:expectations}, we anticipated that the simple query would pose a challenging case for prefetching. This expectation proved to be accurate, highlighting that improper data distribution can lead to adverse effects on performance when utilizing the \texttt{Cache}. Thus, we consider the chosen scenario to be well-suited, as it showcases both performance gains and losses, underscoring the importance of optimizing parameters and scenarios to achieve positive outcomes. + +The necessity to distribute data across \gls{numa:node}s is seen as practical, given that developers commonly apply this optimization to leverage the available memory bandwidth of \glsentrylong{numa}s. Consequently, the \texttt{Cache} has demonstrated its effectiveness by achieving a respectable speed-up positioned directly between the baseline and the theoretical upper limit (refer to Table \ref{table:qdp-speedup}). + +As stated in Section \ref{sec:design:cache}, the decision to design and implement a cache instead of focusing solely on prefetching was made to enhance the usefulness of this work's contribution. While our tests were conducted on a system with \gls{hbm}, other advancements in main memory technologies, such as Non-Volatile or Remote Memory, were not considered, as mentioned in Chapter \ref{chap:intro}. Despite the public functions of the \texttt{Cache} being named with cache usage in mind, its utility extends beyond this scope, providing flexibility through the policy functions, described in Section \ref{sec:design:accel-usage}. Potential applications include background copying of data from remote locations into faster local memory for computation or replication to non-volatile memory for data loss prevention. Therefore, we consider the increase in design complexity to be a worthwhile trade-off, providing a significant contribution to the field of heterogeneous memory systems. \par -\todo{write this section} %%% Local Variables: %%% TeX-master: "diplom"