
finalize draft of chapter 6 eval

master
Constantin Fürst, 10 months ago
commit b68ac2358b
  1. thesis/bachelor.pdf (BIN)
  2. thesis/content/60_evaluation.tex (13 lines changed)

thesis/bachelor.pdf (binary file, diff not shown)

thesis/content/60_evaluation.tex

@@ -22,7 +22,7 @@ The pipelines \(SCAN_a\) and \(SCAN_b\) execute concurrently, completing their t
 \section{Expectations}
 \label{sec:eval:expectations}
-The simple query presents a challenging scenario for the \texttt{Cache}. The execution time for the filter operation applied to column \texttt{a} is expected to be brief. Consequently, the \texttt{Cache} has limited time for prefetching, which may exacerbate delays caused by processing overhead in the \texttt{Cache} or during accelerator offload. Furthermore, it can be assumed that the \(SCAN_a\) is memory-bound by itself. Since the prefetching of \texttt{b} in \(SCAN_b\) and the loading and subsequent filtering of \texttt{a} occur concurrently, caching directly diminishes the memory bandwidth available to \(SCAN_a\) when both columns are located on the same \gls{numa:node}.
+The simple query presents a challenging scenario for the \texttt{Cache}. The execution time for the filter operation applied to column \texttt{a} is expected to be brief. Consequently, the \texttt{Cache} has limited time for prefetching, which may exacerbate delays caused by processing overhead in the \texttt{Cache} or during accelerator offload. Furthermore, \(SCAN_a\) can be assumed to be memory-bound on its own. Since the prefetching of \texttt{b} in \(SCAN_b\) and the loading and subsequent filtering of \texttt{a} occur concurrently, caching directly diminishes the memory bandwidth available to \(SCAN_a\) when both columns are located on the same \gls{numa:node}. \par
 \section{Observations}
 \label{sec:eval:observations}
@@ -103,13 +103,12 @@ Regarding the benchmark depicted in Figure \ref{fig:timing-results:prefetch}, wh
 \section{Discussion}
-\begin{itemize}
-\item Is the test a good choice? (yes, shows worst case with memory bound application and increases effects of overhead)
-\item Are there other use cases for the cache? (yes, applications that are CPU bound -> DSA frees up CPU cycles)
-\item Can we assume that it is possible to distribute columns over multiple nodes?
-\end{itemize}
+In Section \ref{sec:eval:expectations}, we anticipated that the simple query would pose a challenging case for prefetching. This expectation proved accurate, showing that improper data distribution can adversely affect performance when utilizing the \texttt{Cache}. We therefore consider the chosen scenario well-suited, as it showcases both performance gains and losses, underscoring the importance of optimizing parameters and scenarios to achieve positive outcomes.
+The assumption that data can be distributed across \gls{numa:node}s is considered practical, given that developers commonly apply this optimization to leverage the available memory bandwidth of \glsentrylong{numa}s. Consequently, the \texttt{Cache} has demonstrated its effectiveness, achieving a respectable speed-up positioned directly between the baseline and the theoretical upper limit (see Table \ref{table:qdp-speedup}).
+As stated in Section \ref{sec:design:cache}, the decision to design and implement a cache instead of focusing solely on prefetching was made to enhance the usefulness of this work's contribution. While our tests were conducted on a system with \gls{hbm}, other advancements in main memory technologies, such as Non-Volatile or Remote Memory, were not considered, as mentioned in Chapter \ref{chap:intro}. Although the public functions of the \texttt{Cache} are named with cache usage in mind, its utility extends beyond this scope, providing flexibility through the policy functions described in Section \ref{sec:design:accel-usage}. Potential applications include background copying of data from remote locations into faster local memory for computation, or replication to non-volatile memory for data loss prevention. Therefore, we consider the increase in design complexity a worthwhile trade-off, providing a significant contribution to the field of heterogeneous memory systems. \par
-\todo{write this section}
 %%% Local Variables:
 %%% TeX-master: "diplom"
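
The bandwidth contention described in the Expectations paragraph of the first hunk can be made concrete with a short C++ sketch. The Cache interface used here (an Access() call submitting an asynchronous copy and a WaitOnCompletion() handle blocking on it) is an assumption for illustration and need not match the thesis' actual API; the point is only that the filter loop of \(SCAN_a\) and the prefetch issued by \(SCAN_b\) run at the same time and therefore share the bandwidth of the node holding both columns.

#include <cstdint>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-ins for the thesis' Cache interface; the names Access() and
// WaitOnCompletion() are assumptions for illustration only.
struct CacheData {
    const uint64_t* location = nullptr;
    void WaitOnCompletion() { /* stub: the real handle blocks on the DSA copy */ }
};

struct Cache {
    CacheData Access(const uint64_t* data, std::size_t /*count*/) {
        // stub: the real cache submits an asynchronous DSA copy and
        // returns a handle to the (eventual) cache location
        return CacheData{data};
    }
};

// SCAN_a (filter on column a) and the prefetch of column b issued by
// SCAN_b run concurrently; with both columns on the same node, they
// compete for that node's memory bandwidth.
std::vector<bool> scan_with_prefetch(Cache& cache,
                                     const uint64_t* a, const uint64_t* b,
                                     std::size_t n, uint64_t threshold) {
    std::thread prefetch_b([&] {
        cache.Access(b, n).WaitOnCompletion();  // SCAN_b: prefetch b
    });

    std::vector<bool> mask(n);
    for (std::size_t i = 0; i < n; ++i)         // SCAN_a: filter a
        mask[i] = a[i] < threshold;

    prefetch_b.join();
    return mask;
}

Because the filter loop is itself memory-bound, any bandwidth the prefetch consumes on the shared node is taken directly from the loop, which is exactly the adverse case the chapter evaluates.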
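The policy functions mentioned in the new discussion text lend themselves to a similar sketch. The signature and the node numbering below are hypothetical, chosen only to illustrate how one callback interface could serve both the prefetching scenario benchmarked here and the replication use case named as a potential application.

#include <cstddef>

// Hypothetical policy signature: decide on which NUMA node a copy of
// the data should be placed; the thesis' actual interface from
// Section \ref{sec:design:accel-usage} may differ.
using PlacementPolicy = int (*)(int requesting_node, int data_node,
                                std::size_t size);

// Prefetch use case: place the copy on the HBM node attached to the
// requesting socket (the node-numbering offset is an assumption about
// the benchmark system, where HBM nodes follow the DRAM nodes).
int hbm_prefetch_policy(int requesting_node, int /*data_node*/,
                        std::size_t /*size*/) {
    constexpr int kHbmNodeOffset = 8;
    return requesting_node + kHbmNodeOffset;
}

// Replication use case: mirror the data to a fixed non-volatile-memory
// node for data-loss prevention, independent of the requester.
int nvm_replication_policy(int /*requesting_node*/, int /*data_node*/,
                           std::size_t /*size*/) {
    constexpr int kNvmNode = 4;  // hypothetical node id of the NVM region
    return kNvmNode;
}

Swapping one policy for the other would retarget every copy the Cache performs without touching the calling code, which is the flexibility argument the discussion makes for accepting the added design complexity.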
