\chapter{Evaluation}
\label{chap:evaluation}

In this chapter, we define our expectations and then apply the developed Cache to \glsentrylong{qdp}. To measure performance, we adapted code developed by our colleagues André Berthold and Anna Bartuschka for evaluating \gls{qdp} in \cite{dimes-prefetching}. \par

\section{Expectations}

\begin{figure}[h!tb]
  \centering
  \includegraphics[width=0.7\textwidth]{images/simple-query-graphic.pdf}
  \caption{Illustration of the benchmarked simple query in (a) and the corresponding pipeline in (b). Taken from \cite[Fig. 1]{dimes-prefetching}.}
  \label{fig:eval-simple-query}
\end{figure}

The benchmark executes a simple query, illustrated in Figure \ref{fig:eval-simple-query}, which presents a challenging scenario for the Cache. As the filter operation applied to \texttt{a} is not particularly complex, its execution time can be assumed to be short. The Cache therefore has little time in which to prefetch, which amplifies any delay caused by processing overhead in the Cache itself or by submission to the Work Queue. This makes the chosen query well suited to stress-testing the developed solution. \par

Given this difficult scenario, we expect to spend considerable time analysing the runtime behaviour of our benchmark in order to optimize the Cache and the way it is applied to the query.
Optimizations should yield a slight performance improvement over the baseline, which uses DRAM, but will not reach the theoretical peak, where the data for \texttt{b} resides in HBM from the start. \par

\section{Observation and Discussion}

Flame graphs of the benchmark runs guided our analysis of its runtime behaviour. Initially, the cached run performed at the same speed as the DRAM baseline, even though allocation is performed inside the timed region and not before it. By default, the \texttt{dml} library performs busy waiting on operation completion, although the \gls{dsa} architecture also supports interrupt-based completion signalling, as mentioned in our architecture overview. % TODO: cite dsa-paper, Section 4.4, for interrupt-based completion
Switching to a weak wait, which pauses the processor between status polls, reduced this overhead; an analogous weak access to cache entries benefits the prefetching scenario. Furthermore, the scan of \texttt{a} is memory bound, so copying \texttt{b} from DRAM to HBM directly cuts into the bandwidth available to the scan of \texttt{a} when both columns are located on the same node. Accordingly, we observe better performance when \texttt{a} and \texttt{b} are stored on different nodes. \par

%%% Local Variables:
%%% TeX-master: "diplom"
%%% End: