\caption{Illustration of the benchmarked simple query in (a) and the corresponding pipeline in (b). Taken from \cite[Fig. 1]{dimes-prefetching}.}
@ -25,6 +25,8 @@ The benchmark executes a simple query as illustrated in Figure \ref{fig:eval-sim
Given this difficult scenario, we expect to spend time analysing the runtime behaviour of our benchmark in order to optimize the Cache and the way it is applied to the query. Optimizations should yield a slight performance improvement over the baseline, which uses DRAM, but will not reach the theoretical peak, where the data for \texttt{b} resides in HBM. \par
Consider using parts of the flamegraph. Same speed as DRAM, even though the allocation is performed inside the timed region and not before it. Mention that DML performs busy waiting (cite dsa-paper 4.4 for the use of interrupts mentioned in arch) and the optimization with weak wait. Mention the weak access optimization for the prefetching scenario.