From 85ccb6fea1a8a2a7ee5aa57b99887a46c766da18 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Constantin=20F=C3=BCrst?= <c@fuersten.info>
Date: Thu, 15 Feb 2024 22:11:50 +0100
Subject: [PATCH] add usage of dwq as advantage of switching to direct
 programming

---
 thesis/content/70_conclusion.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/thesis/content/70_conclusion.tex b/thesis/content/70_conclusion.tex
index 4166180..4fcd204 100644
--- a/thesis/content/70_conclusion.tex
+++ b/thesis/content/70_conclusion.tex
@@ -22,7 +22,7 @@ Upon applying the cache developed in Chapters \ref{chap:design} and \ref{chap:im
 
 In Section \ref{sec:eval:observations}, we observed adverse effects when prefetching with the cache during the parallel execution of memory-bound operations. This necessitated data distribution across multiple \glsentrylong{numa:node}s to circumvent bandwidth competition caused by parallel caching operations. Despite this limitation, we do not consider it a major fault of the \texttt{Cache}, as existing applications designed for \gls{numa} systems are likely already optimized in this regard. \par
 
-As highlighted in Sections \ref{sec:state:dml} and \ref{sec:impl:application}, the \gls{api} utilized to interact with the \gls{dsa} currently lacks support for interrupt-based completion waiting and the use of \glsentrylong{dsa:dwq}. Future development efforts may focus on direct \gls{dsa} access, bypassing the \glsentrylong{intel:dml}, to leverage the complete feature set. Particularly, interrupt-based waiting would significantly enhance the usability of the \texttt{Cache}, which currently only supports busy-waiting\todo{mention that busy waiting goes against the rationale of offloading data copy, only usefull to reduce power consumption for sync-copy, cite dsaanalysis for this}. \todo{we could also mention dwq to reduce overhead of cache and therefore time spent in scanb} \par
+As highlighted in Sections \ref{sec:state:dml} and \ref{sec:impl:application}, the \gls{api} utilized to interact with the \gls{dsa} currently lacks support for interrupt-based completion waiting and the use of \glsentrylong{dsa:dwq}. Future development efforts may focus on direct \gls{dsa} access, bypassing the \glsentrylong{intel:dml}, to leverage the complete feature set. Particularly, interrupt-based waiting would significantly enhance the usability of the \texttt{Cache}, which currently only supports busy-waiting. This led us to extend the design by implementing weak-waiting in Section \ref{sec:impl:application}, favouring cache misses over wasting resources during the wait. Additionally, access through a \glsentrylong{dsa:dwq} has the potential to reduce submission cost and thereby increase the caches' effectiveness. \par
 
 Although the preceding paragraphs and the results in Chapter \ref{chap:evaluation} might suggest that the \texttt{Cache} requires extensive refinement for production applications, we argue the opposite. Under favourable conditions, as assumed for \glsentryshort{numa}-aware applications, we observed significant speed-up using the \texttt{Cache} for prefetching to \glsentrylong{hbm}, accelerating database queries. Its utility is not limited to prefetching alone; it offers a solution for replicating data to \gls{nvram} and might prove applicable to different use cases. Additional benchmarks on more complex queries for \gls{qdp} and a comparison between prefetching to \gls{hbm} using knowledge of the coming queries and the data they access, and \enquote{HBM Cache Mode} (see Section \ref{sec:state:hbm}) could yield deeper insights into the caches' performance. \par