From 85ccb6fea1a8a2a7ee5aa57b99887a46c766da18 Mon Sep 17 00:00:00 2001
From: Constantin Fürst
Date: Thu, 15 Feb 2024 22:11:50 +0100
Subject: [PATCH] add usage of dwq as advantage of switching to direct programming

---
 thesis/content/70_conclusion.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/thesis/content/70_conclusion.tex b/thesis/content/70_conclusion.tex
index 4166180..4fcd204 100644
--- a/thesis/content/70_conclusion.tex
+++ b/thesis/content/70_conclusion.tex
@@ -22,7 +22,7 @@ Upon applying the cache developed in Chapters \ref{chap:design} and \ref{chap:im
 
 In Section \ref{sec:eval:observations}, we observed adverse effects when prefetching with the cache during the parallel execution of memory-bound operations. This necessitated data distribution across multiple \glsentrylong{numa:node}s to circumvent bandwidth competition caused by parallel caching operations. Despite this limitation, we do not consider it a major fault of the \texttt{Cache}, as existing applications designed for \gls{numa} systems are likely already optimized in this regard. \par
 
-As highlighted in Sections \ref{sec:state:dml} and \ref{sec:impl:application}, the \gls{api} utilized to interact with the \gls{dsa} currently lacks support for interrupt-based completion waiting and the use of \glsentrylong{dsa:dwq}. Future development efforts may focus on direct \gls{dsa} access, bypassing the \glsentrylong{intel:dml}, to leverage the complete feature set. Particularly, interrupt-based waiting would significantly enhance the usability of the \texttt{Cache}, which currently only supports busy-waiting\todo{mention that busy waiting goes against the rationale of offloading data copy, only usefull to reduce power consumption for sync-copy, cite dsaanalysis for this}. \todo{we could also mention dwq to reduce overhead of cache and therefore time spent in scanb} \par
+As highlighted in Sections \ref{sec:state:dml} and \ref{sec:impl:application}, the \gls{api} utilized to interact with the \gls{dsa} currently lacks support for interrupt-based completion waiting and the use of \glsentrylong{dsa:dwq}. Future development efforts may focus on direct \gls{dsa} access, bypassing the \glsentrylong{intel:dml}, to leverage the complete feature set. In particular, interrupt-based waiting would significantly enhance the usability of the \texttt{Cache}, which currently supports only busy-waiting. This limitation led us to extend the design with weak-waiting in Section \ref{sec:impl:application}, which favours a cache miss over wasting resources while waiting for the copy to complete. Additionally, submitting work through a \glsentrylong{dsa:dwq} has the potential to reduce submission cost and thereby increase the cache's effectiveness. \par
 
 Although the preceding paragraphs and the results in Chapter \ref{chap:evaluation} might suggest that the \texttt{Cache} requires extensive refinement for production applications, we argue the opposite. Under favourable conditions, as assumed for \glsentryshort{numa}-aware applications, we observed significant speed-up using the \texttt{Cache} for prefetching to \glsentrylong{hbm}, accelerating database queries. Its utility is not limited to prefetching alone; it offers a solution for replicating data to \gls{nvram} and might prove applicable to different use cases. Additional benchmarks on more complex queries for \gls{qdp} and a comparison between prefetching to \gls{hbm} using knowledge of the coming queries and the data they access, and \enquote{HBM Cache Mode} (see Section \ref{sec:state:hbm}) could yield deeper insights into the caches' performance. \par
 