From 43254d0f3ca156c0393a2034606bc2745a14ac4d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Constantin=20F=C3=BCrst?= <c@fuersten.info>
Date: Tue, 6 Feb 2024 21:17:03 +0100
Subject: [PATCH] note that the cost observed for swq submission is lower than
 what reese kuper saw

---
 thesis/content/30_performance.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/thesis/content/30_performance.tex b/thesis/content/30_performance.tex
index 706a317..8679ba9 100644
--- a/thesis/content/30_performance.tex
+++ b/thesis/content/30_performance.tex
@@ -60,7 +60,7 @@ We anticipate that single submissions will consistently yield poorer performance
     \label{fig:perf-submitmethod}
 \end{figure}
 
-In Figure \ref{fig:perf-submitmethod} we conclude that with transfers of 1 MiB and upwards, the submission method makes no noticeable difference. For smaller transfers the performance varies greatly, with batch operations leading in throughput. This finding is aligned with the observation that \enquote{SWQ observes lower throughput between 1-8 KB [transfer size]} \cite[p. 6 and 7]{intel:analysis} for normal submission method. \par
+From Figure \ref{fig:perf-submitmethod} we conclude that with transfers of 1 MiB and upwards, the submission method makes no noticeable difference. For smaller transfers the performance varies greatly, with batch operations leading in throughput. Kuper et al. observed that \enquote{SWQ observes lower throughput between 1-8 KB [transfer size]} \cite[p. 6]{intel:analysis}. However, in our measurements the submission methods reach equal throughput only at a much larger transfer size, which points to additional overhead introduced by programming the \gls{dsa} through \gls{intel:dml}. \par
 
 Another limitation may be observed in this result, namely the inherent throughput limit per \gls{dsa} chip of close to 30 GiB/s. This is apparently caused by I/O fabric limitations \cite[p. 5]{intel:analysis}. \par