note down bullet points for the content of chapter 3 (performance)

12 months ago · 3f5f5f267d
1 changed files with 33 additions and 3 deletions
--- a/thesis/content/30_performance.tex
+++ b/thesis/content/30_performance.tex
@@ -25,31 +25,61 @@
 \section{Benchmarking Methodology}

 \begin{itemize}
-    \item
+    \item 
 \end{itemize}

 \section{Submission Method}

 \begin{itemize}
    \item submit cost analysis: best method and for a subset the point at which submit cost < time savings
+    \item display the full opt-submitmethod graph
+    \item maybe remeasure with a higher number of small copies? results look somewhat odd for 1k and 4k
+    \item display the stacked bar of submit and complete time for single@1k, single@4k, single@1mib for HW-path and SW-path
+    \item display the stacked bar of submit and complete time for batch50@1k, batch50@4k, batch50@1mib for HW-path and SW-path
+    \item show batch because we care about the minimum task set size for a single producer (multi submit would be used for different task sets)
+    \item conclude at which point using the DSA makes sense
 \end{itemize}

 \section{Multithreaded Submission}

 \begin{itemize}
    \item effect of mt-submit, low because \gls{dsa:swq} implicitly synchronized, bandwidth is shared
+    \item show results for all available core counts
+    \item only display the 1engine tests
+    \item show combined total throughput
+    \item conclude that, due to the implicit synchronization, the sync cost also affects 1t, so the thread count makes no difference; bandwidth is shared, with no guarantees on fairness
+\end{itemize}
+
+\section{Multiple Engines in a Group}
+
+\begin{itemize}
+    \item assumed from the architecture specification that multiple engines lead to greater performance
+    \item reason is that page faults and access latency will be overlapped with preparing the next operation
+    \item in the given scenario we observe the opposite, slight performance decrease
+    \item show multisubmit 50 for both 1e and 4e
+    \item maybe remeasure with each submission accessing a different memory region?
+    \item conclusion?
 \end{itemize}

 \section{Data Movement from DDR to HBM}

 \begin{itemize}
-    \item 
+    \item present two copy methods: smart and brute force
+    \item show graph for ddr->hbm intranode, ddr->hbm intrasocket, ddr->hbm intersocket
+    \item conclude which option makes more sense (smart)
+    \item because 4x or 2x utilization for only 1.5x or 1.25x speedup respectively
+    \item maybe benchmark one intersocket smart-copy in parallel with two intrasocket smart-copies vs. the same task with brute force
 \end{itemize}

 \section{Analysis}

 \begin{itemize}
-    \item 
+    \item summarize the conclusions and define the point at which using the DSA makes sense
+    \item minimum transfer size for batch/nonbatch operation
+    \item effect of mtsubmit -> no fairness guarantees
+    \item usage of multiple engines -> no effect
+    \item smart copy method as the middle-ground between peak throughput and utilization
+    \item lower utilization of the DSA is desirable when it is shared between threads/processes
 \end{itemize}