From d935f38010621f658a60e963099ee95a01315ec6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Constantin=20F=C3=BCrst?= <c@fuersten.info>
Date: Thu, 8 Feb 2024 18:39:35 +0100
Subject: [PATCH] add some additional notes for writing evaluation chapter

---
 thesis/content/60_evaluation.tex | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/thesis/content/60_evaluation.tex b/thesis/content/60_evaluation.tex
index 2967ca4..5802248 100644
--- a/thesis/content/60_evaluation.tex
+++ b/thesis/content/60_evaluation.tex
@@ -27,6 +27,8 @@ With this difficult scenario, we expect to spend time analysing runtime behaviou
 
 Consider using parts of flamegraph. Same speed as dram, even though allocation is performed in the timed region and not before. Mention dml performs busy waiting (cite dsa-paper 4.4 for use of interrupts mentioned in arch), optimization with weak wait. Mention optimization weak access for prefetching scenario.
 
+Scan A is memory-bound; copying B from DRAM to HBM therefore directly cuts into the bandwidth available to A when both are located on the same node. Performance is better when A and B are stored on different nodes.
+
 \section{Observation and Discussion}
 
 %%% Local Variables: