
address calculations the way that the align-environment numbers them (leading with chapter number)

master
Constantin Fürst 3 months ago
parent
commit
142f9391bf
thesis/content/30_performance.tex

@@ -85,7 +85,7 @@ In Figure \ref{fig:perf-mtsubmit}, we note that threading has no discernible neg
8800\ MT/s \times 8B/T = 70400 \times 10^6 B/s &= 65.56\ GiB/s
\end{align}
-Moving data from \glsentryshort{dram} to \gls{hbm} is most relevant to the rest of this work, as it is the target application. With \gls{hbm} offering higher bandwidth than the \glsentryshort{dram} of our system, we will be restricted by the available bandwidth of the source. To determine the upper limit achievable, we must calculate the available peak bandwidth. For each \gls{numa:node}, the test system is configured with two DIMMs of DDR5-4800. The naming scheme contains the data rate in Megatransfers (MT) per second, however the processor specification notes that for dual channel operation, the maximum supported speed drops to \(4400\ MT/s\) \cite{intel:xeonmax-ark}. We calculate the transfers performed per second for one \gls{numa:node} (1), followed by the bytes per transfer \cite{kingston:ddr5-spec-overview} in calculation (2), and at last combine these two for the theoretical peak bandwidth per \gls{numa:node} on the system (3). \par
+Moving data from \glsentryshort{dram} to \gls{hbm} is most relevant to the rest of this work, as it is the target application. With \gls{hbm} offering higher bandwidth than the \glsentryshort{dram} of our system, we will be restricted by the available bandwidth of the source. To determine the upper limit achievable, we must calculate the available peak bandwidth. For each \gls{numa:node}, the test system is configured with two DIMMs of DDR5-4800. The naming scheme contains the data rate in Megatransfers (MT) per second, however the processor specification notes that for dual channel operation, the maximum supported speed drops to \(4400\ MT/s\) \cite{intel:xeonmax-ark}. We calculate the transfers performed per second for one \gls{numa:node} (3.1), followed by the bytes per transfer \cite{kingston:ddr5-spec-overview} in calculation (3.2), and at last combine these two for the theoretical peak bandwidth per \gls{numa:node} on the system (3.3). \par
\begin{figure}[t]
\centering
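
For readers of this diff: the hunk shows only the last of the three calculations that the changed paragraph refers to. The following is a sketch of how the full chain might look, reconstructed from the prose (two DIMMs at 4400 MT/s each) and from the visible third line. The first two lines are assumptions, not copied from the thesis source; in particular, the 64 bits per transfer is inferred from the 8 B/T figure. The unnumbered `align*` form is used here so the sketch does not interfere with the (3.1) to (3.3) numbering that the commit addresses.

```latex
% Sketch only: lines 1 and 2 are reconstructed from the paragraph text
% and are assumptions; line 3 matches the context shown in the hunk.
\begin{align*}
2 \times 4400\ MT/s &= 8800\ MT/s \\
64\ b/T \div 8\ b/B &= 8\ B/T \\
8800\ MT/s \times 8\ B/T = 70400 \times 10^6\ B/s &= \frac{70400 \times 10^6}{2^{30}}\ GiB/s \approx 65.565\ GiB/s
\end{align*}
```

The last step divides by \(2^{30}\) because the result is given in GiB rather than GB; the thesis states this value as 65.56 GiB/s.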
