
handle cleardoublepage in bachelor.tex and not the chapter files themselves

Branch: master
Constantin Fürst, 11 months ago
commit ba9d18d895
8 changed files:

  thesis/bachelor.tex                  | 11
  thesis/content/00_title.tex          |  2
  thesis/content/10_introduction.tex   |  3
  thesis/content/30_performance.tex    |  8
  thesis/content/40_design.tex         |  2
  thesis/content/50_implementation.tex |  2
  thesis/content/60_evaluation.tex     |  2
  thesis/content/70_conclusion.tex     |  2

thesis/bachelor.tex (11 changes)
@@ -47,6 +47,7 @@ plainpages=false,pdfpagelabels=true]{hyperref}
 \pagenumbering{Roman}
 \input{content/00_title.tex}
+\cleardoublepage
 \includepdf{images/bachelor-aufgabe.pdf}
 \cleardoublepage
@@ -60,6 +61,8 @@ plainpages=false,pdfpagelabels=true]{hyperref}
 \input{content/02_abstract.tex}
 \end{abstract}
+\setcounter{figure}{0}
 \cleardoublepage
 \tableofcontents
@@ -73,17 +76,23 @@ plainpages=false,pdfpagelabels=true]{hyperref}
 \cleardoublepage
 \pagenumbering{arabic}
-\setcounter{figure}{0}
 % use \input for small stuff (like a list you include twice or a tikz figure)
 % and \include for large latex compilation workloads (like a chapter) to get faster builds.
 \include{content/10_introduction}
+\cleardoublepage
 \include{content/20_state}
+\cleardoublepage
 \include{content/30_performance}
+\cleardoublepage
 \include{content/40_design}
+\cleardoublepage
 \include{content/50_implementation}
+\cleardoublepage
 \include{content/60_evaluation}
+\cleardoublepage
 \include{content/70_conclusion}
+\cleardoublepage
 \appendix
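The layout this commit converges on can be sketched as a minimal main file. This is an illustration only: the class choice and preamble are assumptions, and only the `content/` file names are taken from the diff.

```latex
% Sketch of the centralized-\cleardoublepage layout this commit introduces:
% chapter files contain only content, while the main file alone decides
% where double-page clears happen.
% Assumption: a book class such as scrbook; the real bachelor.tex preamble
% is not shown in the diff.
\documentclass{scrbook}
\begin{document}
\pagenumbering{arabic}
% \include (not \input) gives each chapter its own .aux file, enabling
% \includeonly for faster partial builds.
\include{content/10_introduction}
\cleardoublepage
\include{content/20_state}
\cleardoublepage
% ... remaining chapters follow the same include/clear pattern ...
\appendix
\end{document}
```

With the clears owned by bachelor.tex, an individual chapter file no longer forces a page break when it is reused or compiled on its own, which is the point of the commit.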

thesis/content/00_title.tex (2 changes)

@@ -24,5 +24,3 @@
 \maketitle
 \end{singlespace}
-\cleardoublepage
-
thesis/content/10_introduction.tex (3 changes)

@@ -12,11 +12,8 @@
 % ... the rest of the thesis. You usually need at least 4 pages for this;
 % nobody reads more than 10 pages.
 \todo{write this chapter}
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End:

thesis/content/30_performance.tex (8 changes)

@@ -9,9 +9,9 @@ mention article by reese cooper here
 \begin{figure}[h]
 \centering
-\includegraphics[width=1.0\textwidth]{images/structo-benchmark.png}
-\caption{Throughput for different Submission Methods and Sizes}
-\label{fig:perf-submitmethod}
+\includegraphics[width=0.9\textwidth]{images/structo-benchmark.png}
+\caption{Benchmark Procedure Pseudo-Code}
+\label{fig:benchmark-function}
 \end{figure}
 \todo{split graphic into multiple parts for the three submission types}

@@ -74,8 +74,6 @@ Another limitation may be observed in this result, namely the inherent throughput
 \item lower utilization of dsa is good when it will be shared between threads/processes
 \end{itemize}
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End:

thesis/content/40_design.tex (2 changes)

@@ -62,8 +62,6 @@ Due to its reliance on libnuma for memory allocation and thread pinning, \texttt
 Compared with the challenges of ensuring correct entry lifetime and thread safety, the application of \gls{dsa} for the task of duplicating data is simple, thanks partly to \gls{intel:dml} \cite{intel:dmldoc}. Upon a call to \texttt{Cache::Access}, once it is determined that the given memory pointer is not present in the cache, work is submitted to the Accelerator. First, however, the desired location must be determined, which is handled by the user-defined cache placement policy function. With the desired placement obtained, the copy policy then determines which nodes should take part in the copy operation, which is equivalent to selecting the Accelerators following \ref{subsection:dsa-hwarch}. The work is then split across the available Accelerators, to which the work descriptors are submitted at this time. The handlers that \gls{intel:dml} \cite{intel:dmldoc} provides are then moved to the \texttt{CacheData} instance to permit the caller to wait for caching completion. As the choice of cache placement and copy policy is user-defined, one possibility will be discussed in \ref{chap:implementation}. \par
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End:

thesis/content/50_implementation.tex (2 changes)

@@ -72,8 +72,6 @@ With the distributed locking described in \ref{subsec:implementation:cache-state
 Following \ref{subsec:implementation:accel-usage}, the provided implementation of \texttt{Cache} leaves it up to the user to choose a caching and copy-method policy, which is accomplished by submitting function pointers at initialization of the \texttt{Cache}. In \ref{sec:state:setup-and-config} we configured our system to expose separate \gls{numa:node}s for accessing \gls{hbm}, which are assigned a \gls{numa:node}-ID by adding eight to the ID of the \gls{numa:node} that physically contains the \gls{hbm}. Therefore, if \gls{numa:node} 3 accesses some datum, the most efficient placement for the copy is on \gls{numa:node} \(3 + 8 = 11\). As the \texttt{Cache} is intended for multithreaded usage, conserving accelerator resources is important so that concurrent cache requests complete quickly. To get high per-copy performance while maintaining low usage, the smart-copy method described in \ref{sec:perf:datacopy} is selected for larger copies, while small copies are handled exclusively by the current node. This distinction is made due to the overhead of assigning the current thread to the selected nodes, which is required because \gls{intel:dml} assigns submissions only to the \gls{dsa} engine present on the node of the calling thread \cite[Section "NUMA support"]{intel:dmldoc}. No testing has taken place to evaluate this overhead and determine the most effective threshold.
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End:
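The node-ID convention used by the placement policy above can be restated compactly; a sketch only, assuming eight regular DDR nodes numbered 0 through 7 as described in the configuration:

```latex
% Restating the convention from the paragraph above: the HBM attached to
% DDR NUMA node i is exposed as a separate node with ID i + 8
% (assumption: eight regular nodes, 0..7).
\[
  \operatorname{node}_{\mathrm{HBM}}(i) = i + 8, \qquad i \in \{0, 1, \dots, 7\}
\]
% Example from the text: a thread on node 3 caches into node 3 + 8 = 11.
```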

thesis/content/60_evaluation.tex (2 changes)

@@ -14,8 +14,6 @@
 \todo{write this chapter}
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End:

thesis/content/70_conclusion.tex (2 changes)

@@ -33,8 +33,6 @@
 \item extend the cache implementation to use cases where data is not static
 \end{itemize}
-\cleardoublepage
 %%% Local Variables:
 %%% TeX-master: "diplom"
 %%% End: