diff --git a/thesis/bachelor.pdf b/thesis/bachelor.pdf index dc4fbee..18dfc0b 100644 Binary files a/thesis/bachelor.pdf and b/thesis/bachelor.pdf differ diff --git a/thesis/content/20_state.tex b/thesis/content/20_state.tex index a98ba78..de7fcaf 100644 --- a/thesis/content/20_state.tex +++ b/thesis/content/20_state.tex @@ -50,22 +50,22 @@ Introduced with the \(4^{th}\) generation of Intel Xeon Scalable Processors, the \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/dsa-internal-block-diagram.png} - \caption{DSA Internal Architecture \cite[Fig. 1 (a)]{intel:analysis}} + \includegraphics[width=0.9\textwidth]{images/block-dsa-hwarch.pdf} + \caption{DSA Internal Architecture \cite[Fig. 1 (a)]{intel:analysis}. Shows the components that the chip is made up of, how they are connected, and which outside components the DSA communicates with.} \label{fig:dsa-internal-block} \end{figure} The \gls{dsa} chip is directly integrated into the processor and attaches via the I/O fabric interface over which all communication is conducted. Through this interface, it is accessible as a PCIe device. Therefore, configuration utilizes memory-mapped registers set in the devices \gls{bar}. Through these, the devices' layout is defined and memory pages for work submission set. In a system with multiple processing nodes, there may also be one \gls{dsa} per node, resulting in 4 being present on the previously mentioned Xeon Max CPU. \todo{add citations to this section} \par -To satisfy different use cases, the layout of the \gls{dsa} may be software-defined. The structure is made up of three components, namely \gls{dsa:wq}, \gls{dsa:engine} and \gls{dsa:group}. \gls{dsa:wq}s provide the means to submit tasks to the device and will be described in more detail shortly. They are marked yellow in Figure \ref{fig:dsa-internal-block}. An \gls{dsa:engine} is the processing-block that connects to memory and performs the described task. The grey block of Figure \ref{fig:dsa-internal-block} shows the subcomponents that make up an engine and the different internal paths for a batch or task descriptor \todo{too much detail for this being the first overview paragraph}. Using \gls{dsa:group}s, \gls{dsa:engine}s and \gls{dsa:wq}s are tied together, indicated by the dotted blue line around the components of Group 0 in Figure \ref{fig:dsa-internal-block}. This means, that tasks from one \gls{dsa:wq} may be processed from multiple \gls{dsa:engine}s and vice-versa, depending on the configuration. This flexibility is achieved through the Group Arbiter, represented by the orange block in Figure \ref{fig:dsa-internal-block}, which connects the two components according to the user-defined configuration. \par +To satisfy different use cases, the layout of the \gls{dsa} may be software-defined. The structure is made up of three components, namely \gls{dsa:wq}, Engine and Group. \gls{dsa:wq}s provide the means to submit tasks to the device and will be described in more detail shortly. They are marked yellow in Figure \ref{fig:dsa-internal-block}. An Engine is the processing block that connects to memory and performs the described task. The grey block of Figure \ref{fig:dsa-internal-block} shows the subcomponents that make up an Engine and the different internal paths for a batch or task descriptor \todo{too much detail for this being the first overview paragraph}.
Groups tie Engines and \gls{dsa:wq}s together, as indicated by the dotted blue line around the components of Group 0 in Figure \ref{fig:dsa-internal-block}. This means that tasks from one \gls{dsa:wq} may be processed by multiple Engines and vice-versa, depending on the configuration. This flexibility is achieved through the Group Arbiter, represented by the orange block in Figure \ref{fig:dsa-internal-block}, which connects the two components according to the user-defined configuration. \par A \gls{dsa:wq} is accessible through so-called portals, light blue in Figure \ref{fig:dsa-internal-block}, which are mapped memory regions. Submission of work is done by writing a descriptor to one of these. A descriptor is 64 bytes in size and may contain one specific task (task descriptor) or the location of a task array in memory (batch descriptor). Through these portals, the submitted descriptor reaches a queue. There are two possible queue types with different submission methods and use cases. The \gls{dsa:swq} is intended to provide synchronized access to multiple processes and each group may only have one attached. A \gls{pcie-dmr}, which guarantees implicit synchronization, is generated via \gls{x86:enqcmd} and communicates with the device before writing \cite[Sec. 3.3.1]{intel:dsaspec}. This may result in higher submission cost, compared to the \gls{dsa:dwq} to which a descriptor is submitted via \gls{x86:movdir64b} \cite[Sec. 3.3.2]{intel:dsaspec}. \par -To handle the different descriptors, each \gls{dsa:engine} has two internal execution paths. One for a task and the other for a batch descriptor. Processing a task descriptor is straightforward, as all information required to complete the operation are contained within \todo{cite this}. For a batch, the \gls{dsa} reads the batch descriptor, then fetches all task descriptors from memory and processes them \cite[Sec. 3.8]{intel:dsaspec}. An \gls{dsa:engine} can coordinate with the operating system in case it encounters a page fault, waiting on its resolution, if configured to do so, while otherwise, an error will be generated in this scenario \cite[Sec. 2.2, Block on Fault]{intel:dsaspec}. \par +To handle the different descriptors, each Engine has two internal execution paths: one for task and one for batch descriptors. Processing a task descriptor is straightforward, as all information required to complete the operation is contained within \todo{cite this}. For a batch, the \gls{dsa} reads the batch descriptor, then fetches all task descriptors from memory and processes them \cite[Sec. 3.8]{intel:dsaspec}. If configured to do so, an Engine can coordinate with the operating system when it encounters a page fault, waiting on its resolution; otherwise, an error is generated in this scenario \cite[Sec. 2.2, Block on Fault]{intel:dsaspec}. \par -Ordering of operations is only guaranteed for a configuration with one \gls{dsa:wq} and one \gls{dsa:engine} in a \gls{dsa:group} when submitting exclusively batch or task descriptors but no mixture. Even then, only write-ordering is guaranteed, meaning that \enquote{reads by a subsequent descriptor can pass writes from a previous descriptor}. A different issue arises, when an operation fails, as the \gls{dsa} will continue to process the following descriptors from the queue.
Care must therefore be taken with read-after-write scenarios, either by waiting for a successful completion before submitting the dependant, inserting a drain descriptor for tasks or setting the fence flag for a batch. The latter two methods tell the processing engine that all writes must be committed and, in case of the fence in a batch, abort on previous error. \cite[Sec. 3.9]{intel:dsaspec} \par +Ordering of operations is only guaranteed for a configuration with one \gls{dsa:wq} and one Engine in a Group when submitting exclusively batch or task descriptors but no mixture. Even then, only write-ordering is guaranteed, meaning that \enquote{reads by a subsequent descriptor can pass writes from a previous descriptor}. A different issue arises when an operation fails, as the \gls{dsa} will continue to process the following descriptors from the queue. Care must therefore be taken with read-after-write scenarios, either by waiting for a successful completion before submitting the dependent descriptor, by inserting a drain descriptor for tasks, or by setting the fence flag for a batch. The latter two methods tell the processing engine that all writes must be committed and, in the case of the fence in a batch, to abort on a previous error. \cite[Sec. 3.9]{intel:dsaspec} \par -An important aspect of modern computer systems is the separation of address spaces through virtual memory. Therefore, the \gls{dsa} must handle address translation, as a process submitting a task will not know the physical location in memory which causes the descriptor to contain virtual values. For this, the \gls{dsa:engine} communicates with the \gls{iommu} and \gls{atc} to perform this operation, as visible in the outward connections at the top of Figure \ref{fig:dsa-internal-block}. For this, knowledge about the submitting processes is required, and therefore each task descriptor has a field for the \gls{x86:pasid} which is filled by the \gls{x86:enqcmd} instruction for a \gls{dsa:swq} \cite[Sec. 3.3.1]{intel:dsaspec} or set statically after a process is attached to a \gls{dsa:dwq} \cite[Sec. 3.3.2]{intel:dsaspec}. \par +An important aspect of modern computer systems is the separation of address spaces through virtual memory. Therefore, the \gls{dsa} must handle address translation, as a process submitting a task will not know the physical location of its data in memory, causing the descriptor to contain virtual addresses. To translate these, the Engine communicates with the \gls{iommu} and \gls{atc}, as visible in the outward connections at the top of Figure \ref{fig:dsa-internal-block}. For this, knowledge about the submitting process is required, and therefore each task descriptor has a field for the \gls{x86:pasid} which is filled by the \gls{x86:enqcmd} instruction for a \gls{dsa:swq} \cite[Sec. 3.3.1]{intel:dsaspec} or set statically after a process is attached to a \gls{dsa:dwq} \cite[Sec. 3.3.2]{intel:dsaspec}. \par The completion of a descriptor may be signalled through a completion record and interrupt, if configured so. For this, the \gls{dsa} \enquote{provides two types of interrupt message storage: (1) an MSI-X table, enumerated through the MSI-X capability; and (2) a device-specific Interrupt Message Storage (IMS) table} \cite[Sec. 3.7]{intel:dsaspec}. \par @@ -74,8 +74,8 @@ The completion of a descriptor may be signalled through a completion record and \begin{figure}[h] \centering - \includegraphics[width=0.5\textwidth]{images/dsa-software-architecture.png} - \caption{DSA Software View \cite[Fig.
1 (b)]{intel:analysis}} + \includegraphics[width=0.5\textwidth]{images/block-dsa-swarch.pdf} + \caption{DSA Software View \cite[Fig. 1 (b)]{intel:analysis}. Illustrates the software stack and the internal interactions from user applications, through the driver, to the portal for work submission.} \label{fig:dsa-software-arch} \end{figure} @@ -92,8 +92,8 @@ As mentioned in Subsection \ref{subsec:state:dsa-software-view}, \gls{intel:dml} \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/structo-dmlmemcpy.png} - \caption{DML Memcpy Implementation Pseudocode} + \includegraphics[width=0.9\textwidth]{images/nsd-dsamemcpy.pdf} + \caption{DML Memcpy Implementation Pseudocode. Performs a copy of a block of memory from source to destination. The DSA executing this copy can be selected with the parameter \texttt{node}, and the template parameter \texttt{path} selects whether to run on hardware (DSA) or software (CPU).} \label{fig:dml-memcpy} \end{figure} @@ -120,8 +120,6 @@ In this section we will give a brief step-by-step list of setup instructions to \item Inspect the now configured \gls{dsa} devices using \texttt{accel-config list} \cite[p. 7]{lenovo:dsa}, output should match the desired configuration set in the file used \end{enumerate} -\todo{is this enumeration acceptable?} - %%% Local Variables: %%% TeX-master: "diplom" %%% End: diff --git a/thesis/content/30_performance.tex b/thesis/content/30_performance.tex index 955c400..9c87c3c 100644 --- a/thesis/content/30_performance.tex +++ b/thesis/content/30_performance.tex @@ -7,8 +7,8 @@ The performance of \gls{dsa} has been evaluated in great detail by Reese Kuper e \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/structo-benchmark-compact.png} - \caption{Benchmark Procedure Pseudocode} + \includegraphics[width=0.9\textwidth]{images/nsd-benchmark.pdf} + \caption{Benchmark Procedure Pseudocode. Timing is marked with a yellow background. Shows data allocation and the benchmarking loop for a single thread.} \label{fig:benchmark-function} \end{figure} @@ -18,7 +18,7 @@ The benchmark performs node setup as described in Section \ref{sec:state:dml} an \section{Benchmarks} -In this section we will describe three benchmarks. Each complete with setup information and preview, followed by plots showing the results and detailed analysis. We formulate expectations and contrast them with the observations from our measurements. \par +In this section we will describe three benchmarks, each complete with setup information and a preview, followed by plots showing the results and a detailed analysis. We formulate expectations and contrast them with the observations from our measurements. Where displayed, the slim grey bars represent the standard deviation across iterations. \par \subsection{Submission Method} \label{subsec:perf:submitmethod} @@ -27,8 +27,8 @@ With each submission, descriptors must be prepared and sent off to the underlyin \begin{figure}[h] \centering - \includegraphics[width=0.7\textwidth]{images/plot-opt-submitmethod.png} - \caption{Throughput for different Submission Methods and Sizes} + \includegraphics[width=0.7\textwidth]{images/plot-opt-submitmethod.pdf} + \caption{Throughput for different Submission Methods and Sizes. Performing a copy with source and destination on node 0, executed by the DSA on node 0.
Observable is the submission cost, which affects small transfer sizes disproportionately, as their completion time is lower.} \label{fig:perf-submitmethod} \end{figure} @@ -43,8 +43,8 @@ As we might encounter access to one \gls{dsa} from multiple threads through the \begin{figure}[h] \centering - \includegraphics[width=0.7\textwidth]{images/plot-perf-mtsubmit.png} - \caption{Throughput for different Thread Counts and Sizes} + \includegraphics[width=0.7\textwidth]{images/plot-perf-mtsubmit.pdf} + \caption{Throughput for different Thread Counts and Sizes. Multiple threads submit to the same Shared Work Queue. Performing a copy with source and destination on node 0, executed by the DSA on node 0.} \label{fig:perf-mtsubmit} \end{figure} @@ -60,7 +60,7 @@ Two methods of splitting will be evaluated. The first being a brute force approa \begin{figure}[h] \centering \includegraphics[width=0.9\textwidth]{images/xeonmax-numa-layout.png} - \caption{Xeon Max NUMA-Node Layout \cite[Fig. 14]{intel:maxtuning}} + \caption{Xeon Max NUMA-Node Layout \cite[Fig. 14]{intel:maxtuning} for a 2-Socket System when configured with HBM-Flat, showing separate Node IDs for manual HBM access and for Cores and DDR memory.} \label{fig:perf-xeonmaxnuma} \end{figure} @@ -68,15 +68,15 @@ For this benchmark, we copy 1 Gibibyte of data from node 0 to the destination no \begin{figure}[h] \centering - \includegraphics[width=0.7\textwidth]{images/plot-perf-allnodes-throughput-selectbarplot.png} - \caption{Throughput for brute force copy from DDR to HBM} + \includegraphics[width=0.7\textwidth]{images/plot-perf-allnodes-throughput-selectbarplot.pdf} + \caption{Throughput for brute force copy from DDR to HBM, using all available DSA, copying 1 GiB from Node 0 to the Destination Node specified on the x-axis.} \label{fig:perf-peak-brute} \end{figure} \begin{figure}[h] \centering - \includegraphics[width=0.7\textwidth]{images/plot-perf-smart-throughput-selectbarplot.png} - \caption{Throughput for smart copy from DDR to HBM} + \includegraphics[width=0.7\textwidth]{images/plot-perf-smart-throughput-selectbarplot.pdf} + \caption{Throughput for smart copy from DDR to HBM, using four on-socket DSA for intra-socket operation and the DSA on source and destination node for inter-socket operation, copying 1 GiB from Node 0 to the Destination Node specified on the x-axis.} \label{fig:perf-peak-smart} \end{figure} @@ -96,8 +96,6 @@ In this section we summarize the conclusions drawn from the three benchmarks per \item In \ref{subsec:perf:datacopy}, we chose to use the presented smart copy methodology to split copy tasks across multiple \gls{dsa} chips to achieve low utilization with acceptable performance. \end{enumerate} -\todo{is this enumeration acceptable?} - \todo{compare cpu and dsa for data movement} %%% Local Variables: diff --git a/thesis/content/40_design.tex b/thesis/content/40_design.tex index 3201bc8..37db00c 100644 --- a/thesis/content/40_design.tex +++ b/thesis/content/40_design.tex @@ -36,13 +36,11 @@ The task of prefetching is somewhat aligned with that of a cache. As a cache is \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/design-classdiagram.png} - \caption{Public Interface of CacheData and Cache Classes} + \includegraphics[width=0.9\textwidth]{images/uml-cache-and-cachedata.pdf} + \caption{Public Interface of CacheData and Cache Classes. Colour coding signals thread safety. Grey denotes impossibility for threaded access. Green indicates full safety guarantees, relying only on atomics to achieve this.
Yellow may use locking but is still safe for use. Red must be called from a single-threaded context.} \label{fig:impl-design-interface} \end{figure} -For reference, the public interface which we will develop throughout this section is visualized in Figure \ref{fig:impl-design-interface} for both classes created. Colour coding signals thread safety where grey denotes impossibility for threaded access. Green indicates full safety guarantees only relying on atomics to achieve this. Turquoise functions use locking mechanisms to achieve thread safety. Operations in yellow may observe threading effects from atomics but are still inherently safe to call. Finally, red markers indicate unsafe functions which must be called from a single threaded context. As we implement these classes in C++ in Chapter \ref{chap:implementation}, we also utilize C++-Notation for functions in the figure here.\par- \subsection{Interface} @@ -52,7 +50,7 @@ As caching is performed asynchronously, the user may wish to wait on the operati \subsection{Cache Entry Reuse} \label{subsec:design:cache-entry-reuse} -When multiple consumers wish to access the same memory block through the \texttt{Cache}, we could either provide each with their own entry, or share one entry for all consumers. The first option may cause high load on the accelerator due to multiple copy operations being submitted and also increases the memory footprint of the system. The latter option requires synchronization and more complex design. As the cache size is restrictive, the latter was chosen. The already existing \texttt{CacheData} will be extended in scope to handle this by allowing copies of it to be created which must synchronize with each other for \texttt{CacheData::WaitOnCompletion} and \texttt{CacheData::GetDataLocation}. \par +When multiple consumers wish to access the same memory block through the \texttt{Cache}, we could either provide each with their own entry, or share one entry for all consumers. The first option may cause high load on the accelerator due to multiple copy operations being submitted and also increases the memory footprint of the system. The latter option requires synchronization and more complex design. As the cache size is restrictive, the latter was chosen. The already existing \texttt{CacheData} will be extended in scope to handle this by allowing copies of it to be created which must synchronize with each other for \texttt{CacheData::WaitOnCompletion} and \texttt{CacheData::GetDataLocation}. This is shown by the green markings in Figure \ref{fig:impl-design-interface}, signalling the thread safety guarantees for these accesses. \par \subsection{Cache Entry Lifetime} \label{subsec:design:cache-entry-lifetime} diff --git a/thesis/content/50_implementation.tex b/thesis/content/50_implementation.tex index a60526c..2925405 100644 --- a/thesis/content/50_implementation.tex +++ b/thesis/content/50_implementation.tex @@ -42,8 +42,8 @@ It was therefore decided to implement atomic reference counting for \texttt{Cach \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/sequenzdiagramm-waitoncompletion.png} - \caption{Sequence for Blocking Scenario} + \includegraphics[width=0.9\textwidth]{images/seq-blocking-wait.pdf} + \caption{Sequence for Blocking Scenario. Observable in the first draft of the implementation. Scenario where \(T_1\) performed the first access to a datum, followed by \(T_2\) and \(T_3\).
Then \(T_1\) holds the handlers exclusively, leading to the other threads having to wait for \(T_1\) to perform both the work submission and the wait on completion before they can access the datum through the cache.} \label{fig:impl-cachedata-threadseq-waitoncompletion} \end{figure} @@ -51,8 +51,8 @@ Designing the wait to work from any thread was complicated. In the first impleme \begin{figure}[h] \centering - \includegraphics[width=0.9\textwidth]{images/structo-cachedata-waitoncompletion.png} - \caption{\texttt{CacheData::WaitOnCompletion} Pseudocode} + \includegraphics[width=0.9\textwidth]{images/nsd-cachedata-waitoncompletion.pdf} + \caption{\texttt{CacheData::WaitOnCompletion} Pseudocode. Final rendition of the implementation for a fair wait function.} \label{fig:impl-cachedata-waitoncompletion} \end{figure} diff --git a/thesis/images/design-classdiagram.png b/thesis/images/design-classdiagram.png deleted file mode 100644 index c42f896..0000000 Binary files a/thesis/images/design-classdiagram.png and /dev/null differ diff --git a/thesis/images/dsa-internal-block-diagram.png b/thesis/images/dsa-internal-block-diagram.png deleted file mode 100644 index 1d14667..0000000 Binary files a/thesis/images/dsa-internal-block-diagram.png and /dev/null differ diff --git a/thesis/images/dsa-software-architecture.png b/thesis/images/dsa-software-architecture.png deleted file mode 100644 index 7fc4d77..0000000 Binary files a/thesis/images/dsa-software-architecture.png and /dev/null differ diff --git a/thesis/images/design-classdiagram.xml b/thesis/images/image-source/design-classdiagram.xml similarity index 68% rename from thesis/images/design-classdiagram.xml rename to thesis/images/image-source/design-classdiagram.xml index f70f6ab..e53343f 100644 --- a/thesis/images/design-classdiagram.xml +++ b/thesis/images/image-source/design-classdiagram.xml diff --git a/thesis/images/sequenzdiagramm-waitoncompletion.xml b/thesis/images/image-source/sequenzdiagramm-waitoncompletion.xml similarity index 71% rename from thesis/images/sequenzdiagramm-waitoncompletion.xml rename to thesis/images/image-source/sequenzdiagramm-waitoncompletion.xml index 4f989c7..95b6fd4 100644 --- a/thesis/images/sequenzdiagramm-waitoncompletion.xml +++ b/thesis/images/image-source/sequenzdiagramm-waitoncompletion.xml diff --git a/thesis/images/structo-benchmark-compact.nsd b/thesis/images/image-source/structo-benchmark.nsd similarity index 91% rename from thesis/images/structo-benchmark-compact.nsd rename to thesis/images/image-source/structo-benchmark.nsd index 1690ef5..6669e6c 100644 --- a/thesis/images/structo-benchmark-compact.nsd +++ b/thesis/images/image-source/structo-benchmark.nsd diff --git a/thesis/images/structo-cachedata-waitoncompletion.nsd b/thesis/images/image-source/structo-cachedata-waitoncompletion.nsd similarity index 100% rename from thesis/images/structo-cachedata-waitoncompletion.nsd rename to thesis/images/image-source/structo-cachedata-waitoncompletion.nsd diff --git a/thesis/images/structo-dmlmemcpy.nsd b/thesis/images/image-source/structo-dmlmemcpy.nsd similarity index 100% rename from
thesis/images/structo-dmlmemcpy.nsd rename to thesis/images/image-source/structo-dmlmemcpy.nsd diff --git a/thesis/images/plot-opt-submitmethod.png b/thesis/images/plot-opt-submitmethod.png deleted file mode 100644 index b315e01..0000000 Binary files a/thesis/images/plot-opt-submitmethod.png and /dev/null differ diff --git a/thesis/images/plot-perf-allnodes-throughput-selectbarplot.png b/thesis/images/plot-perf-allnodes-throughput-selectbarplot.png deleted file mode 100644 index 1ff7835..0000000 Binary files a/thesis/images/plot-perf-allnodes-throughput-selectbarplot.png and /dev/null differ diff --git a/thesis/images/plot-perf-mtsubmit.png b/thesis/images/plot-perf-mtsubmit.png deleted file mode 100644 index aefc710..0000000 Binary files a/thesis/images/plot-perf-mtsubmit.png and /dev/null differ diff --git a/thesis/images/plot-perf-smart-throughput-selectbarplot.png b/thesis/images/plot-perf-smart-throughput-selectbarplot.png deleted file mode 100644 index df7fa22..0000000 Binary files a/thesis/images/plot-perf-smart-throughput-selectbarplot.png and /dev/null differ diff --git a/thesis/images/sequenzdiagramm-waitoncompletion.png b/thesis/images/sequenzdiagramm-waitoncompletion.png deleted file mode 100644 index 58dd4db..0000000 Binary files a/thesis/images/sequenzdiagramm-waitoncompletion.png and /dev/null differ diff --git a/thesis/images/structo-benchmark-compact.png b/thesis/images/structo-benchmark-compact.png deleted file mode 100644 index a346cfc..0000000 Binary files a/thesis/images/structo-benchmark-compact.png and /dev/null differ diff --git a/thesis/images/structo-benchmark.nsd b/thesis/images/structo-benchmark.nsd deleted file mode 100644 index a242453..0000000 --- a/thesis/images/structo-benchmark.nsd +++ /dev/null diff --git a/thesis/images/structo-benchmark.png b/thesis/images/structo-benchmark.png deleted file mode 100644 index 89235b9..0000000 Binary files a/thesis/images/structo-benchmark.png and /dev/null differ diff --git a/thesis/images/structo-cachedata-waitoncompletion.png b/thesis/images/structo-cachedata-waitoncompletion.png deleted file mode 100644 index 387e6b6..0000000 Binary files a/thesis/images/structo-cachedata-waitoncompletion.png and /dev/null differ diff --git a/thesis/images/structo-dmlmemcpy.png b/thesis/images/structo-dmlmemcpy.png deleted file mode 100644 index dea8188..0000000 Binary files a/thesis/images/structo-dmlmemcpy.png and /dev/null differ diff --git a/thesis/own.gls b/thesis/own.gls index 399d4c3..4b250b8 100644 --- a/thesis/own.gls +++ b/thesis/own.gls @@ -54,20 +54,6 @@ description={... desc ...} } -\newglossaryentry{dsa:engine}{ - name={Engine}, - long={\gls{dsa} Engine}, - first={Engine}, - description={... desc ...} -} - -\newglossaryentry{dsa:group}{ - name={Group}, - long={\gls{dsa} Group}, - first={Group}, - description={... desc ...} -} - \newglossaryentry{x86:enqcmd}{ name={ENQCMD}, long={x86 Instruction ENQCMD},
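
As a supplementary illustration of the work submission mechanism described in 20_state.tex and compared in the submission-method benchmark of 30_performance.tex, a minimal C++ sketch of submitting a single memmove descriptor could look as follows. This is a hypothetical example, not code from the thesis or from Intel DML: the helper name submit_memmove, the already mapped portal pointer wq_portal and the completion record pointer comp are assumptions, while the descriptor layout and flags come from the Linux UAPI header linux/idxd.h and the intrinsics from immintrin.h.

#include <immintrin.h>   // _movdir64b, _enqcmd, _mm_pause
#include <linux/idxd.h>  // struct dsa_hw_desc, struct dsa_completion_record
#include <cstdint>

// Hypothetical sketch: submit one memmove descriptor to a DSA work queue.
// Assumes wq_portal points to an already mapped portal and comp to a
// zero-initialised, 32-byte aligned completion record.
// Build with -mmovdir64b -menqcmd.
static void submit_memmove(void* wq_portal, bool shared_wq, void* dst,
                           const void* src, uint32_t size,
                           volatile dsa_completion_record* comp) {
    dsa_hw_desc desc{};
    desc.opcode          = DSA_OPCODE_MEMMOVE;
    desc.flags           = IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CRAV; // request a completion record
    desc.src_addr        = reinterpret_cast<uintptr_t>(src);
    desc.dst_addr        = reinterpret_cast<uintptr_t>(dst);
    desc.xfer_size       = size;
    desc.completion_addr = reinterpret_cast<uintptr_t>(comp);

    if (shared_wq) {
        // ENQCMD reports whether the Shared Work Queue accepted the descriptor;
        // a nonzero return means the queue was full, so the submission is retried.
        while (_enqcmd(wq_portal, &desc) != 0) { _mm_pause(); }
    } else {
        // MOVDIR64B writes the 64-byte descriptor to a Dedicated Work Queue
        // without feedback; software has to track the number of free slots itself.
        _movdir64b(wq_portal, &desc);
    }

    // Poll the completion record that the DSA writes once the operation finished.
    while (comp->status == 0) { _mm_pause(); }
}

The two branches mirror the submission cost discussed for fig:perf-submitmethod: the ENQCMD path pays for the device round trip that reports queue acceptance, while MOVDIR64B submits without feedback.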
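
The copy routine referenced by fig:dml-memcpy can be approximated with the high-level C++ API of Intel DML. The following is a minimal sketch using the basic dml::execute interface; the node parameter from the thesis pseudocode is left out because the exact way of pinning the executing DSA depends on the DML version used, and the helper name dml_copy is an assumption.

#include <dml/dml.hpp>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the thesis implementation: copy a block of memory
// via Intel DML. The template parameter path selects dml::hardware (DSA) or
// dml::software (CPU fallback) at compile time.
template <typename path>
bool dml_copy(uint8_t* src, uint8_t* dst, std::size_t size) {
    auto result = dml::execute<path>(dml::mem_move,
                                     dml::make_view(src, size),   // source
                                     dml::make_view(dst, size));  // destination
    return result.status == dml::status_code::ok;
}

A call such as dml_copy<dml::hardware>(src, dst, n) offloads the copy to a DSA, while dml_copy<dml::software>(src, dst, n) runs the same operation on the CPU, matching the role of the path template parameter described in the figure caption.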
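
The entry sharing described under Cache Entry Reuse in 40_design.tex and the atomic reference counting chosen in 50_implementation.tex can be illustrated with a conceptual sketch. Every name and member below is illustrative only and does not reproduce the actual CacheData interface shown in fig:impl-design-interface; the sketch merely demonstrates why copies that share an atomic data pointer can all wait on completion without one thread holding the handlers exclusively.

#include <atomic>
#include <cstdint>
#include <thread>

// Conceptual sketch only: copies of a cache entry share an atomic reference
// count and an atomic pointer to the cached data, so any copy may wait for
// completion of the asynchronous copy operation.
class CacheDataSketch {
    std::atomic<int>*      ref_count_;       // shared by all copies of the entry
    std::atomic<uint8_t*>* cache_location_;  // published once the copy finished
public:
    CacheDataSketch(std::atomic<int>* rc, std::atomic<uint8_t*>* loc)
        : ref_count_(rc), cache_location_(loc) { ref_count_->fetch_add(1); }

    CacheDataSketch(const CacheDataSketch& other)
        : ref_count_(other.ref_count_), cache_location_(other.cache_location_) {
        ref_count_->fetch_add(1);
    }

    CacheDataSketch& operator=(const CacheDataSketch&) = delete;

    ~CacheDataSketch() {
        // The last copy going out of scope would let the cache deallocate the entry.
        ref_count_->fetch_sub(1);
    }

    // Any copy may wait until the asynchronous copy has published the data.
    uint8_t* WaitOnCompletion() {
        uint8_t* ptr = cache_location_->load();
        while (ptr == nullptr) {
            std::this_thread::yield();
            ptr = cache_location_->load();
        }
        return ptr;
    }

    uint8_t* GetDataLocation() const { return cache_location_->load(); }
};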