Introduce DSA from a high level. Possibly use points from the Overview-Chapter in DSA Architecture Spec.
\blockquote{Intel \gls{dsa} is a high-performance data copy and transformation accelerator that will be integrated in future Intel® processors, targeted for optimizing streaming data movement and transformation operations common with applications for high-performance storage, networking, persistent memory, and various data processing applications. \cite[15]{intel:dsaspec}}
Introduced with the 4th generation of Intel Xeon Scalable Processors \cite{intel:xeonbrief}, the Intel Data Streaming Accelerator promises to relieve the CPU of \enquote{common storage functions and operations such as data integrity checks and deduplication} \cite{intel:xeonbrief}. This chapter gives an overview of the architecture, the software, and the interaction between these two components. After reading it, the reader should be familiar with the setup and able to configure the system for a specific use case.
\todo{consider adding projected use cases as in the architecture specification here}
\section{Intel DSA Architecture}
Fine-Grained Architecture Overview. Possibly use graphs from the DSA Arch Spec.
Making informed decisions about how to use the hardware optimally requires knowledge of its inner workings. Therefore, this section describes both the workings of the \gls{dsa} engine itself (the internal architecture) and the way it integrates with the rest of the processor (the external architecture). All statements are based on Chapter 3 of the Architecture Specification by Intel \cite{intel:dsaspec}.
As the accelerator is integrated directly into the CPU, a system with multiple processors, as is common in servers, will also have multiple \glspl{dsa}. These engines are accessible via the CPU's I/O fabric as PCIe devices and submit memory requests through this bus directly to the \gls{iommu}. Low-level configuration of the device is done through memory-mapped I/O registers whose location is set through the \gls{bar}, which also determines the location of the work submission portals. Through these portals, so-called work descriptors are handed to the device for processing.
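The paragraph above implies that a multi-socket server exposes one \gls{dsa} instance per processor. The following minimal sketch shows how this could be checked from user space. It assumes that the Linux driver lists its devices under \texttt{/sys/bus/dsa/devices} with names of the form \texttt{dsa0}, \texttt{dsa1}, and so on; both the path and the helper name \texttt{list\_dsa\_devices} are assumptions for illustration and are not taken from the architecture specification.

```python
import re
from pathlib import Path

# Assumption: the Linux driver exposes its devices under this sysfs path.
DSA_SYSFS = Path("/sys/bus/dsa/devices")


def list_dsa_devices(root=DSA_SYSFS):
    """Return the names of entries that look like DSA instances (dsa0, dsa1, ...).

    Other entries in the same directory (work queues, engines, groups)
    are ignored. An empty list is returned if the directory does not exist,
    for example because the driver is not loaded.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    names = [p.name for p in root.iterdir() if re.fullmatch(r"dsa\d+", p.name)]
    # Sort numerically so that dsa10 comes after dsa2.
    return sorted(names, key=lambda name: int(name[3:]))
```

Calling \texttt{list\_dsa\_devices()} without arguments inspects the assumed default path; passing a different directory makes the helper usable on systems without the hardware.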
\begin{itemize}
\item \gls{dsa} is directly linked to the \gls{iommu} to submit memory requests \cite[22]{intel:dsaspec}
\item \gls{dsa} is controlled through commands sent over the I/O fabric and is available as a PCIe-like device \cite[21]{intel:dsaspec}
\item Configuration is done via memory-mapped I/O registers configured through the PCIe \gls{bar} \cite[21]{intel:dsaspec}
\item Assigning multiple engines to a group with a single work queue (WQ) may improve performance by hiding high-latency address translation \cite[25]{intel:dsaspec}
\item A drain descriptor or drain command signals completion of all preceding descriptors and can be used for fencing in non-batch submissions. Within a batch, the \enquote{fence flag} can be used to ensure ordering; a failure before a fence causes the following descriptors to be aborted \cite[30]{intel:dsaspec}. An \texttt{sfence} or \texttt{mfence} should be executed before submitting a drain descriptor \cite[32]{intel:dsaspec}
\end{itemize}

Setup Requirements:
\begin{itemize}
\item VT-d enabled
\item \enquote{Limit CPU PA to 46 Bits} disabled
\item IOMMU enabled
\item kernel with \gls{iommu} and \gls{dsa} support
\item kernel option \texttt{intel\_iommu=on,sm\_on}
\end{itemize}
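The kernel option from the list above can be verified on a running system. The following is a minimal sketch, assuming the kernel command line is available as a whitespace-separated string (on Linux it can typically be read from \texttt{/proc/cmdline}). The function name \texttt{iommu\_scalable\_mode\_enabled} is hypothetical, and the check only tests whether the \texttt{on} and \texttt{sm\_on} options are present; it does not confirm that the \gls{iommu} is actually active.

```python
def iommu_scalable_mode_enabled(cmdline):
    """Return True if the command line requests intel_iommu with 'on' and 'sm_on'.

    Options from all 'intel_iommu=' parameters are collected, so both
    'intel_iommu=on,sm_on' and 'intel_iommu=on intel_iommu=sm_on' match.
    This is a presence check only; conflicting options such as 'off'
    are not evaluated.
    """
    options = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            options.update(value.split(","))
    return {"on", "sm_on"} <= options
```

On a Linux system, the function could be applied to the contents of \texttt{/proc/cmdline}, for example with \texttt{iommu\_scalable\_mode\_enabled(open("/proc/cmdline").read())}.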
Software Configuration:
Describe intel accel-config and how it works with back reference to architecture.
Software Access:
Explain how a piece of software may access the \gls{dsa}/WQ, how the drivers and dsa libraries enable this