\chapter{Technical Background on Intel DSA}
\label{sec:state}
% Two essential tasks are accomplished here:

% 1. The reader must be taught everything needed to understand the
% later chapters. In particular, in our field the system prerequisites
% that are used later must be clarified. It is also acceptable to refer
% to tutorials or similar material that is available on the web.

% 2. It must become clear what work is being done elsewhere on this
% problem. In particular, the gaps left by others should become clear.
% Why is your own work, your own approach, important for advancing the
% state of the art? This chapter is skipped by many readers (but not by
% the reviewer ;-); in later publications, "Related Work" is an
% important matter as well.

% Many readers later realize that they do need some of the fundamentals
% and flip back. It is therefore good to have backward references in
% later chapters, written so that the referenced sections can also be
% read on their own. This chapter can become relatively long; the larger
% the context of the work, the longer. It is worth it! The text can
% possibly be reused by making it available on the web as a "tutorial"
% on a topic.

% This sometimes yields valuable hints from colleagues. This chapter is
% usually written first and is the easiest (or the hardest, because it
% is the first).

\blockquote{Intel \gls{dsa} is a high-performance data copy and transformation accelerator that will be integrated in future Intel® processors, targeted for optimizing streaming data movement and transformation operations common with applications for high-performance storage, networking, persistent memory, and various data processing applications. \cite[15]{intel:dsaspec}}

Introduced with the 4th generation of Intel Xeon Scalable Processors \cite{intel:xeonbrief}, the \gls{dsa} promises to relieve the CPU of \enquote{common storage functions and operations such as data integrity checks and deduplication} \cite{intel:xeonbrief}. This chapter gives an overview of the architecture, the software, and the interaction between the two. It familiarizes the reader with the setup and provides the knowledge required to configure the system for a specific use case.

\todo{consider adding projected use cases as in the architecture specification here}
\section{Architecture}
Making informed configuration decisions requires an understanding of how the hardware works. Therefore, this section describes both the workings of the \gls{dsa} engine itself (referred to as internal architecture) and the way it integrates with the rest of the processor (external architecture). All statements are based on Chapter 3 of the Architecture Specification by Intel \cite{intel:dsaspec}. \par

As the accelerator is directly integrated into the CPU, a system with multiple processors, as is common in servers, will also have multiple \glspl{dsa}. These engines are accessible via the CPU's I/O fabric as a PCIe device and submit memory requests through this bus directly to the \gls{iommu}. Low-level configuration of the device is done through memory-mapped I/O registers located via the \gls{bar}, which also determines the location of the work submission portals. Through these portals, the so-called work descriptors are handed over to the device for processing. \par
\begin{itemize}
	\item Multiple engines per group, fed by a single work queue (WQ), may improve performance by hiding high address translation latency \cite[25]{intel:dsaspec}.
	\item A drain descriptor or drain command signals the completion of all preceding descriptors and thereby serves as a fence for non-batch submissions. Within a batch, the \enquote{fence flag} can be used to enforce ordering; a failure before a fence causes the following descriptors to be aborted \cite[30]{intel:dsaspec}. An \texttt{sfence} or \texttt{mfence} should be executed before submitting a drain descriptor \cite[32]{intel:dsaspec}.
	\item The cache control flag in the descriptor controls whether writes are directed to the cache or to memory \cite[31]{intel:dsaspec}. Its effect on copies from DRAM to HBM is unknown.
	\item Shared WQs receive work via a \enquote{PCIe deferrable memory write request} to the portal. This removes the need to synchronize submissions but can be more costly due to the communication overhead of posting a write request and waiting for it to be signalled as completed \cite[23]{intel:dsaspec}.
	\item Dedicated WQs are configured by the driver with a specified PASID for address translation and cannot be shared by multiple clients \cite[24]{intel:dsaspec}.
\end{itemize}
\section{HW/SW Setup}
Give the reader the tools to replicate the setup.
Also explain why the BIOS settings are required.

Setup Requirements:
\begin{itemize}
	\item VT-d enabled
	\item \enquote{Limit CPU PA to 46 bits} disabled
	\item IOMMU enabled
	\item kernel with \gls{iommu} and \gls{dsa} support
	\item kernel option \texttt{intel\_iommu=on,sm\_on}
\end{itemize}
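
The kernel-side requirements listed above could be applied and checked roughly as follows. This is a sketch under assumptions that are not taken from this chapter: a GRUB-based distribution that reads \texttt{/etc/default/grub}, a kernel that installs its build configuration under \texttt{/boot}, and the \texttt{idxd} driver exposing devices on the \texttt{dsa} sysfs bus. The command for regenerating the boot configuration differs between distributions.

\begin{verbatim}
# /etc/default/grub (assumed location): append the kernel option
GRUB_CMDLINE_LINUX_DEFAULT="<existing options> intel_iommu=on,sm_on"

# regenerate the boot configuration (Debian/Ubuntu shown), then reboot
sudo update-grub

# after the reboot: confirm that the option is active
cat /proc/cmdline

# check the running kernel for IOMMU and DSA (idxd) support
grep -E 'CONFIG_INTEL_IOMMU|CONFIG_INTEL_IDXD' /boot/config-$(uname -r)

# list the DSA devices known to the idxd driver
ls /sys/bus/dsa/devices/
\end{verbatim}

If the last command lists no devices, the BIOS settings from the list should be checked first, as the kernel cannot expose a device that the firmware keeps disabled. \par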
Software Configuration:
Describe Intel's \texttt{accel-config} tool and how it works, with back references to the architecture section. \par
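
To illustrate the configuration step, the following sketch sets up one group containing one engine and one dedicated work queue on the first device. The device, engine and queue identifiers (\texttt{dsa0}, \texttt{engine0.0}, \texttt{wq0.0}) as well as the queue name, size and priority are placeholder assumptions and must be adapted to the actual system. The commands require root privileges, and depending on the kernel and tool version further options may be necessary.

\begin{verbatim}
# assign engine 0 and work queue 0 of device dsa0 to group 0
accel-config config-engine dsa0/engine0.0 --group-id=0
accel-config config-wq dsa0/wq0.0 --group-id=0 --mode=dedicated \
    --type=user --name=example-wq --wq-size=16 --priority=10

# enable the device first, then the work queue
accel-config enable-device dsa0
accel-config enable-wq dsa0/wq0.0

# inspect the resulting configuration
accel-config list
\end{verbatim}

Choosing \texttt{--mode=shared} instead would create a shared work queue, with the submission trade-offs noted in the architecture section. \par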
Software Access:
Explain how a piece of software may access the \gls{dsa} and its WQs, how the drivers and \gls{dsa} libraries enable this,
and how access policies are enforced.

\section{Microbenchmarks}
\todo{provide microbenchmarks with multiple configurations and for many use cases}

\section{Evaluation}
\todo{evaluate the benchmarks and conclude with projected use cases - may use the cases from dsaspec/guide}

\cleardoublepage

%%% Local Variables:
%%% TeX-master: "diplom"
%%% End: