
Modify the explanation of DSA work queue submission to omit most of the detail on the actual instructions. Remove these instructions from the glossary as well and improve the glossary entry for API.

Branch: master
Author: Constantin Fürst (3 months ago)
Commit: 714bc3c1ec
Changed files:
  1. thesis/content/20_state.tex (9 changed lines)
  2. thesis/own.gls (26 changed lines)

thesis/content/20_state.tex

@ -77,9 +77,7 @@ The \gls{dsa} chip is directly integrated into the processor and attaches via th
\subsubsection{Architectural Components}
\label{subsec:state:dsa-arch-comp}
-\textsc{Component \rom{1}, \glsentrylong{dsa:wq}:} \glsentryshort{dsa:wq}s provide the means to submit tasks to the device and will be described in more detail shortly. They are marked yellow in Figure \ref{fig:dsa-internal-block}. A \gls{dsa:wq} is accessible through so-called portals, light blue in Figure \ref{fig:dsa-internal-block}, which are mapped memory regions. Submission of work is done by writing a descriptor to one of these. A descriptor is 64 bytes in size and may contain one specific task (task descriptor) or the location of a task array in memory (batch descriptor). Through these portals, the submitted descriptor reaches a queue. There are two possible queue types with different submission methods and use cases. The \gls{dsa:swq} is intended to provide synchronized access to multiple processes and each group may only have one attached. A \gls{pcie-dmr}, which guarantees implicit synchronization, is generated via \gls{x86:enqcmd} and communicates with the device before writing \cite[Sec. 3.3.1]{intel:dsaspec}. This may result in higher submission cost, compared to the \gls{dsa:dwq} to which a descriptor is submitted via \gls{x86:movdir64b} \cite[Sec. 3.3.2]{intel:dsaspec}. \par
-\todo{potentially give less details on the instructions or reformulate this some other way}
+\textsc{Component \rom{1}, \glsentrylong{dsa:wq}:} \glsentryshort{dsa:wq}s provide the means to submit tasks to the device and will be described in more detail shortly. They are marked yellow in Figure \ref{fig:dsa-internal-block}. A \gls{dsa:wq} is accessible through so-called portals, light blue in Figure \ref{fig:dsa-internal-block}, which are mapped memory regions. Submission of work is done by writing a descriptor to one of these. A descriptor is 64 bytes in size and may contain one specific task (task descriptor) or the location of a task array in memory (batch descriptor). Through these portals, the submitted descriptor reaches a queue. There are two possible queue types with different submission methods and use cases. The \gls{dsa:swq} is intended to provide synchronized access to multiple processes and each group may only have one attached. The method used to achieve this guarantee may result in higher submission cost \cite[Sec. 3.3.1]{intel:dsaspec}, compared to the \gls{dsa:dwq} to which a descriptor is submitted via a regular write \cite[Sec. 3.3.2]{intel:dsaspec}. \par
\textsc{Component \rom{2}, Engine:} An Engine is the processing block that connects to memory and performs the described task. To handle the different descriptors, each Engine has two internal execution paths: one for task descriptors and one for batch descriptors. Processing a task descriptor is straightforward, as all information required to complete the operation is contained within it \cite[Sec. 3.2]{intel:dsaspec}. For a batch, the \gls{dsa} reads the batch descriptor, then fetches all task descriptors from memory and processes them \cite[Sec. 3.8]{intel:dsaspec}. If configured to do so, an Engine coordinates with the operating system when it encounters a page fault and waits for its resolution; otherwise, an error is generated in this scenario \cite[Sec. 2.2, Block on Fault]{intel:dsaspec}. \par
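The submission model described above can be made concrete with a minimal C sketch. It rests on loudly stated assumptions: the descriptor is a hypothetical, simplified 64-byte struct (its fields are not the layout defined in the specification), the portal write is replaced by a plain memcpy, and the Engine is a software loop. The names desc, submit, engine_process, and run_task are illustrative only. What it does show is the split between the two execution paths: a task descriptor carries everything needed, while a batch descriptor merely points to an array of task descriptors in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes and layout: a simplified model, NOT the descriptor
 * format defined in the DSA specification. Only the 64-byte size matches. */
enum desc_op { OP_MEMMOVE = 1, OP_BATCH = 2 };

struct desc {
    uint32_t opcode;  /* OP_MEMMOVE (task) or OP_BATCH (batch)          */
    uint32_t count;   /* batch: number of task descriptors in the array */
    uint64_t src;     /* task: source address; batch: array address     */
    uint64_t dst;     /* task: destination address                      */
    uint64_t size;    /* task: number of bytes to move                  */
    uint8_t pad[32];  /* padding up to the 64-byte descriptor size      */
};
static_assert(sizeof(struct desc) == 64, "a descriptor is 64 bytes");

/* Task path: all information needed is contained in the descriptor. */
static int run_task(const struct desc *d) {
    if (d->opcode != OP_MEMMOVE)
        return -1;  /* this model only knows a single task type */
    memmove((void *)(uintptr_t)d->dst, (const void *)(uintptr_t)d->src,
            (size_t)d->size);
    return 0;
}

/* Engine model: a batch descriptor only points to task descriptors,
 * which are fetched from memory and processed one after another. */
static int engine_process(const struct desc *d) {
    if (d->opcode != OP_BATCH)
        return run_task(d);
    const struct desc *tasks = (const struct desc *)(uintptr_t)d->src;
    for (uint32_t i = 0; i < d->count; ++i)
        if (run_task(&tasks[i]) != 0)
            return -1;
    return 0;
}

/* Stand-in for writing to a portal: on real hardware this is a single
 * 64-byte store to the mapped portal; here the descriptor is copied into a
 * local slot and handed straight to the software Engine model. */
static int submit(const struct desc *d) {
    struct desc slot;
    memcpy(&slot, d, sizeof slot);
    return engine_process(&slot);
}
```

On real hardware, the difference between the two queue types lies in how the 64-byte store reaches the portal, which this sketch deliberately does not model.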
@ -88,7 +86,7 @@ The \gls{dsa} chip is directly integrated into the processor and attaches via th
\subsubsection{Virtual Address Resolution}
\label{subsubsec:state:dsa-vaddr}
-An important aspect of computer systems is the abstraction of physical memory addresses through virtual memory \cite{virtual-memory}. Therefore, the \gls{dsa} must handle address translation: a process submitting a task does not know the physical memory location of its data, so the descriptor contains virtual addresses. To resolve these to physical addresses, the Engine communicates with the \gls{iommu}, as visible in the outward connections at the top of Figure \ref{fig:dsa-internal-block}. This resolution requires knowledge of the submitting process. Therefore, each task descriptor has a field for the \gls{x86:pasid}, which is filled by the \gls{x86:enqcmd} instruction for a \gls{dsa:swq} \cite[Sec. 3.3.1]{intel:dsaspec} or set statically after a process is attached to a \gls{dsa:dwq} \cite[Sec. 3.3.2]{intel:dsaspec}. \par
+An important aspect of computer systems is the abstraction of physical memory addresses through virtual memory \cite{virtual-memory}. Therefore, the \gls{dsa} must handle address translation: a process submitting a task does not know the physical memory location of its data, so the descriptor contains virtual addresses. To resolve these to physical addresses, the Engine communicates with the \gls{iommu}, as visible in the outward connections at the top of Figure \ref{fig:dsa-internal-block}. This resolution requires knowledge of the submitting process. Therefore, each task descriptor has a field for the \gls{x86:pasid}, which is filled by the instruction used for \gls{dsa:swq} submission \cite[Sec. 3.3.1]{intel:dsaspec} or set statically after a process is attached to a \gls{dsa:dwq} \cite[Sec. 3.3.2]{intel:dsaspec}. \par
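The PASID handling described in this paragraph can be sketched as follows. The struct wq and the function descriptor_pasid are hypothetical and only model which PASID accompanies a descriptor; the 20-bit width follows the PCIe PASID definition. In hardware, the shared-queue case is handled by the submission instruction itself, not by software as modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* PASIDs are 20-bit identifiers. */
#define PASID_MASK 0xFFFFFu
static_assert(PASID_MASK == (1u << 20) - 1, "PASID is 20 bits wide");

enum wq_mode { WQ_SHARED, WQ_DEDICATED };

/* Hypothetical work queue model. */
struct wq {
    enum wq_mode mode;
    uint32_t attached_pasid;  /* dedicated only: set once when a process is attached */
};

/* Returns the PASID that accompanies a descriptor submitted by a process
 * whose own PASID is submitter_pasid. For a shared queue the value comes
 * from the submitter with every submission; for a dedicated queue it is
 * the value configured statically when the process was attached. */
static uint32_t descriptor_pasid(const struct wq *q, uint32_t submitter_pasid) {
    if (q->mode == WQ_SHARED)
        return submitter_pasid & PASID_MASK;
    return q->attached_pasid & PASID_MASK;
}
```

The design choice mirrors the trade-off in the text: the shared queue pays for per-submission identification, while the dedicated queue fixes the identity once at attach time.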
\subsubsection{Completion Signalling}
\label{subsubsec:state:completion-signal}
@ -112,11 +110,12 @@ Ordering of operations is only guaranteed for a configuration with one \gls{dsa:
Since Linux kernel version 5.10, a driver for the \gls{dsa} has been available, which currently lacks a counterpart on Windows operating systems \cite[Sec. Installation]{intel:dmldoc}. As a result, accessing the \gls{dsa} is only possible under Linux. To interact with the driver and perform configuration operations, Intel provides the accel-config user-space application \cite{intel:libaccel-config-repo}. This toolset offers a command-line interface and can read configuration files to configure the device, as mentioned in Section \ref{subsection:dsa-hwarch}. The interaction is illustrated in the upper block labelled \enquote{User space} in Figure \ref{fig:dsa-software-arch}, where accel-config communicates with the kernel driver, depicted in light green and labelled \enquote{IDXD}. Once successfully configured, each \gls{dsa:wq} is exposed as a character device, and a process gains access to the associated portal by mapping it into its address space via \texttt{mmap} \cite[Sec. 3.3]{intel:analysis}. \par
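Accessing a configured queue from user space then amounts to opening its character device and mapping the portal. The following POSIX C sketch assumes a device path such as /dev/dsa/wq0.0 and a portal size of one 4 KiB page; both depend on the driver and the accel-config setup and should be checked against the actual system. The helpers map_wq_portal and unmap_wq_portal are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: the portal occupies one 4 KiB page. */
#define PORTAL_SIZE 4096
static_assert(PORTAL_SIZE >= 64, "a portal must fit one 64-byte descriptor");

/* Maps the portal of a work queue character device, for example
 * "/dev/dsa/wq0.0" (the name depends on the accel-config setup).
 * Returns NULL if the device cannot be opened or mapped. */
static void *map_wq_portal(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    void *portal = mmap(NULL, PORTAL_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after closing the file */
    return portal == MAP_FAILED ? NULL : portal;
}

static void unmap_wq_portal(void *portal) {
    if (portal != NULL)
        munmap(portal, PORTAL_SIZE);
}
```

Without a configured device, map_wq_portal simply returns NULL, which keeps the failure path easy to exercise on machines lacking the hardware.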
-While a process could theoretically submit work to the \gls{dsa} using either the \gls{x86:movdir64b} or \gls{x86:enqcmd} instructions, providing descriptors through manual configuration, this approach can be cumbersome. Hence, \gls{intel:dml} exists to streamline this process. Despite some limitations, such as the lack of support for \gls{dsa:dwq} submission, this library offers an interface that manages the creation and submission of descriptors, as well as error handling and reporting. The high-level abstraction it offers enables compatibility measures, allowing code developed for the \gls{dsa} to execute on machines without the required hardware as well \cite[Sec. High-level C++ API, Advanced usage]{intel:dmldoc}. \par
+While a process could theoretically submit work to the \gls{dsa} by manually preparing descriptors and submitting them via special instructions, this approach can be cumbersome. Hence, \gls{intel:dml} exists to streamline this process. Despite some limitations, such as the lack of support for \gls{dsa:dwq} submission, this library offers an interface that manages the creation and submission of descriptors, as well as error handling and reporting. The high-level abstraction it offers enables compatibility measures, allowing code developed for the \gls{dsa} to execute on machines without the required hardware as well \cite[Sec. High-level C++ API, Advanced usage]{intel:dmldoc}. \par
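The compatibility idea, running the same code with or without the accelerator, can be illustrated by a small dispatcher with a CPU fallback. This is a hypothetical C sketch and not the DML interface: exec_path, hw_copy_fn, and execute_copy are invented names, and the hardware backend is passed in as a function pointer that is NULL when no device is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical execution paths: the same operation may run on the
 * accelerator, on the CPU, or on whichever of the two is available. */
enum exec_path { PATH_HARDWARE, PATH_SOFTWARE, PATH_AUTO };

/* Hardware backend (descriptor creation plus submission); NULL when no
 * accelerator is present. Returns 0 on success. */
typedef int (*hw_copy_fn)(void *dst, const void *src, size_t n);

/* Returns the path that performed the copy, or -1 on failure. */
static int execute_copy(enum exec_path requested, hw_copy_fn hw,
                        void *dst, const void *src, size_t n) {
    if (requested == PATH_SOFTWARE || (requested == PATH_AUTO && hw == NULL)) {
        memmove(dst, src, n);  /* CPU fallback keeps the code portable */
        return PATH_SOFTWARE;
    }
    if (hw == NULL)  /* hardware explicitly requested but unavailable */
        return -1;
    return hw(dst, src, n) == 0 ? PATH_HARDWARE : -1;
}
```

Requesting the automatic path lets the caller write the operation once, while an explicit hardware request fails loudly instead of silently falling back.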
\section{Programming Interface for \glsentrylong{dsa}}
\label{sec:state:dml}
+\ref{sec:}
As mentioned in Section \ref{subsec:state:dsa-software-view}, \gls{intel:dml} offers a high-level interface for interacting with the hardware accelerator, specifically Intel \gls{dsa}. Using its C++ interface, we now demonstrate its usage with the example of a simple memory copy implementation for the \gls{dsa}. \par
\begin{figure}[!t]

thesis/own.gls

@ -54,30 +54,6 @@
description={\textsc{\glsentrylong{dsa:dwq}:} A type of Work Queue only usable by one process, and therefore with potentially lower submission overhead. See Section \ref{subsec:state:dsa-arch-comp} for more detail.}
}
-\newglossaryentry{pcie-dmr}{
-short={DMR},
-name={DMR},
-long={PCIe Deferrable Memory Write Request},
-first={PCIe Deferrable Memory Write Request (DMR)},
-description={\textsc{\glsentrylong{pcie-dmr}:} \todo{write pcie-dmr description}}
-}
-\newglossaryentry{x86:enqcmd}{
-short={ENQCMD},
-name={ENQCMD},
-long={x86 Instruction ENQCMD},
-first={x86 Instruction ENQCMD},
-description={\textsc{\glsentrylong{x86:enqcmd}:} \todo{write enqcmd description}}
-}
-\newglossaryentry{x86:movdir64b}{
-short={MOVDIR64B},
-name={MOVDIR64B},
-long={x86 Instruction MOVDIR64B},
-first={x86 Instruction MOVDIR64B},
-description={\textsc{\glsentrylong{x86:movdir64b}:} \todo{write movdir64b description}}
-}
\newglossaryentry{x86:pasid}{
short={PASID},
name={PASID},
@ -146,7 +122,7 @@
name={API},
long={Application Programming Interface},
first={Application Programming Interface (API)},
-description={\textsc{\glsentrylong{api}:} Public functions exposed by a library, through which programs utilizing this library can interact with it.}
+description={\textsc{\glsentrylong{api}:} Definition of the interface provided by an application, enabling interaction between software components or systems.}
}
\newglossaryentry{remotemem}{
