\chapter{Introduction}
\label{chap:intro}

% The introduction is written last, once the thesis is largely finished.
% (Starting with the introduction, a common mistake, takes much longer,
% and the result is usually thrown away later anyway.) Its main task is
% to establish the context for the different classes of readers. This is
% where the readers have to be won over. By the end, the problem the
% thesis addresses should be clear at least in outline and should seem
% interesting to the reader. The chapter closes with an overview of the
% rest of the thesis. This usually takes at least 4 pages; nobody reads
% more than 10.

The proliferation of memory technologies such as Non-Volatile RAM (NVRAM), High Bandwidth Memory (HBM), and remote memory has led to systems with multiple tiers of main memory. In these systems, data must be moved between memory classes to exploit the distinct properties of each technology. The CPU, traditionally responsible for managing data locality, bears an additional burden in heterogeneous memory environments, which reduces the cycles available for processing. To relieve the CPU, certain Intel server processors now feature the \glsentryfirst{dsa}, to which certain data operations may be offloaded \cite{intel:xeonbrief}. Using this accelerator, the thesis addresses the challenge of optimizing data locality on \glsentrylong{numa}s.
\par
The objectives of this thesis are twofold. The first is a comprehensive analysis and characterization of the architecture of the Intel \gls{dsa}. The second is the application of the \gls{dsa} in the domain-specific context of \glsentryfirst{qdp} to accelerate database queries \cite{dimes-prefetching}.
\par
This thesis makes the following contributions.
Most notably, it presents the design and implementation of an offloading cache, which provides an interface for leveraging the strengths of tiered storage with minimal integration effort. In addition, the thesis examines the strengths and weaknesses of the \gls{dsa} in detail through microbenchmarks, from which practical guidelines for applying the \gls{dsa} in various scenarios are derived. At the time of writing, this thesis is the first scientific work to extensively evaluate the \gls{dsa} in a multi-socket system and to provide benchmarks for programming it through the \glsentryfirst{intel:dml}. Furthermore, the performance of data movement from \glsentryshort{dram} to \glsentryfirst{hbm} using the \gls{dsa} has not previously been evaluated by the scientific community.
\par
The Technical Background chapter provides the background needed to follow the rest of this work, covering \gls{hbm}, \gls{qdp}, and the \gls{dsa} along with its programming interface \cite{intel:dmldoc}. It also gives guidance on system setup and configuration. The Performance Microbenchmarks chapter then analyzes the strengths and weaknesses of the \gls{dsa}: it presents the methodology, describes each benchmark in detail, and derives usage guidance from the results. The Design and Implementation chapters cover the practical part of the work, namely the interface and implementation of the offloading cache, together with the design considerations and implementation challenges involved. The Evaluation chapter assesses the implemented solution. Finally, the Conclusion reflects on the insights gained and reviews the contributions and results of the preceding chapters.
\par

%%% Local Variables:
%%% TeX-master: "diplom"
%%% End: