

								\chapter{Introduction}

								\label{chap:intro}


								% Write the introduction last, once the thesis is essentially finished.
								% (Starting with the introduction, a common mistake, takes much longer,
								% and the text usually gets thrown away later anyway.) Its main purpose
								% is to establish the context for the different classes of readers. This
								% is where you have to win the readers over. By the end, the problem the
								% thesis addresses should be clear at least in outline and should seem
								% interesting to the reader. The chapter closes with an overview of the
								% rest of the thesis. It usually takes at least 4 pages; nobody reads
								% more than 10.


								The proliferation of various technologies, such as..., has ushered in a diverse landscape of systems with multiple tiers of main memory. In these systems, data must be moved between memory tiers to exploit the distinct properties of each available technology. The CPU, traditionally responsible for managing data locality, bears an additional burden in such heterogeneous memory environments, which reduces the processing cycles available for actual computation. To relieve the CPU of this work, certain Intel server CPUs now feature the \glsentryfirst{dsa} \cite{intel:xeonbrief}. \par


								In response to these challenges, this thesis addresses the optimization of data movement operations. Central to this effort is the \gls{dsa}, a hardware accelerator for streaming data movement operations that can benefit a wide range of applications. A thorough understanding of its architecture and functionality is essential for addressing the challenges posed by this new form of \gls{numa}. \par


								The primary objectives of this thesis are twofold. First, it provides a comprehensive analysis and characterization of the Intel \gls{dsa} architecture. Second, it applies the \gls{dsa} in the domain-specific context of \glsentryfirst{qdp} to accelerate database queries \cite{dimes-prefetching}. In doing so, the thesis explores how the \gls{dsa} can be used strategically to tackle the challenges posed by heterogeneous memory systems and offers insights into combining data streaming acceleration with intelligent prefetching. \par


								This work makes several contributions to the field. Most notably, it presents the design and implementation of an offloading cache, which provides an interface for exploiting the strengths of tiered storage with minimal integration effort. The code is available in the accompanying repository \cite{thesis-repo} under \texttt{offloading-cacher}. In addition, the thesis examines the strengths and weaknesses of the \gls{dsa} in detail through microbenchmarks. The results serve as practical guidelines for applying the \gls{dsa} effectively in various scenarios. To the best of our knowledge, at the time of writing this thesis is the first scientific work to extensively evaluate the \gls{dsa} in a multi-socket system and to provide benchmarks for programming it through the \glsentryfirst{dml}. Furthermore, the performance of data movement from \glsentryshort{dram} to \glsentryfirst{hbm} using the \gls{dsa} has not previously been evaluated in the scientific literature. \par


								The Technical Background chapter provides the information needed to understand the rest of this work, covering \gls{hbm}, \gls{qdp}, and the \gls{dsa} together with its programming interface \cite{intel:dmldoc}. It also gives guidance on system setup and configuration. The Performance Microbenchmarks chapter then analyzes the strengths and weaknesses of the \gls{dsa}: it presents the methodology, describes each benchmark in detail, and derives usage guidance from the results. The Design and Implementation chapters cover the practical side of the work, namely the interface and implementation of the offloading cache, and discuss specific design considerations and implementation challenges. The Evaluation chapter offers a comprehensive assessment of the implemented solution. Finally, the Conclusion reflects on the insights gained and reviews the contributions and results of the preceding chapters. \par


								%%% Local Variables:

								%%% TeX-master: "diplom"

								%%% End: