This contains my bachelors thesis and associated tex files, code snippets and maybe more. Topic: Data Movement in Heterogeneous Memories with Intel Data Streaming Accelerator
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
 

25 lines
4.0 KiB

\chapter{Introduction}
\label{chap:intro}
% Die Einleitung schreibt man zuletzt, wenn die Arbeit im Großen und
% Ganzen schon fertig ist. (Wenn man mit der Einleitung beginnt - ein
% häufiger Fehler - braucht man viel länger und wirft sie später doch
% wieder weg). Sie hat als wesentliche Aufgabe, den Kontext für die
% unterschiedlichen Klassen von Lesern herzustellen. Man muß hier die
% Leser für sich gewinnen. Das Problem, mit dem sich die Arbeit befaßt,
% sollte am Ende wenigsten in Grundzügen klar sein und dem Leser
% interessant erscheinen. Das Kapitel schließt mit einer Übersicht über
% den Rest der Arbeit. Meist braucht man mindestens 4 Seiten dafür, mehr
% als 10 Seiten liest keiner.
The proliferation of various technologies, such as \gls{nvram}, \gls{hbm}, and \gls{remotemem}, has ushered in a diverse landscape of systems characterized by varying tiers of main memory. Within these systems, the movement of data across memory classes becomes imperative to leverage the distinct properties offered by the available technologies. The responsibility for maintaining optimal data placement falls upon the CPU, resulting in a reduction of available cycles for computational tasks. To mitigate this strain, certain current-generation Intel server processors feature the \glsentryfirst{dsa}, to which certain data operations may be offloaded \cite{intel:xeonbrief}. This thesis undertakes the challenge of optimizing data locality in heterogeneous memory architectures, utilizing the \gls{dsa}. \par
The primary objectives of this thesis are twofold. Firstly, it involves a comprehensive analysis and characterization of the architecture of the Intel \gls{dsa}. Secondly, the focus extends to the application of \gls{dsa} in the domain-specific context of \glsentryfirst{qdp} to accelerate database queries \cite{dimes-prefetching}. \par
This work introduces significant contributions to the field. Notably, the design and implementation of an offloading cache represent a key highlight, providing an interface for leveraging the strengths of tiered storage with minimal integration efforts. Its design and implementation make up a large part of this work. This resulted in an architecture applicable to any use case requiring \glsentryshort{numa}-aware data movement with offloading support to the \gls{dsa}. Additionally, the thesis includes a detailed examination and analysis of the strengths and weaknesses of the \gls{dsa} through microbenchmarks. These benchmarks serve as practical guidelines, offering insights for the optimal application of \gls{dsa} in various scenarios. To our knowledge, this thesis stands as the first scientific work to extensively evaluate the \gls{dsa} in a multi-socket system, provide benchmarks for programming through the \glsentryfirst{intel:dml} and evaluate performance for data movement from \glsentryshort{dram} to \glsentryfirst{hbm}. \par
We begin the work by furnishing the reader with pertinent technical information necessary for understanding the subsequent sections of this work in Chapter \ref{chap:state}. Background is given for \gls{hbm} and \gls{qdp}, followed by a detailed account of the \gls{dsa}s architecture along with an available programming interface. Additionally, guidance on system setup and configuration is provided. Subsequently, Chapter \ref{chap:perf} analyses the strengths and weaknesses of the \gls{dsa} through microbenchmarks. Each benchmark is elaborated upon in detail, and usage guidance is drawn from the results. Chapters \ref{chap:design} and \ref{chap:implementation} elucidate the practical aspects of the work, including the development of the interface and implementation of the cache, shedding light on specific design considerations and implementation challenges. We comprehensively assess the implemented solution by providing concrete data on gains for an exemplary database query in Chapter \ref{chap:evaluation}. Finally, Chapter \ref{chap:conclusion} reflects insights gained, and presents a review of the contributions and results of the preceding chapters. \par
%%% Local Variables:
%%% TeX-master: "diplom"
%%% End: