% give (the supervisors have to be there for something,
% after all).
This bachelor's thesis examines heterogeneous memory systems, which combine main memory technologies such as Non-Volatile RAM (NVRAM), High Bandwidth Memory (HBM), and Remote Memory. Systems equipped with more than one type of main memory require deliberate data placement decisions to exploit the properties of the different memory tiers. Maintaining this placement falls to the CPU, which reduces the cycles available for computational tasks. To address this, Intel has introduced the Data Streaming Accelerator (DSA), which offloads data operations from the CPU and may thereby improve efficiency in data-intensive applications. The objective of this thesis is to analyze and characterize the architecture and performance of the DSA, and to apply it to a domain-specific prefetching methodology aimed at accelerating database queries in heterogeneous memory systems.
% the rest of the thesis. Usually at least 4 pages are needed for this;
% nobody reads more than 10 pages.
The proliferation of memory technologies such as Non-Volatile RAM (NVRAM), High Bandwidth Memory (HBM), and Remote Memory has produced systems with multiple tiers of main memory. In these systems, data must be moved between memory classes to exploit the distinct properties of each technology. The CPU, traditionally responsible for managing data locality, bears an additional burden in heterogeneous memory environments, which reduces the cycles available for processing. To relieve the CPU, certain Intel server processors now feature the \glsentryfirst{dsa}, to which certain data operations may be offloaded \cite{intel:xeonbrief}. With this accelerator, this thesis takes on the challenge of optimizing data locality on \glsentrylong{numa}s. \par
This thesis has two primary objectives. First, it provides a comprehensive analysis and characterization of the Intel \gls{dsa} architecture. Second, it applies the \gls{dsa} in the domain-specific context of \glsentryfirst{qdp} to accelerate database queries \cite{dimes-prefetching}. In doing so, it explores how the \gls{dsa} can be used to address the challenges posed by heterogeneous memory systems and offers insights into combining data streaming acceleration with intelligent prefetching. \par