## implemented

- 1 to n engines per group
- 1 to n threads running on one specific core / DSA engine
- copy inside and across NUMA borders
- cross-copy: 2 engines copying from their NUMA domain to the domain of the other
- all with "packet sizes" of 1KiB, 2KiB, 4KiB, 8KiB, ..., 1GiB
- all with both CPU and DSA for comparison

## missing

- batch vs single submissions
- effect of fence/drain
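
The "packet size" sweep listed under implemented doubles from 1KiB to 1GiB, which gives 21 sizes. The following is a minimal Python sketch of that sweep with a plain CPU copy as the baseline. It is not the benchmark's own code: the names `packet_sizes` and `cpu_copy_throughput` are hypothetical, the DSA path is not shown, and thread pinning and NUMA placement are left out.

```python
import time

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def packet_sizes(start=KIB, stop=GIB):
    """Return the sizes doubling from start up to and including stop."""
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= 2
    return sizes


def cpu_copy_throughput(size, repetitions=5):
    """Copy `size` bytes on the CPU and return the best rate in GiB/s.

    Assumption: a bytearray slice assignment stands in for memcpy.
    """
    src = bytearray(size)
    dst = bytearray(size)
    best = float("inf")
    for _ in range(repetitions):
        t0 = time.perf_counter()
        dst[:] = src  # the copy being timed
        best = min(best, time.perf_counter() - t0)
    # Guard against a zero reading from the timer on very small copies.
    return size / max(best, 1e-9) / GIB


if __name__ == "__main__":
    # Small demo range only; the full sweep would go up to 1GiB.
    for size in packet_sizes(stop=MIB):
        print(f"{size // KIB:>5} KiB  {cpu_copy_throughput(size):8.2f} GiB/s")
```

Taking the best of several repetitions is one common way to reduce noise from scheduling and cold caches; averaging would be an equally valid choice depending on what the comparison with DSA is meant to show.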