peak-perf
- measure ddr to ddr
- measure ddr to hbm
All for 1KiB, 4KiB, 1MiB, 1GiB
All for HW and also SW path
All for intra-node, inter-node and inter-socket
--> conclude how much overhead the DSA engine has
--> conclude the size after which using HW makes sense; this point is reached when the submit overhead for HW execution is smaller than the entire copy time for SW execution
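The break-even rule above can be sketched as a small helper. This is a minimal sketch, assuming the per-size timings have already been measured; the name `break_even_size` and the numbers in `example` are hypothetical placeholders, not results.

```python
from typing import Dict, Optional, Tuple

KiB, MiB, GiB = 1024, 1024**2, 1024**3

def break_even_size(timings: Dict[int, Tuple[float, float]]) -> Optional[int]:
    """Return the smallest size at which HW submit overhead < SW copy time.

    timings maps copy size in bytes -> (hw_submit_overhead, sw_copy_time),
    both in the same time unit. Returns None if HW never pays off.
    """
    for size in sorted(timings):
        hw_submit_overhead, sw_copy_time = timings[size]
        if hw_submit_overhead < sw_copy_time:
            return size
    return None

# Placeholder values only, NOT measured data.
example = {
    1 * KiB: (5.0, 1.0),
    4 * KiB: (5.0, 2.0),
    1 * MiB: (5.0, 100.0),
    1 * GiB: (5.0, 100000.0),
}
print(break_even_size(example))
```

With only four measured sizes the result is coarse: the true crossover lies somewhere between the returned size and the next smaller one.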
submit // done
- single submit-and-wait
- multi submit
- batch submit
All with both 1 and 4 engines per WQ
All for 1KiB, 4KiB, 1MiB, 1GiB
All only on DDR and intra-node
--> conclude which work submission strategy is best for which size
--> conclude whether multiple engines significantly improve batch perf
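The combinations listed for the submit benchmark can be enumerated as below. This is a sketch only: the names `STRATEGIES`, `ENGINES_PER_WQ`, `SIZES` and `submit_test_matrix` are hypothetical, and the actual benchmark driver is assumed to consume each tuple in whatever form it needs.

```python
from itertools import product

STRATEGIES = ("single submit-and-wait", "multi submit", "batch submit")
ENGINES_PER_WQ = (1, 4)
SIZES = {"1KiB": 1024, "4KiB": 4 * 1024, "1MiB": 1024**2, "1GiB": 1024**3}

def submit_test_matrix():
    """List every (strategy, engines, size_label, size_bytes) combination."""
    return [
        (strategy, engines, label, nbytes)
        for strategy, engines, (label, nbytes) in product(
            STRATEGIES, ENGINES_PER_WQ, SIZES.items()
        )
    ]

for case in submit_test_matrix():
    print(case)
```

Three strategies, two engine counts and four sizes give 24 runs, all on DDR and intra-node.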
mtsubmit // done
- multiple threads submit to the same WQ
- use 1, 2, 4, 8, 12 threads
All using DDR and 1MiB
All for 1 vs 4 engines
All on DDR and intra-node
--> conclude how badly mt submit hurts performance
--> conclude whether multiple engines help mt submit
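One way to quantify how much mt submit hurts is to normalise each result to the single-thread run. This is a minimal sketch, assuming the benchmark reports one throughput figure per thread count; `relative_throughput` and the numbers in `example` are hypothetical placeholders, not measurements.

```python
from typing import Dict

def relative_throughput(throughput_by_threads: Dict[int, float]) -> Dict[int, float]:
    """Normalise throughput for each thread count to the 1-thread result."""
    baseline = throughput_by_threads[1]
    return {n: t / baseline for n, t in sorted(throughput_by_threads.items())}

# Placeholder values only, NOT measured data.
example = {1: 10.0, 2: 8.0, 4: 5.0, 8: 4.0, 12: 2.5}
for threads, ratio in relative_throughput(example).items():
    print(threads, ratio)
```

Values below 1.0 mean the configuration is slower than a single submitting thread. Running the same normalisation for the 1-engine and 4-engine results makes the two directly comparable.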
cross-copy // done
- compare which is faster: xcopy, copy from source node, copy from dst node
All for both inter-node and inter-socket copy
All using DDR and 1MiB on 4E
--> conclude where a copy thread should live