format benchmark-plan.md so that it is legible in rendered version too

master
Constantin Fürst 1 year ago
parent
commit
7c751aa2ce
24
benchmarks/benchmark-plan.md

@@ -1,33 +1,57 @@
# peak-perf
- measure ddr to ddr
- measure ddr to hbm
All for 1KiB, 4KiB, 1MiB, 1GiB
All for HW and also SW path
All for intra-node, inter-node and inter-socket
--> conclude how much overhead DSA engine has
--> conclude size after which using HW makes sense
this point is reached when submit overhead for
hw execution is smaller than entire copy time
for sw execution
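
A minimal sketch of the break-even criterion above, assuming the benchmark yields one SW copy time and one HW submit overhead per transfer size. The function name, the dictionary layout and all numbers are hypothetical placeholders, not measurements.

```python
from typing import Dict, Optional

KIB, MIB, GIB = 1024, 1024**2, 1024**3


def break_even_size(sw_copy_ns: Dict[int, float],
                    hw_submit_overhead_ns: Dict[int, float]) -> Optional[int]:
    """Return the smallest size (bytes) at which the HW submit overhead
    is smaller than the entire SW copy time, or None if no size qualifies."""
    for size in sorted(sw_copy_ns):
        overhead = hw_submit_overhead_ns.get(size)
        if overhead is not None and overhead < sw_copy_ns[size]:
            return size
    return None


# Placeholder values in nanoseconds (assumed, not measured).
sw = {KIB: 100.0, 4 * KIB: 400.0, MIB: 60_000.0, GIB: 6.0e7}
hw = {KIB: 2_000.0, 4 * KIB: 2_000.0, MIB: 2_000.0, GIB: 2_000.0}

threshold = break_even_size(sw, hw)
```

With these placeholder values the threshold is 1MiB, only because the assumed overhead first drops below the assumed SW copy time at that size.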
# submit // done
- single submit-and-wait
- multi submit
- batch submit
All with both 1 and 4 engines per WQ
All for 1KiB, 4KiB, 1MiB, 1GiB
All only on DDR and intra-node
--> conclude which work submission strategy is best for which size
--> conclude whether multiple engines significantly improve batch perf
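
The submit plan above spans three strategies, two engine counts and four sizes. A sketch of enumerating that matrix could look as follows; the key names and strategy labels are assumptions made for illustration and do not come from the benchmark code.

```python
from itertools import product

KIB, MIB, GIB = 1024, 1024**2, 1024**3

STRATEGIES = ("single", "multi", "batch")   # submit-and-wait, multi, batch
ENGINES_PER_WQ = (1, 4)
SIZES = (KIB, 4 * KIB, MIB, GIB)


def submit_matrix():
    """Build one config dict per (strategy, engines, size) combination.
    Memory type and placement are fixed to DDR and intra-node by the plan."""
    return [
        {"strategy": s, "engines": e, "size": n,
         "memory": "DDR", "placement": "intra-node"}
        for s, e, n in product(STRATEGIES, ENGINES_PER_WQ, SIZES)
    ]


configs = submit_matrix()
```

This gives 3 x 2 x 4 = 24 configurations to run.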
# mtsubmit // done
- multiple threads submit to the same WQ
- use 1,2,4,8,12 threads
All using DDR and 1MiB
All for 1 vs 4 engines
All on DDR and intra-node
--> conclude how badly mt submit hurts performance
--> conclude whether multiple engines help mt submit
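
One way to quantify how much multi-threaded submission hurts is to normalise each run against the single-thread result. This is a sketch under the assumption that every run reports one aggregate throughput value; the function name and the numbers are hypothetical placeholders.

```python
from typing import Dict


def relative_throughput(throughput: Dict[int, float]) -> Dict[int, float]:
    """Map thread count to throughput relative to the 1-thread run.
    Values below 1.0 mean multi-threaded submission lowered throughput."""
    baseline = throughput[1]
    return {threads: value / baseline
            for threads, value in sorted(throughput.items())}


# Placeholder aggregate throughput per thread count (assumed, not measured).
one_engine = {1: 10.0, 2: 9.0, 4: 8.0, 8: 5.0, 12: 4.0}
four_engines = {1: 10.0, 2: 10.0, 4: 9.0, 8: 8.0, 12: 7.0}

rel_1e = relative_throughput(one_engine)
rel_4e = relative_throughput(four_engines)
```

Comparing `rel_1e` and `rel_4e` per thread count then shows whether the 4-engine configuration degrades less than the 1-engine one.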
# cross-copy // done
- compare which is faster: xcopy, copy from source node, copy from dst node
All for both inter-node and inter-socket copy using DDR and 1MiB on 4E
--> conclude where a copy thread should live