formulate the benchmark descriptions more concisely, in benchmark-plan.md

master
Constantin Fürst 1 year ago
parent
commit
151cadf3c7
21
benchmarks/benchmark-plan.md

@@ -1,12 +1,9 @@
-# peak performance
-- measure ddr to ddr, intra-node
-- measure ddr to hbm, intra-node
-- measure ddr to ddr, inter-node
-- measure ddr to hbm, inter-node
-- measure ddr to ddr, inter-socket
-- measure ddr to hbm, inter-socket
+# peak-perf
+- measure ddr to ddr
+- measure ddr to hbm
 All for 1KiB, 4KiB, 1MiB, 1GiB
 All for HW and also SW path
+All for intra-node, inter-node and inter-socket
 --> conclude how much overhead DSA engine has
 --> conclude size after which using HW makes sense
 this point is reached when submit overhead for
@@ -17,17 +14,19 @@ All for HW and also SW path
 - multi submit
 - batch submit
 All with both 1 and 4 engines per WQ
-All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
+All for 1KiB, 4KiB, 1MiB, 1GiB
+All only on DDR and intra-node
 --> conclude which work submission strategy is best for which size
 --> conclude whether multiple engines significantly improve batch perf
-# MT submit
+# mtsubmit // done
 - multiple threads submit to the same WQ
 - use 1,2,4,8,12 threads
-All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
+All using DDR and 1MiB
 All for 1 vs 4 engines
+All on DDR and intra-node
 --> conclude how bad mt submit hurts performance
 --> conclude whether multiple engines help mt submit
-# cross copy // done
+# cross-copy // done
 - compare which is faster: xcopy, copy from source node, copy from dst node
 All for both inter-node and inter-socket copy using DDR and 1MiB on 4E
 --> conclude where a copy thread should live
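
The more concise plan leans on "All for ..." lines, so the actual number of runs is only implied. A minimal sketch of how the peak-perf and mtsubmit sections might expand into concrete configurations is given below. All names (`peak_perf_cases`, `mtsubmit_cases`, the dictionary keys) are hypothetical and not taken from the repository; the values are the ones listed in the plan.

```python
from itertools import product

KIB, MIB, GIB = 1024, 1024**2, 1024**3

# Values copied from the plan; the identifiers themselves are assumptions.
SIZES = {"1KiB": KIB, "4KiB": 4 * KIB, "1MiB": MIB, "1GiB": GIB}
MEMORY = ["ddr-ddr", "ddr-hbm"]
PATHS = ["hw", "sw"]
LOCALITY = ["intra-node", "inter-node", "inter-socket"]


def peak_perf_cases():
    """Cross product of memory pair, size, path and locality (peak-perf section)."""
    return [
        {"memory": m, "size": s, "bytes": SIZES[s], "path": p, "locality": loc}
        for m, s, p, loc in product(MEMORY, SIZES, PATHS, LOCALITY)
    ]


def mtsubmit_cases():
    """Thread count x engine count, fixed to DDR, 1MiB, intra-node (mtsubmit section)."""
    threads = [1, 2, 4, 8, 12]
    engines = [1, 4]
    return [
        {"threads": t, "engines": e, "bytes": MIB,
         "memory": "ddr-ddr", "locality": "intra-node"}
        for t, e in product(threads, engines)
    ]


if __name__ == "__main__":
    print(len(peak_perf_cases()), "peak-perf configurations")
    print(len(mtsubmit_cases()), "mtsubmit configurations")
```

Enumerating the matrix this way makes it easy to check that the shortened wording still covers every combination the old per-line listing spelled out.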
