diff --git a/benchmarks/benchmark-plan.md b/benchmarks/benchmark-plan.md
index 26cea13..4950ad1 100644
--- a/benchmarks/benchmark-plan.md
+++ b/benchmarks/benchmark-plan.md
@@ -1,12 +1,9 @@
-# peak performance
-- meassure ddr to ddr, intra-node
-- meassure ddr to hbm, intra-node
-- meassure ddr to ddr, inter-node
-- meassure ddr to hbm, inter-node
-- meassure ddr to ddr, inter-socket
-- meassure ddr to hbm, inter-socket
+# peak-perf
+- measure ddr to ddr
+- measure ddr to hbm
 All for 1KiB, 4KiB, 1MiB, 1GiB
 All for HW and also SW path
+All for intra-node, inter-node and inter-socket
 --> conclude how much overhead DSA engine has
 --> conclude size after which using HW makes sense
     this point is reached when submit overhead for
@@ -17,17 +14,19 @@ All for HW and also SW path
 - multi submit
 - batch submit
 All with both 1 and 4 engines per WQ
-All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
+All for 1KiB, 4KiB, 1MiB, 1GiB
+All only on DDR and intra-node
 --> conclude which work submission strategy is best for which size
 --> conclude whether multiple engines significantly improve batch perf
-# MT submit
+# mtsubmit // done
 - multiple threads submit to the same WQ
 - use 1,2,4,8,12 threads
-All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
+All for 1MiB
 All for 1 vs 4 engines
+All on DDR and intra-node
 --> conclude how bad mt submit hurts performance
 --> conclude whether multiple engines help mt submit
-# cross copy // done
+# cross-copy // done
 - compare which is faster: xcopy, copy from source node, copy from dst node
 All for both inter-node and inter-socket copy using DDR and 1MiB on 4E
 --> conclude where a copy thread should live