This repository contains my bachelor's thesis and the associated TeX files, code snippets, and possibly more. Topic: Data Movement in Heterogeneous Memories with Intel Data Streaming Accelerator

# peak-perf
- measure DDR to DDR
- measure DDR to HBM
All for 1KiB, 4KiB, 1MiB, 1GiB
All for HW and also SW path
All for intra-node, inter-node and inter-socket
--> conclude how much overhead the DSA engine has
--> conclude the size after which using HW makes sense
    this point is reached when the submit overhead for HW execution
    is smaller than the entire copy time for SW execution
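
The break-even rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `break_even_size`, the dictionary layout and all timing values are hypothetical placeholders, not measurements from this repository.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3

def break_even_size(sw_copy_seconds, hw_submit_overhead_seconds):
    """Return the smallest size (in bytes) at which the HW submit overhead
    is smaller than the entire SW copy time, or None if no size qualifies.

    sw_copy_seconds: dict mapping copy size in bytes -> SW copy time in seconds.
    hw_submit_overhead_seconds: submit overhead of one HW operation in seconds.
    """
    for size in sorted(sw_copy_seconds):
        if hw_submit_overhead_seconds < sw_copy_seconds[size]:
            return size
    return None

# Made-up placeholder timings for the four planned sizes (NOT measured data).
placeholder_sw_times = {1 * KIB: 1e-7, 4 * KIB: 4e-7, 1 * MIB: 1e-4, 1 * GIB: 1e-1}
placeholder_overhead = 1e-6

print(break_even_size(placeholder_sw_times, placeholder_overhead))
```

With real measurements, the dictionary would be filled from the SW-path results and the overhead from the HW-path results of the same placement (intra-node, inter-node or inter-socket).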
# submit // done
- single submit-and-wait
- multi submit
- batch submit
All with both 1 and 4 engines per WQ
All for 1KiB, 4KiB, 1MiB, 1GiB
All only on DDR and intra-node
--> conclude which work submission strategy is best for which size
--> conclude whether multiple engines significantly improve batch perf
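
The submit benchmark matrix described above could be enumerated as follows. This is a minimal sketch: the identifiers (`STRATEGIES`, `ENGINES_PER_WQ`, `SIZES`, `submit_configs`) are hypothetical names chosen for illustration and do not come from the benchmark code itself.

```python
from itertools import product

KIB, MIB, GIB = 1024, 1024**2, 1024**3

# Dimensions taken from the plan above; the names are illustrative assumptions.
STRATEGIES = ["single-submit-and-wait", "multi-submit", "batch-submit"]
ENGINES_PER_WQ = [1, 4]
SIZES = [1 * KIB, 4 * KIB, 1 * MIB, 1 * GIB]

def submit_configs():
    """Return every (strategy, engines, size) combination of the submit plan."""
    return list(product(STRATEGIES, ENGINES_PER_WQ, SIZES))

configs = submit_configs()
print(len(configs))  # 3 strategies * 2 engine counts * 4 sizes = 24
```

Memory type and placement are not part of the product because the plan fixes them to DDR and intra-node for this benchmark.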
# mtsubmit // done
- multiple threads submit to the same WQ
- use 1, 2, 4, 8, 12 threads
All using DDR and 1MiB
All for 1 vs 4 engines
All on DDR and intra-node
--> conclude how badly mt submit hurts performance
--> conclude whether multiple engines help mt submit
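
One way to quantify how badly multi-threaded submission hurts is to normalize each result to the single-thread baseline. The sketch below assumes that the metric is aggregate throughput per thread count. That metric, the function name `relative_throughput` and the example values are assumptions made for illustration, not results.

```python
def relative_throughput(throughput_by_threads):
    """Normalize throughput values to the single-thread (key 1) baseline.

    throughput_by_threads: dict mapping thread count -> aggregate throughput
    (any unit, as long as it is the same for every entry).
    A value below 1.0 means that thread count performs worse than one thread.
    """
    baseline = throughput_by_threads[1]
    return {
        threads: value / baseline
        for threads, value in sorted(throughput_by_threads.items())
    }

# Made-up placeholder values (NOT measured data), same unit for all entries.
placeholder = {1: 10.0, 2: 5.0, 4: 2.5}
print(relative_throughput(placeholder))
```

Running the same normalization once for the 1-engine results and once for the 4-engine results makes the two configurations directly comparable.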
# cross-copy // done
- compare which is faster: xcopy, copy from source node, copy from dst node
All for both inter-node and inter-socket copy using DDR and 1MiB on 4E
--> conclude where a copy thread should live