This repository contains my bachelor's thesis and the associated TeX files, code snippets and possibly more. Topic: Data Movement in Heterogeneous Memories with Intel Data Streaming Accelerator

peak-perf

  • measure DDR to DDR
  • measure DDR to HBM

All for 1KiB, 4KiB, 1MiB, 1GiB

All for both the HW and the SW path

All for intra-node, inter-node and inter-socket

--> conclude how much overhead the DSA engine adds

--> conclude the size above which using HW makes sense; this point is reached when the submission overhead of HW execution is smaller than the total copy time of SW execution
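The break-even criterion above can be expressed as a small helper. Let t_hw be the measured HW submission overhead and t_sw(s) the SW copy time for size s. HW then pays off from the smallest tested s for which t_hw < t_sw(s). If SW throughput B is roughly constant, that point lies near s ≈ t_hw · B. This is a sketch: the inputs are assumed to come from the peak-perf measurements, and the function names are hypothetical.

```c
#include <stddef.h>

/* Returns the smallest measured size at which the HW submission overhead
   (hw_submit_ns) is below the SW copy time, or 0 if no tested size qualifies.
   `sizes` must be sorted ascending; sw_ns[i] is the SW copy time for sizes[i]. */
size_t hw_break_even(const size_t *sizes, const double *sw_ns, size_t n,
                     double hw_submit_ns) {
    for (size_t i = 0; i < n; i++) {
        if (hw_submit_ns < sw_ns[i])
            return sizes[i];
    }
    return 0;
}

/* Continuous estimate assuming constant SW throughput:
   ns of overhead times bytes per ns gives the break-even size in bytes. */
double break_even_estimate(double hw_submit_ns, double sw_bytes_per_ns) {
    return hw_submit_ns * sw_bytes_per_ns;
}
```

Returning 0 when nothing qualifies means that, within the tested sizes, the SW path always wins.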

submit // done

  • single submit-and-wait
  • multi submit
  • batch submit

All with both 1 and 4 engines per WQ

All for 1KiB, 4KiB, 1MiB, 1GiB

All only on DDR and intra-node

--> conclude which work submission strategy is best for which size

--> conclude whether multiple engines significantly improve batch performance
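The three submission strategies differ mainly in when the submitter waits. The following sketch is hypothetical: `struct desc`, `hw_submit`, `hw_wait` and `hw_submit_batch` are stand-ins loosely modeled on DSA descriptors, completion records and batch descriptors. They are not the idxd or Intel DML API. They copy synchronously with memcpy, so the control flow compiles and runs without the accelerator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a memory-move descriptor plus its completion flag. */
struct desc { void *dst; const void *src; size_t len; volatile int done; };

/* Hypothetical stand-in for a batch descriptor pointing to a descriptor list. */
struct batch { struct desc *descs; size_t count; volatile int done; };

/* Stand-ins: on hardware, submit would hand the descriptor to the WQ and wait
   would poll the completion record. Here the copy happens immediately. */
static void hw_submit(struct desc *d) { memcpy(d->dst, d->src, d->len); d->done = 1; }
static void hw_wait(const struct desc *d) { while (!d->done) { } }
static void hw_submit_batch(struct batch *b) {
    for (size_t i = 0; i < b->count; i++) hw_submit(&b->descs[i]);
    b->done = 1;
}

/* Split a copy of `total` bytes into n descriptors; the last takes the remainder. */
void make_descs(struct desc *d, size_t n, void *dst, const void *src, size_t total) {
    size_t chunk = total / n;
    for (size_t i = 0; i < n; i++) {
        size_t off = i * chunk;
        d[i].dst = (unsigned char *)dst + off;
        d[i].src = (const unsigned char *)src + off;
        d[i].len = (i == n - 1) ? total - off : chunk;
        d[i].done = 0;
    }
}

/* 1) single submit-and-wait: at most one descriptor in flight. */
void copy_single(struct desc *d, size_t n) {
    for (size_t i = 0; i < n; i++) { hw_submit(&d[i]); hw_wait(&d[i]); }
}

/* 2) multi submit: submit every descriptor first, then wait on all of them. */
void copy_multi(struct desc *d, size_t n) {
    for (size_t i = 0; i < n; i++) hw_submit(&d[i]);
    for (size_t i = 0; i < n; i++) hw_wait(&d[i]);
}

/* 3) batch submit: one submission and one wait for the whole list. */
void copy_batch(struct desc *d, size_t n) {
    struct batch b = { d, n, 0 };
    hw_submit_batch(&b);
    while (!b.done) { }
}
```

On real hardware, multi submit keeps several descriptors in flight at once. Batch submit replaces n submissions with one, at the cost of the device processing the batch descriptor. Which of these effects dominates at each size is what the submit benchmark is meant to show.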

mtsubmit // done

  • multiple threads submit to the same WQ
  • use 1, 2, 4, 8 and 12 threads

All using DDR and 1MiB

All for 1 vs 4 engines

All on DDR and intra-node

--> conclude how much multi-threaded submission hurts performance

--> conclude whether multiple engines help multi-threaded submission

cross-copy // done

  • compare which is fastest: xcopy, copy from the source node, or copy from the destination node

All for both inter-node and inter-socket copies using DDR and 1MiB with 4 engines (4E)

--> conclude where a copy thread should live
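Comparing "copy from source node" with "copy from destination node" mostly comes down to where the copying thread runs. The sketch below pins that thread to a caller-chosen CPU with the Linux-specific `pthread_setaffinity_np`. It assumes the caller already knows which CPUs belong to which NUMA node (for example from libnuma or `/sys/devices/system/node/nodeN/cpulist`). That lookup is not shown, and the names `copy_from_cpu` and `pinned_copy` are hypothetical.

```c
#define _GNU_SOURCE /* for pthread_setaffinity_np and the CPU_* macros */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical job description for one pinned copy. */
struct pinned_copy { void *dst; const void *src; size_t len; int cpu; int rc; };

static void *pinned_copy_thread(void *p) {
    struct pinned_copy *c = p;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(c->cpu, &set);
    /* Pin this thread to the requested CPU before doing any copying. */
    c->rc = pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    if (c->rc == 0)
        memcpy(c->dst, c->src, c->len);
    return NULL;
}

/* Performs the copy from a thread pinned to `cpu`.
   Returns 0 on success, or a nonzero error if the thread could not be
   created or pinned (in which case no data is copied). */
int copy_from_cpu(void *dst, const void *src, size_t len, int cpu) {
    struct pinned_copy c = { dst, src, len, cpu, -1 };
    pthread_t t;
    if (pthread_create(&t, NULL, pinned_copy_thread, &c) != 0)
        return -1;
    pthread_join(t, NULL);
    return c.rc;
}
```

For the cross-copy comparison, the same buffers would be copied once with `cpu` on the source node and once with `cpu` on the destination node, and the two timings compared.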