Commit fda091b

authored
Update index.md
figure size updates
1 parent eee97d7 commit fda091b

1 file changed

+6
-5
lines changed

docs/index.md

+6-5
@@ -529,7 +529,7 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-serif visited:text-teal-700">Snoopie</a>
+<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-serif visited:text-teal-700">Snoopie: A Multi-GPU Communication Profiler and Visualizer</a>
 </div>
 <p class="text-lg">With data movement posing a significant bottleneck in computing, profiling tools are essential for scaling multi-GPU applications efficiently. However, existing tools focus primarily on single GPU compute operations and lack support for monitoring GPU-GPU transfers and communication library calls. Addressing these gaps, we present Snoopie, an instrumentation-based multi-GPU communication profiling tool. Snoopie accurately tracks peer-to-peer transfers and GPU-centric communication library calls, attributing data movement to specific source code lines and objects. It offers various visualization modes, from system-wide overviews to detailed instructions and addresses, enhancing programmer productivity.</p>
 <p>
@@ -567,7 +567,7 @@ BeyondMoore Software Ecosystem
 <p class="text-lg">We're undertaking the design of an API for a unified communication library to streamline device-to-device communication within the CPU-free model by aiming to optimize communication efficiency across diverse devices. More details about the project will be available soon. The related paper is under preparation.</p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
-<img width="300px" src="./assets/network-topo.png" />
+<img width="360px" src="./assets/network-topo.png" />
 </div>
 </div>

@@ -581,7 +581,7 @@ BeyondMoore Software Ecosystem
 <p class="text-lg">We're actively crafting a compiler to empower developers to write high-level Python code that compiles into efficient CPU-free device code. This compiler integrates GPU-initiated communication libraries, NVSHMEM for NVIDIA and ROC_SHMEM for AMD, enabling GPU communication directly within Python code. With automatic generation of GPU-initiated communication calls and persistent kernels, we aim to streamline development workflows. Our prototype will be available soon.</p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
-<img width="500px" src="./assets/dace-compiler.png" />
+<img width="400px" src="./assets/dace-compiler.png" />
 </div>
 </div>

@@ -600,7 +600,7 @@ BeyondMoore Software Ecosystem
 project will be available soon. The related paper is under review. </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
-<img width="500px" src="./assets/task-graph-Ilyas.png" />
+<img width="400px" src="./assets/task-graph-Ilyas.png" />
 </div>
 </div>

@@ -612,7 +612,8 @@ BeyondMoore Software Ecosystem
 </div>
 <p class="text-lg">
 Precise event sampling, a profiling feature in commodity processors, accurately pinpoints instructions triggering hardware events. While widely utilized, support from vendors varies, impacting accuracy, stability, overhead, and functionality. Our study benchmarks Intel PEBS and AMD IBS, revealing PEBS's finer-grained accuracy and IBS's richer information but lower stability. PEBS incurs lower time overhead, while IBS suffers from accuracy issues. OS signal delivery adds significant time overhead. Both PEBS and IBS exhibit sampling bias. Our findings hold in a full-fledged profiling tool on modern Intel and AMD machines. This comparison offers valuable insights for hardware designers and profiling tool developers.
-
+</p>
+<p>
 All the artifacts and benchmarks can be found <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">here.</a>
 </p>
 </div>
