Commit de81861

increase max-width of lists, and increase margin for the navigation bar from top
1 parent 2866cc9 commit de81861

2 files changed, +17 -17 lines changed

docs/index.md

+15 -15
@@ -8,12 +8,15 @@ toc: false
 display: flex;
 flex-direction: column;
 align-items: center;
-font-family: var(--serif);
 margin: 4rem 0 8rem;
 text-wrap: balance;
 text-align: center;
 }

+ul {
+max-width: 3000px !important;
+}
+
 .hero h1 {
 margin: 2rem 0rem;
 max-width: none;
@@ -35,7 +38,6 @@ toc: false
 .hero h2 {
 margin: 0;
 max-width: 34em;
-font-family: var(--serif);
 font-size: 30px;
 font-style: italic;
 font-weight: bold;
@@ -44,7 +46,6 @@ toc: false
 }

 .abstract {
-font-family: var(--serif);
 margin: 0;
 font-size: 20px;
 font-style: initial;
@@ -59,7 +60,6 @@ toc: false
 }

 .logo-beyondmore {
-font-family: var(--serif);
 display: flex;
 gap: 5%;
 align-items: center;
@@ -513,11 +513,11 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<h3><a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-serif visited:text-teal-700">CPU-Free Execution Model</a><h3>
+<h3><a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-sans visited:text-teal-700">CPU-Free Execution Model</a><h3>
 </div>
 <p class="text-lg">This project introduces a fully autonomous execution model for multi-GPU applications, eliminating CPU involvement beyond initial kernel launch. In conventional setups, the CPU orchestrates execution, causing overhead. We propose delegating this control flow entirely to devices, leveraging techniques like persistent kernels and device-initiated communication. Our CPU-free model significantly reduces communication overhead. Demonstrations on 2D/3D Jacobi stencil and Conjugate Gradient solvers show up to a 58.8% improvement in communication latency and a 1.63x speedup for CG on 8 NVIDIA A100 GPUs compared to CPU-controlled baselines.</p>
 <p>
-<a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-serif visited:text-teal-700">More details and git repository of the project.</a>
+<a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-sans visited:text-teal-700">More details and git repository of the project.</a>
 </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
@@ -529,11 +529,11 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-serif visited:text-teal-700">Snoopie: A Multi-GPU Communication Profiler and Visualizer</a>
+<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-sans visited:text-teal-700">Snoopie: A Multi-GPU Communication Profiler and Visualizer</a>
 </div>
 <p class="text-lg">With data movement posing a significant bottleneck in computing, profiling tools are essential for scaling multi-GPU applications efficiently. However, existing tools focus primarily on single GPU compute operations and lack support for monitoring GPU-GPU transfers and communication library calls. Addressing these gaps, we present Snoopie, an instrumentation-based multi-GPU communication profiling tool. Snoopie accurately tracks peer-to-peer transfers and GPU-centric communication library calls, attributing data movement to specific source code lines and objects. It offers various visualization modes, from system-wide overviews to detailed instructions and addresses, enhancing programmer productivity.</p>
 <p>
-<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-serif visited:text-teal-700">More details and git repository of the project.</a>
+<a href="https://github.com/ParCoreLab/snoopie" class="text-xl font-semibold font-sans visited:text-teal-700">More details and git repository of the project.</a>
 </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
@@ -545,11 +545,11 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-serif visited:text-teal-700">GPU to CPU Callbacks</a>
+<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-sans visited:text-teal-700">GPU to CPU Callbacks</a>
 </div>
 <p class="text-lg">To address resource underutilization in multi-GPU systems, particularly in irregular applications, we propose a GPU-sided resource allocation method. This method dynamically adjusts the number of GPUs in use based on workload changes, utilizing GPU-to-CPU callbacks to request additional devices during kernel execution. We implemented and tested multiple callback methods, measuring their overheads on Nvidia and AMD platforms. Demonstrating the approach in an irregular application like Breadth-First Search (BFS), we achieved a 15.7% reduction in time to solution on average, with callback overheads as low as 6.50 microseconds on AMD and 4.83 microseconds on Nvidia. Additionally, the model can reduce total device usage by up to 35%, improving energy efficiency.</p>
 <p>
-<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-serif visited:text-teal-700">More details and git repository of the project.</a>
+<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-sans visited:text-teal-700">More details and git repository of the project.</a>
 </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
@@ -560,7 +560,7 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-serif visited:text-teal-700">Unified Communication Library</a>
+<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-sans visited:text-teal-700">Unified Communication Library</a>
 </div>
 <p class="text-lg">We're undertaking the design of an API for a unified communication library to streamline device-to-device communication within the CPU-free model by aiming to optimize communication efficiency across diverse devices. We are also investigating how the available communication libraries for a system perform under different
 message sizes and communication patterns. Thus, we ex-
@@ -577,7 +577,7 @@ single-process, multi-threaded, and multi-process codes. More details about the
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-serif visited:text-teal-700">CPU Free Model Compiler</a>
+<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-sans visited:text-teal-700">CPU Free Model Compiler</a>
 </div>
 <p class="text-lg">We're actively crafting a compiler to empower developers to write high-level Python code that compiles into efficient CPU-free device code. This compiler integrates GPU-initiated communication libraries, NVSHMEM for NVIDIA and ROC_SHMEM for AMD, enabling GPU communication directly within Python code. With automatic generation of GPU-initiated communication calls and persistent kernels, we aim to streamline development workflows. Our prototype will be available soon.</p>
 </div>
@@ -590,7 +590,7 @@ single-process, multi-threaded, and multi-process codes. More details about the
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-serif visited:text-teal-700">CPU-Free Task Graph</a>
+<a href="https://github.com/ParCoreLab/" class="text-xl font-semibold font-sans visited:text-teal-700">CPU-Free Task Graph</a>
 </div>
 <p class="text-lg"> We've designed and implemented a lightweight runtime system tailored for CPU-free task graph
 execution in multi-device systems. Our runtime minimizes CPU involvement by handling task graph initialization
@@ -609,13 +609,13 @@ single-process, multi-threaded, and multi-process codes. More details about the
 <div clas="flex flex-col justify-start" style="width: 60%">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">Precise Event Sampling</a>
+<a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-sans visited:text-teal-700">Precise Event Sampling</a>
 </div>
 <p class="text-lg">
 Precise event sampling, a profiling feature in commodity processors, accurately pinpoints instructions triggering hardware events. While widely utilized, support from vendors varies, impacting accuracy, stability, overhead, and functionality. Our study benchmarks Intel PEBS and AMD IBS, revealing PEBS's finer-grained accuracy and IBS's richer information but lower stability. PEBS incurs lower time overhead, while IBS suffers from accuracy issues. OS signal delivery adds significant time overhead. Both PEBS and IBS exhibit sampling bias. Our findings hold in a full-fledged profiling tool on modern Intel and AMD machines. This comparison offers valuable insights for hardware designers and profiling tool developers.
 </p>
 <p>
-All the artifacts and benchmarks can be found <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">here.</a>
+All the artifacts and benchmarks can be found <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-sans visited:text-teal-700">here.</a>
 </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">

observablehq.config.ts

+2 -2
@@ -29,12 +29,12 @@ tailwind.config = {
 </script>
 <style>
 * {
-font-family: Georgia, sans-serif !important;
+font-family: Arial, sans-serif !important;
 }

 </style>

-<div style="display: flex; align-items: center; gap: 0.5rem; height: 2.2rem; margin: -1.5rem -2rem 2rem -2rem; padding: 0.5rem 2rem; border-bottom: solid 1px var(--theme-foreground-faintest);">
+<div style="display: flex; align-items: center; gap: 0.5rem; height: 2.2rem; margin: 2rem -2rem 2rem -2rem; padding: 1rem 0rem; border-bottom: solid 1px var(--theme-foreground-faintest);">
 <a href="/">
 <h1>
 BeyondMoore
