diff --git a/docs/datasheet/cpu_dual_core.adoc b/docs/datasheet/cpu_dual_core.adoc
index 1e6acec4a..a70e0ed0f 100644
--- a/docs/datasheet/cpu_dual_core.adoc
+++ b/docs/datasheet/cpu_dual_core.adoc
@@ -1,9 +1,9 @@
 :sectnums:
 === Dual-Core Configuration
 
-.Hardware Requirements
-[IMPORTANT]
-The SMP dual-core configuration requires the <<_core_local_interruptor_clint>> to be implemented.
+.Dual-Core Example
+[TIP]
+A simple dual-core example program can be found in `sw/example/demo_dual_core`.
 
 Optionally, the CPU core can be implemented as **symmetric multiprocessing (SMP) dual-core** system.
 This dual-core configuration is enabled by the `DUAL_CORE_EN` <<_processor_top_entity_generics, top generic>>.
@@ -14,10 +14,10 @@ of both core complexes are further switched into a single system bus using a rou
 
 image::smp_system.png[align=center]
 
-Both CPU cores are fully identical and use the same configuration provided by the according
-<<_processor_top_entity_generics, top generics>>. However, each core can be identified by the according
-"hart ID" that can be retrieved from the <<_mhartid>> CSR. CPU core 0 (the _primary_ core) has `mhartid = 0`
-while core 1 (the _secondary_ core) has `mhartid = 1`.
+Both CPU cores are fully identical and use the same ISA, tuning and cache configurations provided by the
+according <<_processor_top_entity_generics, top generics>>. However, each core can be identified by the
+according "hart ID" that can be retrieved from the <<_mhartid>> CSR. CPU core 0 (the _primary_ core) has
+`mhartid = 0` while core 1 (the _secondary_ core) has `mhartid = 1`.
 
 The following table summarizes the most important aspects when using the dual-core configuration.
 
@@ -41,32 +41,63 @@ while the top of stack of core 1 has to be explicitly defined by core 0 (see <<_
 cores share the same heap, `.data` and `.bss` sections.
 | **Constructors and destructors** | Constructors and destructors are executed on core 0 only.
 (see )
+| **Core communication** | See section <<_inter_core_communication_icc>>.
 | **Bootloader** | Only core 0 will boot and execute the bootloader while core 1 is held in standby.
-| **Booting** | See next section <<_dual_core_boot>>.
+| **Booting** | See section <<_dual_core_boot>>.
 |=======================
 
-.Dual-Core Example
+
+==== Inter-Core Communication (ICC)
+
+Both cores can communicate with each other via a direct point-to-point connection based on FIFO-like message
+queues. These direct communication links are faster (in terms of latency) compared to a memory-mapped or
+shared-memory communication. Additionally, communication using these links is guaranteed to be atomic.
+
+The inter-core communication (ICC) module is implemented as dedicated hardware module within each CPU core
+(VHDL file `rtl/core/neorv32_cpu_icc.vhd`). This module is automatically included if the dual-core option
+is enabled. Each core provides a 32-bit wide and 4 entries deep FIFO for sending data to the other core.
+Hence, there are two FIFOs: one for sending data from core 0 to core 1 and another one for sending data the
+opposite way.
+
+The ICC communication links are accessed via NEORV32-specific CSRs. Hence, those FIFOs are accessible only
+by the CPU core itself and cannot be accessed by the DMA or any other CPU core. In total, three CSRs are
+provided to handle communications:
+
+The <<_mxiccsr>> is used to select the core with which to communicate. In the dual-core configuration core 1
+can only select core 0 and vice versa. The core selection in this register allows access to the according
+message FIFOs via the two other CSRs. Additionally, the CSR provides status flags (TX FIFO data available;
+RX FIFO free space) related to the selected communication link.
+
+The <<_mxiccrxd>> and <<_mxicctxd>> CSRs are used for the actual data read and write operations. Writing data
+to <<<<_mxicctxd>>> will send to the message queue of the core selected by <<_mxiccsr>>. Conversely, reading
+data from <<_mxiccrxd>> will return data received from the core selected by <<_mxiccsr>>.
+
+The ICC FIFOs do not provide any interrupt capabilities. Software is expected to use the machine-software
+interrupt of the receiving core (provided by the <<_core_local_interruptor_clint>>) to inform it about
+available messages.
+
+.ICC Software API
 [TIP]
-A simple dual-core example setup / test program can be found in `sw/example/demo_dual_core`.
+The NEORV32 software framework provides API wrappers to abstract inter-core communication:
+`sw/lib/include/noevr32_smp.h`
 
 
 ==== Dual-Core Boot
 
-After reset both cores start booting. However, core 1 will always (regardless of the boot configuration) enter
-sleep mode inside the default <<_start_up_code_crt0>> that is linked with any compiled application. The primary
-core (core 0) will continue booting executing either the <<_bootloader>> or the pre-installed image in the
-internal instruction memory (depending on the <<_boot_configuration>>).
+After reset, both cores start booting. However, core 1 will - regardless of the <<_boot_configuration>> - always
+enter <<_sleep_mode>> right inside the default <<_start_up_code_crt0>> that is linked with any compiled
+application. The primary core (core 0) will continue booting, executing either the <<_bootloader>> or the
+pre-installed image from the internal instruction memory (depending on the boot configuration).
 
-To boot-up core 1 the primary core has to use a special library function provided by the NEORV32 runtime
-environment (RTE):
+To boot-up core 1, the primary core has to use a special library function provided by the NEORV32 software framework:
 
 .CPU Core 1 launch function prototype (note that this function can only be executed on core 0)
 [source,c]
 ----
-int neorv32_rte_smp_launch(void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
+int neorv32_smp_launch(int hart_id, void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
 ----
 
-When executed, core 0 will populate a configuration structure in main memory that contain the entry point
+When executed, core 0 use the <<_inter_core_communication_icc>> to send launch data that includes the entry point
 for core 1 (via `entry_point`) and the actual stack configuration (via `stack_memory` and `stack_size_bytes`).
 
 .Core 1 Stack Memory
@@ -78,5 +109,5 @@ boundary.ยด
 
 After that, the primary core triggers the _machine software interrupt_ of core 1 using the
 <<_core_local_interruptor_clint>>. Core 1 wakes up from sleep mode, consumes the configuration structure and
-finally starts executing at the provided entry point. When `neorv32_rte_smp_launch()` returns (with no error
+finally starts executing at the provided entry point. When `neorv32_smp_launch()` returns (with no error
 code) the secondary core is online and running.