


Kalray's MPPA® DPU Processor (Data Processing Unit)

A New Class of Processor, Specialized in Intelligent Data Processing, for Infrastructure, Compute and AI Acceleration

The Coolidge™ 1 and 2 Data Processing Units (DPUs) are third-generation processors based on Kalray’s Massively Parallel Processor Array (MPPA®) architecture. Kalray DPUs natively manage multiple workloads in parallel with no bottlenecks, enabling smarter, more efficient, and more energy-conscious data-intensive applications.

Taking full advantage of Kalray’s patented MPPA® architecture, Coolidge™ is a scalable 80-core DPU processor designed for intelligent data processing. The DPU is a compelling alternative to GPUs, ASICs, and FPGAs, offering distinct advantages and adaptability across numerous data-heavy applications, from data centers to the edge and embedded systems.

Key Benefits of MPPA® Data Processing Unit (DPU)

  1. High-performance computing: Up to 1.5 TFLOPs (SP)/192 GFLOPs (DP)
  2. Power efficiency: As low as 30W
  3. AI acceleration: Up to 25 TFLOPs (16 bits)/50 TOPs (8 bits)
  4. High-speed I/O & interfaces: Up to 18 GB/s, 12M IOPS, latency as low as 30 μs, 2×100 GbE
  5. Real-time data processing: Massive parallel processing, 80 cores, 6-issue VLIW, ultra-low latency
  6. Massive parallel multitask processing: Scalable 80-core DPU processor
  7. Fully programmable: Open standards: C/C++, Linux, RTOS, POSIX. Kalray SDK based on standard tools & APIs for developing new applications and porting existing ones.
  8. Security/Safety: Hardware root of trust, secure boot, accelerated cryptographic functions (optional)

Data Processing Unit Use Cases


Develop Next Gen Storage and Networking Systems

Data center infrastructure chip for seamless integration onto PCIe Gen4 cards, for use cases including I/O controllers, storage initiator and target controllers, and high-speed network processing offload:

  • x86 offload or stand-alone “CPU-free” applications
  • Compatible with containerized, virtualized, and bare metal infrastructures
  • Fully programmable with dynamic distribution of resources across data and control & management planes

Fully programmable acceleration of high-performance protocols, services & QoS:

  • Enhanced support for NVMe-oF, RoCE/RDMA, TCP/IP, NVMe
  • Intelligent load balancing, priority-based flow control, and stateless L1-L4 parsing
  • High-speed data protection services for clustered and fully distributed applications
  • Line rate erasure coding (Reed-Solomon) per cluster
  • Line-rate encryption/decryption/hash (IPSEC, TLS, XTS, MACsec)
  • Acceleration for open RAN L1*
  • AI functionality for insightful analytics and adaptive configuration

*CV2 only


Acceleration of compute-intensive applications

Acceleration of complex workloads:

  • Innovative, patented core and co-processor enhancement for machine learning inference
  • Advanced computation for computer vision
  • Signal processing (e.g., FFT), cryptography, mathematics

Development of autonomous intelligent embedded systems:

  • Compatibility with multiple operating systems (Linux, RTOS)
  • Support of ‘freedom from interference’ for mixed criticality

Enable next-gen edge computing systems:

  • Real-time analytics for automation, prediction, and control
  • Seamless incorporation into existing systems


With MPPA® DPU Processors, the Possibilities Are Endless, Allowing You to Innovate Without Borders

Powered by 80 cores, the MPPA® Data Processing Unit (DPU) is a new generation of intelligent processor with unique capabilities in terms of programmability, performance, parallel execution of multiple critical tasks, energy efficiency, safety, and security. Our breakthrough MPPA® technology is paving the way to a new data processing era.

It is the intelligent processor that gives you the power to do more and to propel fast-developing sectors such as 5G telecom networks, autonomous vehicles, healthcare equipment, Industry 4.0, drones and robots, and many more.


DPU Technical Corner / Key Features


64-bit/32-bit architecture
From 600MHz to 1.2 GHz
6-issue VLIW
(CV1) 16KB instruction cache / 16KB data cache | (CV2) 32KB instruction cache / 32KB data cache
IEEE 754-2008 floating point unit (FPU)
Reciprocal square root operations in floating single precision
64-bit integer multiplication (asymmetric cryptography)
4 execution rings
256-bits per cycle load/store


Acceleration at INT8, INT16, or FP16 precision
(CV1) Up to 128 MAC per cycle | (CV2) Up to 256 MAC per cycle


16 application cores + 1 management/security core
(CV1) 4 MB of memory/L2 cache, 512 GB/s | (CV2) 8 MB of memory/L2 cache, 600 GB/s, low latency/high speed
Configurable cluster/chip cache coherency & deterministic modes


5 clusters (total of 80 application cores + 5 management cores)
Up to 1.5 TFLOPs (SP) / 192 GFLOPs (DP)
Up to 25 TFLOPs (16 bits) / 50 TOPs (8 bits) for deep learning
56GB/s chip-to-chip communications (16 +12.5) x 2
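
The headline throughput figures above can be roughly cross-checked from the core count, the top clock, and per-core rates. The sketch below is a back-of-the-envelope check, not Kalray's published methodology. It assumes all 80 application cores run at 1.2 GHz and that one MAC counts as two operations. The per-core FLOPs-per-cycle values (16 for single precision, 2 for double precision) are inferred from the quoted totals and are not stated in the source.

```python
# Back-of-the-envelope peak-throughput check (assumptions, not vendor data):
# - all 80 application cores run at the top 1.2 GHz clock
# - one multiply-accumulate (MAC) counts as two operations
# - per-core FLOPs/cycle for SP and DP are inferred from the quoted totals

CORES = 80          # 5 clusters x 16 application cores
CLOCK_HZ = 1.2e9    # upper end of the 600 MHz to 1.2 GHz range
OPS_PER_MAC = 2     # multiply + add


def peak_ops_per_second(ops_per_cycle_per_core, cores=CORES, clock_hz=CLOCK_HZ):
    """Theoretical peak: per-core ops per cycle x core count x clock rate."""
    return ops_per_cycle_per_core * cores * clock_hz


sp_flops = peak_ops_per_second(16)                    # inferred SP rate
dp_flops = peak_ops_per_second(2)                     # inferred DP rate
mac128_ops = peak_ops_per_second(128 * OPS_PER_MAC)   # 128 MAC/cycle figure
mac256_ops = peak_ops_per_second(256 * OPS_PER_MAC)   # 256 MAC/cycle figure

for label, value in [
    ("SP FLOP/s", sp_flops),
    ("DP FLOP/s", dp_flops),
    ("128 MAC/cycle op/s", mac128_ops),
    ("256 MAC/cycle op/s", mac256_ops),
]:
    print(f"{label}: {value / 1e12:.3f} T")
```

Under these assumptions the results land close to the quoted numbers: about 1.54 T for single precision, 0.192 T for double precision, and about 24.6 T and 49.2 T for the two MAC rates, near the quoted 25 and 50. The source does not spell out how the MAC rates map to data width and chip variant, so treat that pairing as an observation only.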


16-lane PCIe GEN4 endpoint (EP) or root complex (RC)
Bifurcation up to 8 downstream ports in RC mode
SR-IOV up to 8 physical functions / 248 virtual functions
Hot-plug support
Up to 512 DMAs for multi queues / kernel bypass drivers
Direct PCIe-to-clusters and PCIe-to-DDR transfers
NVMe emulation
64-bit DDR4/LPDDR4-3200 channels with sideband/inline ECC
Up to two ranks per DDR4 Channel
2 DDR channels (up to 32GB) with channel interleaving
8×1/8×10/8×25/2×40/4×50/2×100 GbE
RDMA over Converged Ethernet (RoCE v1 and v2)
Jumbo Frame Support (9.6KB)
Support for PTP/IEEE 1588v2
Priority Flow Control (PFC), IEEE 802.1Qbb
Header & payload checksum offload
Line rate packet classification/smart load balancing
Hash & Round-robin based dispatch policy
Secure Boot with authentication & encryption
True Random Number Generators (TRNG)
RSA, Diffie-Hellman, DSA, ECC, EC-DSA and EC-DH acceleration
AES-XTS for storage application
MD5/SHA-1, SHA-2, SHA-3
KASUMI/SNOW 3G, ZUC
SSI Controller for serial NOR Flash with optional boot
SDCARD UHS-I / eMMC 4.51 memory controller
JTAG IEEE 1149.1
16-bit Parallel Trace Interface
Mixed-criticality support
Lockable critical configuration
Capability to bank memory and caches for non-interference & time-predictable execution
L1 Cache coherency enabling/disabling
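
The "Hash & Round-robin based dispatch policy" entry in the list above names two common ways of spreading incoming packets across processing queues. The sketch below illustrates the general idea only. It is not Kalray's implementation or an SDK API: the names `FlowKey`, `hash_dispatch`, and `RoundRobinDispatcher` are hypothetical, the queue count of 16 is arbitrary, and CRC32 stands in for whatever hash the hardware actually uses.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a network flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def hash_dispatch(flow: FlowKey, num_queues: int) -> int:
    """Hash-based policy: every packet of a flow maps to the same queue."""
    key_bytes = "|".join(str(field) for field in flow).encode()
    # CRC32 is an illustrative stand-in, not the device's hash function.
    return zlib.crc32(key_bytes) % num_queues


class RoundRobinDispatcher:
    """Round-robin policy: queues are used in turn, regardless of flow."""

    def __init__(self, num_queues: int) -> None:
        self.num_queues = num_queues
        self._next = 0

    def dispatch(self) -> int:
        queue = self._next
        self._next = (self._next + 1) % self.num_queues
        return queue


NUM_QUEUES = 16  # arbitrary value for the example
flow = FlowKey("10.0.0.1", "10.0.0.2", 51000, 4420, 6)

# The same flow always hashes to the same queue, preserving packet order.
same_queue = hash_dispatch(flow, NUM_QUEUES) == hash_dispatch(flow, NUM_QUEUES)

# Round-robin cycles through the queues and wraps around.
rr = RoundRobinDispatcher(4)
rr_sequence = [rr.dispatch() for _ in range(6)]
print(same_queue, rr_sequence)
```

Hash dispatch keeps the packets of a flow on one queue, which preserves ordering within the flow. Round-robin balances load evenly but can scatter a flow across queues, so it suits stateless processing better.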

Related Content


Software development environment for developing applications using open coding standards on Kalray's processors.


Kalray's programmable, low-power PCIe card that can be used in acceleration or standalone mode.


Fully programmable cards that bring the benefits of the MPPA® DPU (Data Processing Unit) technology to data centers for higher performance & more flexible solutions.

Press Release

ReNESENS Project: Best of Breed in DPU Processors, Virtualization and Software-defined Technologies to Enable French and European Digital Independence

Press Release

Recognition for Kalray's unique technology and its DPU (Data Processing Unit), a new type of low-power, high-performance programmable processor dedicated to high-performance computing applications.

Press Release

Kalray announces that it has signed a major contract with a world leader in the field of high technology, listed on the NASDAQ, in accordance with the negotiations announced recently.