
DPUs Decoded: Redefining Efficiency in Data Processing

Tim Lieber

15 min read

Discover the Benefits Behind Data Management’s Unsung Heroes

While not new to the market, Data Processing Units (DPUs) have yet to be widely recognized for their ability to optimize both performance and cost in the data center.


DPUs: What They Are (and What They Aren’t)

A DPU is a set of resources that can process, move, and store data more efficiently than other processors in the data center. More efficiently is the key phrase; DPUs are known to significantly improve performance metrics such as performance per watt and performance per dollar in the data center.

DPUs are specialized hardware components designed to accelerate and optimize data-intensive workloads. Unlike traditional CPUs (Central Processing Units) that handle general-purpose tasks, DPUs are tailored specifically for data-centric operations. DPUs offload and accelerate tasks such as data reduction and protection, data security, and network processing, allowing CPUs to focus on more complex computations. In short, DPUs act as dedicated accelerators for specific data-related functions.

DPUs generally consist of four elements: external network interfaces that use protocols such as NVMe-oF/TCP and RoCE to connect data stores to compute engines; internal PCIe interfaces that carry NVMe traffic, with the DPU acting as either a Root Complex or an Endpoint; an abundance of specialized cores, coprocessors, and accelerators; and the software needed to make the best use of those resources.

DPU’s Biggest Benefit

The biggest benefit of the emergence of DPU processors is the reduction of TCO for data centers, cloud service providers, hyperscalers, and enterprises. DPUs create savings by taking work away from other processors in a data center and performing that work more efficiently. This could be network offloading, compute offload, or data services offloading. I’m talking about lowering the cost of capital going into the data center and lowering the cost of operating a data center.

To be widely accepted into data center or enterprise architectures, a DPU must consume less power and cost less than other processing units, and it must simplify the design of the data center, thus making it more reliable. For a DPU to be a DPU, it must address several, if not all, of these TCO challenges while achieving the best performance per watt and per dollar.
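To make the two efficiency metrics and the TCO argument concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the `Processor` type, the function names, the electricity price, and above all the throughput, power, and price figures, which are made up and do not describe any real CPU or DPU. The TCO model is deliberately simplistic (purchase price plus energy) and ignores cooling, rack space, and staffing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """A processing option described by hypothetical, illustrative figures."""
    name: str
    throughput_gbps: float  # sustained data throughput for a given workload
    power_watts: float      # average power draw under that workload
    unit_cost_usd: float    # purchase price (capital cost)


def perf_per_watt(p: Processor) -> float:
    """Throughput delivered per watt consumed."""
    return p.throughput_gbps / p.power_watts


def perf_per_dollar(p: Processor) -> float:
    """Throughput delivered per dollar of purchase price."""
    return p.throughput_gbps / p.unit_cost_usd


def tco_usd(p: Processor, years: float, usd_per_kwh: float) -> float:
    """Purchase price plus energy cost over the period (simplified model)."""
    hours = years * 365 * 24
    energy_kwh = p.power_watts / 1000 * hours
    return p.unit_cost_usd + energy_kwh * usd_per_kwh


# Made-up numbers, NOT measurements of any real product.
cpu = Processor("general-purpose CPU", throughput_gbps=50.0,
                power_watts=250.0, unit_cost_usd=4000.0)
dpu = Processor("DPU", throughput_gbps=100.0,
                power_watts=50.0, unit_cost_usd=2000.0)

for p in (cpu, dpu):
    print(f"{p.name}: {perf_per_watt(p):.2f} Gb/s per W, "
          f"{perf_per_dollar(p):.4f} Gb/s per $, "
          f"3-year TCO ${tco_usd(p, 3, 0.10):,.2f}")
```

With real measurements substituted for the placeholders, the same three functions give a quick way to compare candidate devices on the terms this section uses: performance per watt, performance per dollar, and total cost over the deployment period.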

The Many Faces of the DPU

A DPU can be either a chip or a card, and to some degree it’s an abstraction of features which can be instantiated in many ways.

The key is that these resources must bring cost or power efficiency to the data center over their cousins: the GPU and the CPU. The cards and chips in Kalray’s current generation of products boast a unique combination of special data processors, coprocessors, hardwired accelerators, and software to meet the efficiency standards mentioned above.

Key Characteristics Affecting Enterprise Adoption of DPUs

Different Types of DPU

DPUs are an abstraction of features and functions that come in a wide variety of architectures. They can be designed from the ground up with specialized cores and acceleration features. They can take the form of a pure ASIC, an ASIC combined with CPU cores, or specialized cores combined with hardwired resources. They can even be implemented in an FPGA. Each of these approaches has pros and cons.

Devices with Inflexible Data Plane

Many DPUs combine processors with hardwired acceleration, so the difference between these devices boils down to what the cores are, how they are connected to the hardwired resources, and how the work is divided between the two. Most of them share high-speed interfaces such as PCIe buses and 100GbE ports.

You would think that a combination of cores and hardwired functions is the best of both worlds, but that isn’t necessarily true. Most DPU processors have anywhere from 8 to 64 ARM cores, which are used primarily for control plane functions. ARM cores are not significantly better at data processing than x86 processors, but they do draw less power. In this case, you still have big, bloated cores with extra instruction sets that aren’t utilized for strictly data processing functions. What is needed here is a highly specialized, lightweight core that is power-optimized to handle data analytics and data services.

If the ARM cores need to process data in the data plane, there is no significant gain over x86 processing on a CPU. In some devices, data plane tasks are handled only in the hardwired logic, which delivers speed but kills flexibility, because performing functions outside the initial design of the ASIC is impossible. To handle the vast differences in tasks from one data center to the next, all the resources of the DPU (cores and hardwired resources) must be available to both the data plane and the control plane as circumstances require.

In conclusion, most DPUs of this type have a very inflexible data plane and a control plane with relatively modest processing capability. If you have very specific networking, security, or storage offload tasks such as TLS, IPsec, erasure coding, or compression, then these types of DPUs can work quite well. While more cost-effective than CPUs, these processors save very little power and cost compared to other DPUs. But if the goal is a modernized data center that concentrates on data acceleration while bringing down cost, consider the highly flexible architecture of the Kalray DPU, which offers cost and power optimization of both the control plane and the data plane.

Devices with Little or No Processing Capabilities

Strictly speaking, these devices shouldn’t be counted as DPUs, since the “P” in DPU stands for “processing,” but they need to be distinguished from actual DPUs. These devices rely heavily, if not completely, on hardwired functions and usually come in the form of an ASIC or FPGA. They are very efficient, in cost and power, for what they can do, and they have a good selection of network and security offload features. However, without processing capabilities they cannot perform data manipulation and analytics tasks, nor can they adapt to varying data center requirements.

The Best of the Bunch: Low Power Meets High Performance

The best data processing unit overall is a programmable DPU that combines low power and high-performance data processing.

Processing for both the data plane and the control plane is handled in the cores, where each core or set of cores can make the best possible decision as to when and how to use coprocessors or accelerators to augment a given function. Each core uses much less power than either an ARM or an x86 core, which reduces overall power consumption significantly. The cores of this DPU processor also have a more efficient instruction set, allowing more data to be processed inline than with a typical CPU core.
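The division of labor described here, with programmable cores deciding when to hand work to a hardwired accelerator and when to run it themselves, can be sketched in a few lines. This is a conceptual model only, not how any particular DPU or its SDK works: `ACCELERATORS`, `SOFTWARE_KERNELS`, and `dispatch` are hypothetical names, and ordinary `zlib` functions stand in for fixed-function hardware blocks.

```python
import zlib
from typing import Callable, Dict, Tuple

Kernel = Callable[[bytes], bytes]

# Hypothetical stand-ins for hardwired accelerators: fast, but the set of
# functions is fixed when the silicon is designed.
ACCELERATORS: Dict[str, Kernel] = {
    "compress": zlib.compress,
    "checksum": lambda data: zlib.crc32(data).to_bytes(4, "big"),
}

# Hypothetical software kernels that run on the programmable cores; new
# entries can be added after deployment.
SOFTWARE_KERNELS: Dict[str, Kernel] = {
    "uppercase": bytes.upper,
}


def dispatch(task: str, data: bytes) -> Tuple[str, bytes]:
    """Use an accelerator when one exists for the task, else run on a core."""
    if task in ACCELERATORS:
        return "accelerator", ACCELERATORS[task](data)
    if task in SOFTWARE_KERNELS:
        return "core", SOFTWARE_KERNELS[task](data)
    raise ValueError(f"no accelerator or software kernel for task {task!r}")


payload = b"data plane payload " * 8
where, compressed = dispatch("compress", payload)
print(where, len(payload), "->", len(compressed), "bytes")

where, shouted = dispatch("uppercase", b"custom task")
print(where, shouted)
```

In this model, a device with only hardwired functions corresponds to an empty `SOFTWARE_KERNELS` table: any task outside the original design raises an error, which is the inflexibility discussed in the earlier sections. A programmable data plane can add a new kernel without changing the hardware.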

This type of data processing unit maximizes TCO reduction through lower power consumption and cost and provides maximum flexibility to handle fixed tasks, such as the offloads discussed above. At the same time, it can perform data manipulation tasks such as AI or inference workloads.


Next Up: DPUs, GPUs, and CPUs in the Data Center



Lead Solutions Architect, Kalray

Tim Lieber is a Lead Solutions Architect at Kalray, working with product management and engineering. His role is to innovate product features that use the Kalray MPPA DPU to solve data center challenges with solutions that improve performance and meet aggressive TCO efficiency targets. Tim has been with Kalray for approximately 4 years. He has worked in the computing and storage industry for 40+ years in innovation, leadership, and architectural roles.
