CXL promises to redefine the way computing systems are designed. It runs over the PCIe physical layer and can expand memory attached to individual CPUs, but its biggest promise is to provide network-governed memory pools that can allocate higher-latency memory as needed to CPUs or software-defined virtual machines. CXL-based products began to appear on the market in 2023.
CXL is poised to reshape data centers, but the advantages of higher-latency memory for high-performance computing (HPC) applications weren't clear, at least until UnifabriX demonstrated the bandwidth and capacity advantages of its CXL-based smart memory node at the 2022 Supercomputing Conference (SC22). The company has just released a video showing its memory and storage offerings for HPC applications and the advantages they provide.
UnifabriX says the product is built around a resource processing unit (RPU). The RPU is integrated into the CXL smart memory node, shown below, a 2U rack-mount server with serviceable EDSFF E3 media slots. The product has capacity for up to 64TB of DDR5/DDR4 memory and NVMe SSDs.
UnifabriX CXL smart memory node
The company says that the product is CXL 1.1 and 2.0 compliant and runs on PCIe Gen5. It is also said to be CXL 3.0 ready and to support further PCIe Gen5 and CXL expansion. It also supports NVMe SSD access through CXL (SSDs accessed over CXL memory). The product is intended for use in both virtualized and non-virtualized environments for a wide range of applications, including HPC, AI, and databases.
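For readers wondering what CXL-attached memory looks like from the host side: on a Linux system with the kernel's CXL subsystem enabled, enumerated CXL devices are exposed under sysfs, and CXL memory is typically presented as a CPU-less NUMA node. The following is a minimal sketch under those assumptions; the paths are generic kernel paths, not anything specific to the UnifabriX product.

```c
/* Minimal sketch: list CXL devices enumerated by the Linux kernel.
 * Assumes a Linux host with the CXL subsystem enabled; the sysfs path
 * is a generic kernel path, not vendor-specific. */
#include <dirent.h>
#include <stdio.h>

int main(void) {
    const char *path = "/sys/bus/cxl/devices";
    DIR *dir = opendir(path);
    if (!dir) {
        perror("no CXL devices visible (is the CXL driver loaded?)");
        return 1;
    }
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        printf("%s/%s\n", path, entry->d_name);
    }
    closedir(dir);
    return 0;
}
```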
As with other CXL products, the memory node provides expanded memory, but it can also provide higher performance. In particular, at SC22 the memory node was used to run the HPCG benchmark, and the results were compared with the same benchmark run without the memory node. The results are shown below.
Comparison of HPCG with and without the UnifabriX memory node
For the traditional HPCG benchmark, performance initially increases almost linearly with the number of CPU cores. However, at around 50 cores, performance flattens out and no longer improves as more cores are added. By the time 100 cores are available, only about 50 are effectively used, because no additional memory bandwidth is available.
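A simple roofline-style model makes the flattening concrete: once the cores' combined demand for memory bandwidth exceeds what the local DRAM channels can supply, adding cores no longer adds performance. The sketch below uses illustrative numbers (the per-core bandwidth demand and total DRAM bandwidth are assumptions chosen for the example, not measured UnifabriX or Intel figures).

```c
/* Toy roofline-style model of a bandwidth-bound benchmark like HPCG.
 * The numbers are illustrative assumptions, not measured values. */
#include <stdio.h>

int main(void) {
    const double bw_per_core_gbs = 6.0;    /* assumed bandwidth demand per core, GB/s */
    const double local_dram_gbs  = 300.0;  /* assumed total local DRAM bandwidth, GB/s */

    for (int cores = 10; cores <= 100; cores += 10) {
        double demand    = cores * bw_per_core_gbs;
        double delivered = demand < local_dram_gbs ? demand : local_dram_gbs;
        /* For a memory-bound code, performance tracks delivered bandwidth. */
        printf("%3d cores: demand %6.1f GB/s, delivered %6.1f GB/s%s\n",
               cores, demand, delivered,
               demand > local_dram_gbs ? "  (bandwidth-limited)" : "");
    }
    return 0;
}
```

With these assumed numbers, the plateau appears at 300 / 6 = 50 cores, which is the same qualitative behavior the benchmark curve shows.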
If a memory node is added to provide CXL memory in addition to the memory directly attached to the CPU, performance continues to scale with core count. The memory node improves overall HPCG performance by moving low-priority data from the CPU's near memory to the far CXL memory. This prevents saturation of the near memory and allows performance to keep scaling as processor cores are added. As shown above, the memory node improved the performance of a standard HPCG run by more than 26%.
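One common way software exploits this kind of device today is to treat the CXL memory as a separate, CPU-less NUMA node and steer lower-priority (colder) data there, keeping bandwidth-critical data in local DRAM. The sketch below uses libnuma on Linux; the node number assigned to the CXL memory (node 2 here) and the buffer sizes are assumptions for illustration, not details published by UnifabriX.

```c
/* Sketch: keep hot data in local DRAM, push cold data to a far CXL
 * NUMA node using libnuma. Node number and sizes are illustrative.
 * Build with: gcc place.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

#define HOT_BYTES   (256UL << 20)   /* 256 MB of bandwidth-critical data   */
#define COLD_BYTES  (4UL << 30)     /* 4 GB of low-priority data           */
#define CXL_NODE    2               /* assumed NUMA node of the CXL memory */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    /* Hot working set: allocate on the local node of the calling CPU. */
    double *hot = numa_alloc_local(HOT_BYTES);

    /* Cold data: allocate on the far CXL-attached node to spare local
     * DRAM bandwidth for the hot working set. */
    double *cold = numa_alloc_onnode(COLD_BYTES, CXL_NODE);

    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... a benchmark kernel would stream over 'hot' and touch 'cold' rarely ... */

    numa_free(cold, COLD_BYTES);
    numa_free(hot, HOT_BYTES);
    return 0;
}
```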
The company worked closely with Intel on its CXL solution, and Intel reported these results, along with those of three other third-party testers, at its recent product briefing on Infrastructure Processing Units (IPUs) and Intel Agilex FPGA accelerators, which deliver TCO, performance, and flexibility improvements to 4th Gen Intel Xeon platforms.
In addition to providing memory capacity and bandwidth improvements, a memory node can also provide NVMe SSD access through CXL. The company says its plans include delivering memory, storage, and networking through a CXL/PCIe interface, hence the name UnifabriX. With the included fabric capability, its boxes could replace top-of-rack (ToR) solutions as well as provide access to memory and storage.
The UnifabriX memory node, built around the company's resource processing unit, provides a path to overcome direct-attached DRAM bandwidth limitations in HPC applications by using shared CXL memory.