NVIDIA DGX Spark Cluster Review: Distributed Inference on Dell, GIGABYTE, and HP
The issue is no longer demand alone; it is whether the surrounding infrastructure is ready.
- StorageReview reported a development that could affect hyperscaler and cloud capacity planning.
- The practical issue is whether demand can be converted into reliable capacity on schedule.
- Watch execution details, customer commitments, and any bottlenecks around power, cooling, silicon, or permitting.
StorageReview reported the following results.

In Equal ISL/OSL, Dell scales from 10.41 tok/s to 498.56 tok/s at batch size 64, while GIGABYTE edges slightly ahead at the upper end, growing from 9.76 tok/s to 509.18 tok/s. HP trails modestly behind both systems, ranging from 9.25 tok/s to 477.25 tok/s. The gap between systems remains relatively small throughout the workload, particularly at lower and mid-range concurrency levels.

In Prefill Heavy, scaling improves substantially across all three systems. Dell increases from 25.91 tok/s to 1,079.19 tok/s, while GIGABYTE scales from 24.25 tok/s to 1,071.07 tok/s. HP reaches 988.82 tok/s at batch size 64. Dell and GIGABYTE remain nearly identical through most of the sweep, with Dell holding a slight advantage at the highest concurrency level.

In Decode Heavy, throughput remains significantly lower overall, as expected for a decode-focused workload on a larger model. Dell ranges from 6.49 tok/s to 297.82 tok/s, GIGABYTE scales from 6.10 tok/s to 297.23 tok/s, and HP increases from 5.77 tok/s to 276.55 tok/s. Dell and GIGABYTE are neck and neck throughout the test, while HP consistently trails slightly behind both systems at larger batch sizes.

In a second Equal ISL/OSL sweep, Dell scales from 59.05 tok/s to 817.82 tok/s at batch size 64, while GIGABYTE ranges from 59.81 tok/s to 809.88 tok/s. HP trails slightly behind both systems, increasing from 56.51 tok/s to 780.21 tok/s.
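As a back-of-envelope check, the minimal Python sketch below converts the reported endpoint figures into a rough scaling-efficiency number (measured speedup at batch 64 relative to perfectly linear scaling). It assumes the low end of each range corresponds to batch size 1, which the excerpt implies but does not state; the sweep labels are ours, and HP's Prefill Heavy result is omitted because only its upper endpoint is quoted.

```python
# Rough scaling-efficiency estimate from the reported endpoint throughputs.
# Assumption (not stated in the source): the low end of each range is batch
# size 1, so a 64x speedup at batch size 64 would be perfectly linear scaling.

# (workload, system) -> (tok/s at assumed batch 1, tok/s at batch 64)
RESULTS = {
    ("Equal ISL/OSL, first sweep",  "Dell"):     (10.41, 498.56),
    ("Equal ISL/OSL, first sweep",  "GIGABYTE"): (9.76,  509.18),
    ("Equal ISL/OSL, first sweep",  "HP"):       (9.25,  477.25),
    ("Prefill Heavy",               "Dell"):     (25.91, 1079.19),
    ("Prefill Heavy",               "GIGABYTE"): (24.25, 1071.07),
    ("Decode Heavy",                "Dell"):     (6.49,  297.82),
    ("Decode Heavy",                "GIGABYTE"): (6.10,  297.23),
    ("Decode Heavy",                "HP"):       (5.77,  276.55),
    ("Equal ISL/OSL, second sweep", "Dell"):     (59.05, 817.82),
    ("Equal ISL/OSL, second sweep", "GIGABYTE"): (59.81, 809.88),
    ("Equal ISL/OSL, second sweep", "HP"):       (56.51, 780.21),
}

BATCH_RATIO = 64  # batch 64 versus assumed batch 1

for (workload, system), (low, high) in RESULTS.items():
    speedup = high / low
    efficiency = speedup / BATCH_RATIO  # 1.0 = perfectly linear scaling
    print(f"{workload:28s} {system:8s} "
          f"speedup {speedup:5.1f}x  efficiency {efficiency:4.0%}")
```

On these figures, the first Equal ISL/OSL sweep scales at roughly 75 to 82 percent of linear, while the second sweep lands near 22 percent, which is consistent with per-request throughput starting an order of magnitude higher in that configuration.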
The story lands in a market where demand is already assumed. The more useful question is whether the supporting layer around cloud infrastructure is flexible enough to turn that demand into available capacity. The constraint is not just chip supply. Advanced compute depends on packaging, memory, networking, power delivery, and the ability to land systems inside facilities that can actually run them at high utilization.
The pressure point is timing. The underappreciated variable is deployment readiness across networking, power, and packaging, not just chip availability.
That matters for buyers because the useful capacity is the installed, cooled, powered cluster, not the purchase order. It also matters for suppliers because component shortages can shift bargaining power quickly across the stack.
The financial question is whether this improves pricing power, secures scarce capacity, or exposes execution risk that is still being discounted. The operating question is procurement timing, facility readiness, power access, and whether adjacent constraints slow deployment. The customer question is whether this changes build sequencing, partner dependence, or the cost of scaling clusters across regions.
This is where AI infrastructure differs from ordinary software growth. Capacity has to be financed, permitted, powered, cooled, connected, staffed, and then sold into real workloads before the economics are visible.
The practical read is that infrastructure advantage is becoming more local and more operational. Two companies can chase the same AI demand and end up with very different outcomes if one has better access to power, more credible delivery dates, or a cleaner path through procurement and permitting.
The next signals to watch are customer commitments, infrastructure readiness, and any sign that power, cooling, silicon supply, or permitting becomes the real bottleneck. The next test is whether delivery schedules, memory availability, and deployment readiness move together or start to diverge.