Plaquette v2025.3 released with a new GPU-based full-state simulation backend for unprecedented scale and performance

With the release of Plaquette v2025.3, we are excited to introduce a brand-new GPU-based full-state simulation backend powered by NVIDIA’s cuQuantum SDK. This breakthrough greatly expands the reach of full-state simulations in Plaquette, with major gains in performance and memory efficiency over traditional CPU-based simulation approaches. For rotated planar codes, for example, practical limits on memory and processing power previously restricted CPU-based full-state simulations to distance 3 (17 qubits), even on high-end CPUs with large amounts of memory. Built on cuTensorNet, the tensor-network library in the cuQuantum SDK, the new GPU backend makes full-state simulation of distance-5 rotated planar codes (49 qubits in total) possible.
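The qubit counts quoted above follow from the layout of the rotated planar code. A distance-d patch has d² data qubits and d² - 1 stabilizers, and with one ancilla qubit per stabilizer this gives 2d² - 1 qubits in total. The short sketch below only reproduces that arithmetic. It assumes the standard layout with one ancilla per stabilizer, and the function name is hypothetical rather than part of Plaquette’s API.

```python
def rotated_planar_qubits(d: int) -> int:
    """Total qubits (data + ancilla) in a distance-d rotated planar code patch.

    Assumes the standard layout with one ancilla qubit per stabilizer.
    """
    if d < 2:
        raise ValueError("code distance must be at least 2")
    data = d * d         # one data qubit per site of the d x d grid
    ancilla = d * d - 1  # d^2 - 1 independent X- and Z-type stabilizers
    return data + ancilla


for d in (3, 5):
    print(d, rotated_planar_qubits(d))
# prints:
# 3 17
# 5 49
```

Because the count grows quadratically with d, going from distance 3 to distance 5 nearly triples the number of qubits in the simulated state.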

Transforming large-scale quantum simulations

Full-state fidelity for hundreds of qubits
Simulating large quantum systems in their entirety is essential for accurately capturing all possible error mechanisms. This includes coherent errors such as over- or under-rotations, which stabilizer-based techniques cannot represent exactly. Until now, CPU-based full-state simulations were limited in practice to around 60 qubits, even on high-end CPUs with large amounts of memory. Plaquette’s new GPU backend, built on NVIDIA’s cuTensorNet library, pushes this boundary to more than 400 qubits on a single NVIDIA RTX 4000 Ada Generation GPU.
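To see why coherent errors need a full-state treatment, consider a single qubit hit N times by the same small over-rotation exp(-iεX/2). Applied coherently, the amplitudes add up, so the bit-flip probability is sin²(Nε/2), which grows roughly quadratically in N while Nε is small. A common stabilizer-compatible approximation (Pauli twirling) replaces each rotation with a stochastic X flip of probability p = sin²(ε/2). N such flips give a flip probability of (1 - (1 - 2p)^N)/2, which grows only about linearly in N. The sketch below compares the two models with plain NumPy. It is an illustration of the physics, not Plaquette’s API or its simulation method, and the angle and repetition counts are arbitrary illustrative values.

```python
import numpy as np


def rx(theta: float) -> np.ndarray:
    """Single-qubit rotation exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def coherent_flip_prob(eps: float, n: int) -> float:
    """Apply the over-rotation n times to |0> on the full (2-amplitude) state."""
    state = np.array([1.0, 0.0], dtype=complex)
    for _ in range(n):
        state = rx(eps) @ state
    return float(abs(state[1]) ** 2)  # probability of measuring |1>


def twirled_flip_prob(eps: float, n: int) -> float:
    """Pauli-twirled model: each step is an X flip with probability sin^2(eps/2)."""
    p = np.sin(eps / 2) ** 2
    # Probability of an odd number of flips after n independent steps.
    return float((1 - (1 - 2 * p) ** n) / 2)


eps = 0.05  # small over-rotation angle in radians (illustrative value)
for n in (1, 10, 20):
    print(n, coherent_flip_prob(eps, n), twirled_flip_prob(eps, n))
```

For N = 1 the two models agree exactly. By N = 10 the coherent flip probability is already about ten times larger than the twirled estimate, a discrepancy that only a simulation tracking the full quantum state can reproduce.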

180-fold speedup
In addition to enabling simulations of far larger systems, the GPU backend achieved a 180-fold improvement in sampling speed over state-of-the-art CPU-based simulations. This speedup is vital for exploring and refining quantum error correction protocols, because it lets researchers iterate quickly over different code designs, noise models, and circuit parameters.

To illustrate the performance gap, we compare the total number of simulated qubits and the single-shot simulation run-time for full-state simulations of the repetition code using the CPU- and GPU-based backends. Plaquette’s CPU-based simulations were benchmarked on a high-end Intel CPU with 120 GB of memory, whereas the GPU-based simulations ran on a single NVIDIA RTX 4000 Ada Generation GPU with 20 GB of memory. The results are shown in the following figure.
