


We discuss the microarchitecture, design, and testing of the first 8x8-bit (by modulo 256) parallel carry-save superconductor RSFQ multiplier implemented using the ISTEC 10 kA/cm² 1.0 μm fabrication technology. The 8x8-bit RSFQ multiplier uses a two-level parallel carry-save reduction tree that significantly reduces the multiplier latency. Partial products are asynchronously generated and sent to the reduction stage at the internal "hardwired" rate of 80 GHz.

Superconductor electronics (SCE) promise computer systems with orders of magnitude higher speeds and lower energy consumption than their complementary metal-oxide semiconductor (CMOS) counterparts. At the same time, the scalability and area utilization of superconducting systems are major concerns. Some of these concerns come from device-level challenges and the gap between SCE and CMOS technology nodes, and others come from the way Josephson junctions (JJs) are used. Towards this end, we notice that a considerable fraction of hardware resources is not involved in logic operations, but rather is used for fan-out and buffering purposes. In this paper, we ask if there is a way to reduce these overheads, propose the repurposing of JJs at the cell boundaries for fan-out, and establish a set of rules to discretize critical currents in a way that is conducive to this reassignment. Finally, we demonstrate the accomplished gains through detailed analog simulations and modeling analyses. Our experiments indicate that this method leads to a 47.6% savings in the JJ count in a tree with a fan-out of 1024, as well as an average savings of 43.3% of the JJ count for signal splitting in ISCAS85 benchmarks.
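As a purely illustrative aid for the multiplier described above, the following Python sketch models what "8x8-bit (by modulo 256)" carry-save multiplication computes at the bit level. It is a behavioral software model, not the RSFQ circuit: the function names (`partial_products`, `carry_save_add`, `multiply_mod`) are hypothetical, and the reduction uses a generic sequence of 3:2 compressor stages rather than the specific two-level tree of the actual design.

```python
def partial_products(a, b, width=8):
    """Return the partial-product rows of a*b, each truncated to `width` bits."""
    mask = (1 << width) - 1
    return [((a << i) & mask) if (b >> i) & 1 else 0 for i in range(width)]


def carry_save_add(x, y, z, width=8):
    """3:2 compressor: reduce three rows to a sum row and a carry row.

    No carry propagates between bit positions here; the carry row is
    simply shifted left by one, so x + y + z == s + c (mod 2**width).
    """
    mask = (1 << width) - 1
    s = x ^ y ^ z                                # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # bitwise majority = carry
    return s & mask, c & mask


def multiply_mod(a, b, width=8):
    """Multiply a and b modulo 2**width via carry-save reduction."""
    mask = (1 << width) - 1
    rows = partial_products(a, b, width)
    # Assumption: generic stage-by-stage 3:2 reduction, chosen for clarity.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3         # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2], width))
        nxt.extend(rows[full:])                  # leftover rows pass through
        rows = nxt
    # A single carry-propagating addition finishes the product.
    return sum(rows) & mask
```

For example, `multiply_mod(200, 3)` returns 88, which is 600 modulo 256. The point of the carry-save form is that every reduction stage works on all bit positions independently, so only the final addition needs carry propagation.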
