Thursday, 26 September 2019

Could Killing Off Motherboards Improve PC Performance?

One of the topics we’ve returned to repeatedly at ExtremeTech is the difficulty of performance scaling in both CPUs and GPUs. Semiconductor manufacturers across the world are grappling with this problem, as are CPU and GPU designers. To date, there have been no miracle cures or easy solutions to the problem, but companies have turned to a wide range of alternative approaches to help them boost baseline semiconductor performance, including new types of packaging technology. HBM, chiplets, and even technologies like Intel’s new 3D interconnect, Foveros, are all part of a broad industry effort to find new ways to connect chips rather than focusing solely on making chips smaller.

Now, a pair of researchers are arguing that it’s time to go one step further and dump the printed-circuit board altogether. In an article for IEEE Spectrum, Puneet Gupta and Subramanian S. Iyer write that it’s time to replace PCBs with silicon itself, and manufacture entire systems on a single wafer. If that argument sounds familiar, it’s because the pair are two of the authors of a paper we covered earlier this year on the merits of wafer-scale processing for GPUs using an interconnect technology they developed known as Si-IF — Silicon Interconnect Fabric.

Wafer-scale processing is the idea of using an entire silicon wafer to build one enormous part — a GPU, in this case. The research team’s work showed that this is potentially a viable approach today in terms of yield, performance, and power consumption, with better results than would be expected from building the equivalent number of separate GPUs using conventional manufacturing techniques. In the article, the pair note the many problems associated with keeping motherboards around in the first place, starting with the need to mount the actual physical chip in a package up to 20x larger than the CPU. The die of the CPU, after all, is typically much smaller than the physical substrate it’s mounted on.
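As a rough illustration of that gap, here's a back-of-the-envelope sketch in Python. The die and package dimensions are ballpark figures I'm assuming for a mainstream desktop part, not numbers from the article:

```python
# Back-of-the-envelope comparison of die area vs. package footprint.
# Ballpark figures assumed for illustration; not taken from the article.
DIE_AREA_MM2 = 126.0        # roughly a mainstream quad-core desktop CPU die
PACKAGE_EDGE_MM = 37.5      # LGA115x-class package, about 37.5 mm per side

package_area = PACKAGE_EDGE_MM ** 2
print(f"Package footprint: {package_area:.0f} mm^2")
print(f"Die area:          {DIE_AREA_MM2:.0f} mm^2")
print(f"Package occupies ~{package_area / DIE_AREA_MM2:.0f}x the area of the die")
```

For smaller dies the ratio climbs higher still, and that footprint overhead is exactly what the researchers want to eliminate by bonding bare dies straight to a silicon substrate.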

Diagram: Brandon Palacio for IEEE Spectrum

The authors argue that using physical packages for chips (as we do when we mount them on a PCB) increases the distance chip-to-chip signals need to travel by a factor of 10, creating a mammoth speed and memory bottleneck. This is part of the problem with the so-called “memory wall” — RAM clocks and performance have increased far more slowly than CPU performance, in part because of the need to wire up memory at some physical distance from the CPU package. It’s also part of why HBM is able to provide such enormous memory bandwidth — moving memory closer to the CPU allows for much wider signal paths. Packaged chips are also harder to keep cool. Why do we do all this? Because PCBs require it.
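A minimal sketch of the arithmetic behind that last point, comparing a single 64-bit DDR4-3200 channel against one HBM2 stack with a 1,024-bit interface (my own illustrative comparison using widely published peak figures, not one drawn from the article):

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec_mt: float) -> float:
    """Peak bandwidth = bytes per transfer * transfers per second."""
    return (bus_width_bits / 8) * transfers_per_sec_mt * 1e6 / 1e9

# Approximate published peak figures for each interface.
ddr4_channel = peak_bandwidth_gb_s(bus_width_bits=64, transfers_per_sec_mt=3200)
hbm2_stack = peak_bandwidth_gb_s(bus_width_bits=1024, transfers_per_sec_mt=2000)

print(f"DDR4-3200, 64-bit channel: ~{ddr4_channel:.1f} GB/s")
print(f"HBM2 stack, 1,024-bit bus: ~{hbm2_stack:.0f} GB/s")
```

The HBM2 stack actually runs each pin more slowly than DDR4 yet delivers roughly 10x the bandwidth, almost entirely because its interface is 16 times wider, and that width is only practical when the memory sits millimeters from the processor rather than centimeters away across a board.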

But according to the two researchers, silicon interposers and similar technologies are fundamentally the wrong path. Instead, they propose bonding processors, memory, analog, RF, and all other chiplets directly to the wafer. Rather than solder bumps, chips would use micrometer-scale copper pillars placed directly on the silicon substrate, with each chip's I/O ports bonded to the substrate via thermal compression in a copper-to-copper bond. Heatsinks could be placed on both sides of the Si-IF to cool products more readily, and silicon conducts heat better than PCB material does.

Chart: Latency, bandwidth, and energy comparison, wafer-scale integration vs. conventional packaging

The sheer difficulty of current scaling makes its own argument for exploring ideas like this. The old Moore's Law/Dennard scaling mantra of "smaller, faster, cheaper" isn't working anymore. It's entirely possible that replacing PCBs with a better substrate could allow for substantially better performance scaling, at least in certain scenarios. Wafer-scale systems would be far too expensive to install in anyone's home, but they could power the servers of the future. Companies like Microsoft, Amazon, and Google are betting billions on the idea that the next wave of high-performance computing will be cloud-driven, and giant wafer-based computers could find a happy home in industrial data centers. Based on the results of the GPU testing we covered earlier this year, there seems to be merit in the idea. The graph above, from the GPU paper, shows latency, bandwidth, and energy requirements for wafer-scale integration versus conventional methods.

Performance Isn’t the Only Reason the PC Exists

It’s also important, however, to acknowledge that the PC ecosystem doesn’t just exist to improve performance. PCs are designed to be flexible and modular in order to facilitate use in a vast array of circumstances. I have an older X79-based machine with a limited number of USB 3.0 ports. Years ago, I decided to supplement this meager number with a four-port USB 3.0 card. If my GPU needs an upgrade, I upgrade it — I don’t buy an all-new system. If my motherboard fails, I can theoretically swap to a different board. A RAM failure means throwing away a stick of DDR3 and dropping in a new one, not a wholesale part replacement. That flexibility of approach is part of the reason why PCs have declined in cost and why the platform can be used for so many different tasks in the first place.

The authors do a commendable job of laying out both the strengths and shortcomings of adopting their Si-IF solution for semiconductor manufacturing, and the article, while long, is absolutely worth reading. They note that companies would need to design redundancy into systems at the wafer level to ensure that any failures in situ were kept to an absolute minimum. But at the moment, the entire semiconductor industry is more or less aligned in the opposite direction from an idea like this. It might still be possible to buy a system with AMD CPUs and Nvidia GPUs, for example, but that would depend on an unprecedented collaboration between a contract foundry (TSMC, Samsung, or GlobalFoundries) and two different clients.
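To get a feel for why that wafer-level redundancy matters, here is a standard Poisson yield calculation, sketched with parameters I've assumed purely for illustration; the researchers' actual defect densities and unit sizes will differ:

```python
import math

# Illustrative assumptions -- not figures from the Si-IF work.
DEFECT_DENSITY_PER_MM2 = 0.001   # 0.1 killer defects per cm^2
UNIT_AREA_MM2 = 40               # one chiplet-scale unit on the wafer
UNITS_ON_WAFER = 100             # units bonded to a single Si-IF wafer
UNITS_REQUIRED = 92              # units that must work for the system to ship

# Poisson model: probability a single unit has zero killer defects.
unit_yield = math.exp(-DEFECT_DENSITY_PER_MM2 * UNIT_AREA_MM2)

def at_least_k_good(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent units are defect-free."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(f"Single-unit yield: {unit_yield:.1%}")
print(f"All {UNITS_ON_WAFER} units perfect (no spares): {unit_yield ** UNITS_ON_WAFER:.1%}")
print(f"At least {UNITS_REQUIRED} of {UNITS_ON_WAFER} units good: "
      f"{at_least_k_good(UNITS_ON_WAFER, UNITS_REQUIRED, unit_yield):.1%}")
```

Under these assumptions, demanding a flawless wafer leaves you with a sellable system only about 2 percent of the time, while budgeting eight spare units pushes that figure to roughly 99 percent, which is why the authors treat built-in redundancy as a prerequisite rather than an optimization.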

I suspect we could see companies looking into this type of buildout over the long term, but it's not something I would expect to ever replace the more conventional PC ecosystem. The benefits of upgradability and flexibility are huge, as is the economy of scale these capabilities collectively create. But large data center providers might opt for this approach over time if it yields results. Moore's Law alone isn't going to provide the answers companies are seeking. Getting rid of motherboards is a radical idea, but if it works and can ever be done affordably, somebody will probably take a shot at it.


from ExtremeTech https://www.extremetech.com/computing/299030-could-killing-off-motherboards-improve-pc-performance
