Amaterasu イタチ (Community Founders) Posted May 21, 2023

Ampere this week introduced its AmpereOne processors for cloud datacenters, which happen to be the industry's first general-purpose CPUs with up to 192 cores that can be used for AI inference. The new chips consume more power than their predecessors, the Ampere Altra (which will remain in Ampere's stable for at least a while), but the company claims that despite the higher power consumption, its processors with up to 192 cores provide higher computational density than CPUs from AMD and Intel. Some of those performance claims may prove controversial.

192 Custom Cloud Native Cores

Ampere's AmpereOne processors feature 136 to 192 cores (as opposed to 32 to 128 cores for Ampere Altra) running at up to 3.0 GHz. The cores are based on the company's proprietary implementation of the Armv8.6+ instruction set architecture and feature two 128-bit vector units that support FP16, BF16, INT16, and INT8 formats. Each core is equipped with 2MB of 8-way set-associative L2 cache (up from 1MB), and the cores are interconnected using a mesh network with 64 home nodes and a directory-based snoop filter. In addition to L1 and L2 caches, the SoC also has a 64MB system-level cache. The new CPUs are rated for 200W to 350W depending on the exact SKU, up from 40W to 180W for the Ampere Altra.

The company claims that its new cores are further optimized for cloud and AI workloads and feature 'power and area efficient' instructions per clock (IPC) gains, which probably means higher IPC (compared to Arm's Neoverse N1 used for Altra) without a tangible increase in power consumption and die area. Speaking of die area, Ampere does not disclose it, but says that the AmpereOne is made on one of TSMC's 5nm-class process technologies.
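To put the quoted core counts, L2 sizes, and power ratings side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures from the post. Pairing the top power rating with the top core count is an assumption, since per-SKU ratings are not given, and the `CpuSpec` class and its method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSpec:
    """Headline figures quoted in the post (hypothetical helper type)."""
    name: str
    max_cores: int
    max_power_w: float   # top of the rated power range
    l2_per_core_mb: int

    def watts_per_core(self) -> float:
        # ASSUMPTION: the highest power rating belongs to the highest
        # core-count SKU; the post does not list per-SKU ratings.
        return self.max_power_w / self.max_cores

    def total_l2_mb(self) -> int:
        # Private L2 summed over all cores (excludes the system-level cache).
        return self.max_cores * self.l2_per_core_mb


AMPERE_ONE = CpuSpec("AmpereOne", max_cores=192, max_power_w=350.0, l2_per_core_mb=2)
AMPERE_ALTRA = CpuSpec("Ampere Altra", max_cores=128, max_power_w=180.0, l2_per_core_mb=1)

if __name__ == "__main__":
    for cpu in (AMPERE_ONE, AMPERE_ALTRA):
        print(f"{cpu.name}: {cpu.watts_per_core():.2f} W/core, "
              f"{cpu.total_l2_mb()} MB total L2")
```

Under that assumption the rated power works out to roughly 1.8 W per core for AmpereOne versus 1.4 W for Altra, with 384 MB versus 128 MB of total L2. These are rated maximums divided by core count, not measured power or performance, so they say nothing by themselves about the density claims.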
Although Ampere does not reveal all the details about its AmpereOne cores, it says that they feature a highly accurate L1 data prefetcher (reduces latency, ensures that the CPU spends less time waiting for data, and reduces system power consumption by minimizing memory accesses), refined branch misprediction recovery (the sooner the CPU can detect a branch misprediction and recover, the lower the latency and the less power is wasted), and sophisticated memory disambiguation (increases IPC, minimizes pipeline stalls, maximizes out-of-order execution, lowers latency, and improves handling of multiple read/write requests in virtualized environments). While the list of AmpereOne core architecture improvements does not seem too long on paper, these things can indeed improve performance significantly, and they required a lot of research (for example, into which things slow down a cloud datacenter CPU the most) and a lot of work to implement efficiently.

https://www.tomshardware.com/news/ampere-unveils-192-core-cpu