AMD AI Chip Unveiling - Assembly - Salesforce Research
AMD Instinct MI300 is THE Chance to Chip into NVIDIA AI Share
servethehome.com - 11 months ago
Today, AMD is launching the AMD Instinct MI300 series. This is an entire family that is, in many ways, similar to Intel's original vision for 2025 Falcon Shores (although that is now GPU-only) and NVIDIA's Hopper series. The difference is that AMD is launching in mid-2023 and already delivering parts for an enormous supercomputer. Make no mistake, if something is going to chip into NVIDIA's market share in AI during 2023, this is one of the few solutions that have a legitimate chance.
More Context
What is AMD's latest chip that has been unveiled?
videocardz.com Instinct MI300X GPU
servethehome.com Instinct MI300
wepc.com MI300X
crn.com Instinct MI300X, EPYC 97X4
finance.yahoo.com AI superchip
neowin.net 128-core EPYC 97X4 Series
insidehpc.com 4th Generation EPYC
latestly.com EPYC 97X4
techradar.com 144-Core EPYC Bergamo
wccftech.com Instinct MI300 APUs
pcmag.com Instinct MI300X
seekingalpha.com MI300 series
We are writing this live at AMD's data center event, so please excuse typos.
AMD Instinct MI300 is THE Chance to Chip into NVIDIA AI Share
Just to be clear, only a few AI companies have a realistic chance of putting a dent in NVIDIA's AI share in 2023. NVIDIA is facing very long lead times for its H100, and now even its A100, GPUs. As a result, if you want NVIDIA for AI and do not have an order in today, we would not expect deployment before 2024.
We covered the Intel Gaudi2 AI Accelerator Super Cluster as one effort to offer an alternative. Beyond a large company like Intel, there are only a few companies that have a clear path this year to make a dent. One is Cerebras and its CS-2 and Wafer Scale Engine-2. Cerebras removes a huge amount of overhead by keeping wafers intact and using silicon interconnects versus PCB and cables. AMD has Instinct MI300.
More Context
Who are the potential competitors for AMD's chip?
crn.com Amazon Web Services, Microsoft Azure, Google Cloud and others
benzinga.com NVIDIA Corp. NVDA, Meta Platforms Inc. META, and Tesla Inc. TSLA
seekingalpha.com Nvidia Corporation
servethehome.com NVIDIA's Hopper series
fool.com Nvidia's H100
The MI300 is a family of very large, modular processors. AMD can combine either CDNA3 GPU IP blocks or Zen 4 CPU IP blocks with high-bandwidth memory (HBM). The key here is that AMD is thinking at large scale with the MI300.
The base MI300 is a large silicon interposer that provides connectivity for up to eight HBM stacks and four sets of GPU or CPU tiles.
For a traditional GPU, the MI300 will be a GPU-only part with all four center tile sites populated by GPU tiles. With so much HBM onboard (192GB), AMD can simply fit more memory onto a single GPU than NVIDIA can. The NVIDIA H100 currently tops out at 96GB per GPU in the NVIDIA H100 NVL for high-end AI inference.
The AMD Instinct MI300X has 192GB of HBM3, 5.2TB/s of memory bandwidth, and 896GB/s of Infinity Fabric bandwidth. This is a 153B transistor part.
More Context
What are the technical specifications of the chip?
anandtech.com from 16 cores to 96 cores
neowin.net 128-core
wccftech.com 6 XCDs (Up To 228 CUs), 3 CCDs (Up To 24 Zen 4 Cores), 8 HBM3 Stacks
servethehome.com 24 Zen 4 cores, CDNA3 GPU cores, and 128GB HBM3
crn.com higher core design and greater energy efficiency
zdnet.com multiple GPU "chiplets" plus 192 gigabytes of HBM3 DRAM memory, and 5.2 terabytes per second of memory bandwidth
finance.yahoo.com it can use up to 192GB of memory
videocardz.com up to 128 cores, deliver up to 3.7x throughput performance for key cloud native workloads compared to Ampere
prnewswire.com unmatched core count in 1U and 2U density
AMD is going directly after the NVIDIA H100 with this part. Other AI vendors have focused their comparisons on the older A100, but AMD is targeting the hugely popular H100 head-on.
The advantage of a huge amount of onboard memory is that AMD needs fewer GPUs to hold a model, and it can run larger models entirely in memory without spilling over NVLink to other GPUs or over a CPU link. There is a huge market opportunity in large-model AI inference: with more GPU memory, larger and more accurate models can run entirely in memory without the power and hardware costs of spanning multiple GPUs.
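To make that memory argument concrete, here is a minimal back-of-the-envelope sketch in Python. The 66B-parameter model is a hypothetical example, and the footprint counts weights only (no KV cache, activations, or framework overhead), so real deployments need more headroom.

```python
import math

BYTES_PER_PARAM_FP16 = 2  # FP16/BF16 weights take 2 bytes per parameter

def weights_gb(params_billions: float) -> float:
    """Approximate weight footprint in GB (weights only; ignores KV cache and activations)."""
    return params_billions * BYTES_PER_PARAM_FP16

def min_gpus(params_billions: float, gpu_mem_gb: float) -> int:
    """Minimum GPU count just to hold the weights in HBM."""
    return math.ceil(weights_gb(params_billions) / gpu_mem_gb)

# A hypothetical 66B-parameter FP16 model needs ~132GB for weights alone:
# one 192GB MI300X holds it, while 80-96GB H100 variants need at least two.
for name, mem_gb in [("MI300X 192GB", 192), ("H100 NVL 96GB", 96), ("H100 80GB", 80)]:
    print(f"{name}: {min_gpus(66, mem_gb)} GPU(s)")
```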
More Context
What are the other applications that the chip could be used for?
crn.com AI, Cloud Expansion
neowin.net business and cloud
videocardz.com cloud native and technical computing
servethehome.com NICs, storage, and even memory
wepc.com high-performance computing (HPC) and AI workloads
finance.yahoo.com large language models and generative AI
latestly.com Software Enablement for Generative AI (Artificial Intelligence)
benzinga.com health care to 5G networks and data centers
zdnet.com large language models
wccftech.com various core IPs, memory interfaces, interconnects
beststocks.com high-performance computing, graphics, and visualization technologies
prnewswire.com technical computing
For a traditional CPU, the MI300 can be a CPU-only part. This gives us the familiar Zen 4 cores that we find in Genoa, but with up to 96 cores and lots of HBM3 memory. For a sense of scale, we are about to publish our Intel Xeon Max review, and that part tops out at 56 cores and 64GB of onboard HBM. AMD is not talking about this part at the event today.
The AMD Instinct MI300A has 24 Zen 4 cores, CDNA3 GPU cores, and 128GB of HBM3. This is the APU being deployed in the El Capitan 2+ exaflop supercomputer.
AMD can also populate these tile sites in different ratios, including leaving some unpopulated. At the same time, AMD may not productize every combination one can imagine. Still, the modular approach gives AMD a platform where tiles can potentially be updated asynchronously in the future.
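As a toy illustration of why AMD will not ship everything it could build, here is a sketch that enumerates the configuration space implied by the four-tile-site description above. The unpopulated option is our reading of the text, and this is emphatically not AMD's actual SKU list.

```python
from itertools import product

# Four tile sites, each holding a GPU tile set, a CPU tile set, or left
# unpopulated (the unpopulated option is an assumption from the text).
SITE_OPTIONS = ("GPU", "CPU", None)

ratios = {  # placement order does not matter, only the GPU:CPU ratio
    (combo.count("GPU"), combo.count("CPU"))
    for combo in product(SITE_OPTIONS, repeat=4)
}
for gpus, cpus in sorted(ratios, reverse=True):
    print(f"{gpus} GPU tile set(s) + {cpus} CPU tile set(s)")
```

That is fifteen possible ratios from one interposer layout, against the handful of parts AMD is actually announcing.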
AMD also announced today a 1.5TB HBM3 memory solution that places 8x AMD Instinct MI300X OAM modules onto an OCP UBB.
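The headline capacity checks out with simple arithmetic from the per-GPU figures quoted above; note the aggregate bandwidth line is a naive sum and says nothing about cross-GPU traffic.

```python
OAM_MODULES = 8
HBM3_GB_PER_GPU = 192     # MI300X capacity quoted above
MEM_BW_TBS_PER_GPU = 5.2  # MI300X HBM3 bandwidth quoted above

total_hbm_tb = OAM_MODULES * HBM3_GB_PER_GPU / 1024  # 1536GB = 1.5TB
total_bw_tbs = OAM_MODULES * MEM_BW_TBS_PER_GPU      # simple sum; ignores interconnect
print(f"{total_hbm_tb:.1f}TB HBM3, {total_bw_tbs:.1f}TB/s aggregate HBM bandwidth")
```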
We have also seen 4x OAM boards with PCIe slots and MCIO connectors for directly attaching NICs, storage, and even memory.
Something AMD is not focusing on, but that we have heard folks discuss when seeing MI300 platforms live over the last few weeks, is CXL. AMD supports CXL Type-3 devices with these parts. There is a path to getting even more memory, and that path runs through CXL memory expansion modules. That is huge.
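As a rough illustration of why that matters for software: on Linux, CXL Type-3 memory typically surfaces as a CPU-less NUMA node that ordinary allocators can already use. Here is a minimal sysfs-scanning sketch; it is a heuristic, not a definitive CXL detector.

```python
import glob
import os

# On Linux, CXL Type-3 memory expanders typically appear as NUMA nodes
# with memory but no CPUs. This heuristic flags such nodes; it cannot
# distinguish CXL memory from other CPU-less memory-only nodes.
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    if not cpus:  # empty cpulist -> memory-only node
        print(f"{os.path.basename(node)}: memory-only node (possible CXL expander)")
```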
The big takeaway here is that AMD just launched a massive GPU. AMD knows that a big part of the journey is through AI software, and that is where NVIDIA has a massive lead. AMD showed it is committed to working with PyTorch, Hugging Face, and more.
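On that software point: PyTorch's ROCm builds reuse the familiar torch.cuda namespace, so much existing GPU code runs as-is on an MI300-class accelerator. A minimal device-check sketch (our illustration, not an AMD-provided example):

```python
import torch

# PyTorch's ROCm builds expose AMD GPUs through the torch.cuda namespace,
# so most CUDA-targeted code runs unchanged. torch.version.hip is set on
# ROCm builds and is None on CUDA builds.
if torch.cuda.is_available():
    backend = f"ROCm/HIP {torch.version.hip}" if torch.version.hip else f"CUDA {torch.version.cuda}"
    print(f"{torch.cuda.get_device_name(0)} via {backend}")
    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    print((x @ x).shape)  # matmul dispatched to the ROCm or CUDA BLAS backend
```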
More Context
What are the businesses that AMD is targeting with the new chip?
crn.com AI, Cloud
techradar.com data center & AI
videocardz.com cloud native and technical computing
latestly.com data centre
wepc.com high-performance computing (HPC) and AI workloads
wccftech.com CPU / GPU workloads
neowin.net and for cloud data centers
cnbc.com developers and server makers
benzinga.com health care to 5G networks and data centers
zdnet.com artificial intelligence computing
beststocks.com high-performance computing, graphics, and visualization technologies
channelnewsasia.com cloud computing providers and other large chip buyers
seekingalpha.com supercomputers and traditional high-performance computing
prnewswire.com growing cloud native environments
Make no mistake, the AMD MI300 is more than just a massive GPU. This is AMD's vision for the next generation of high-performance compute.