AMD Instinct MI300 is THE Chance to Chip into NVIDIA AI Share
Today, AMD is launching the AMD Instinct MI300 series. This is an entire family that is, in many ways, similar to Intel's original vision for its 2025 Falcon Shores part (although Falcon Shores is now GPU-only) and NVIDIA's Hopper series. The difference is that AMD is launching in mid-2023 and is already delivering parts for an enormous supercomputer. Make no mistake: if something is going to chip into NVIDIA's AI market share during 2023, this is one of the few solutions with a legitimate chance.
We are writing this live at AMD's data center event, so please excuse typos.
AMD Instinct MI300 is THE Chance to Chip into NVIDIA AI Share
Just to be clear, only a few AI companies have a realistic chance to put a dent in NVIDIA's AI share in 2023. NVIDIA is facing very long lead times for its H100, and now even A100, GPUs. As a result, if you want NVIDIA for AI and do not have an order in today, we would not expect deployment before 2024.
We covered the Intel Gaudi2 AI Accelerator Super Cluster as one effort to offer an alternative. Beyond a large company like Intel, only a few companies have a clear path to making a dent this year. One is Cerebras with its CS-2 and Wafer Scale Engine-2; Cerebras removes a huge amount of overhead by keeping wafers intact and using silicon interconnects instead of PCBs and cables. AMD has the Instinct MI300.
The MI300 is a family of very large, modular processors. AMD can utilize either CDNA3 GPU IP blocks or Zen 4 CPU IP blocks along with high-bandwidth memory (HBM). The key here is that AMD is thinking large scale with the MI300.
The base MI300 is a large silicon interposer that provides connectivity for up to eight HBM stacks and four sets of GPU or CPU tiles.
For a traditional GPU, the MI300 will be a GPU-only part: all four center tiles are GPU. With so much HBM onboard (192GB), AMD can simply fit more onto a single GPU than NVIDIA can. The NVIDIA H100 currently tops out at 96GB per H100 in the NVIDIA H100 NVL for High-End AI Inference.
The AMD Instinct MI300X has 192GB of HBM3, 5.2TB/s of memory bandwidth, and 896GB/s of Infinity Fabric bandwidth. This is a 153B transistor part.
AMD is going directly after the NVIDIA H100 with this. Other AI vendors focus their comparisons on the A100, but AMD is targeting the hugely popular H100.
The advantage of having a huge amount of onboard memory is that AMD needs fewer GPUs to run models in memory and can run larger models without having to go over NVLink to other GPUs or over a CPU link. There is a huge opportunity in the market for running large AI inference models, and with more GPU memory, larger and more accurate models can run entirely in memory without the power and hardware costs of spanning multiple GPUs.
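To put rough numbers on that, here is a back-of-the-envelope sketch of our own (not AMD's figures; the 175B-parameter model, the overhead factor, and the GPU classes are illustrative assumptions) showing how per-GPU HBM capacity changes the number of GPUs needed to hold a model:

import math

def gpus_needed(params_billion: float, bytes_per_param: int,
                hbm_gb: int, overhead: float = 1.2) -> int:
    """Rough GPU count to hold a model's weights in HBM.

    overhead is a fudge factor for activations, KV cache, and
    framework buffers; real deployments vary widely.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte ~ 1GB
    return math.ceil(weights_gb * overhead / hbm_gb)

# A hypothetical 175B-parameter model served in FP16 (2 bytes/param):
for name, hbm in [("192GB-class GPU", 192), ("80GB-class GPU", 80)]:
    print(name, gpus_needed(175, 2, hbm))
# 192GB-class GPU 3
# 80GB-class GPU 6

Halving the number of accelerators a model spans is where the power and hardware cost savings the paragraph above describes would come from.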
For a traditional CPU, the MI300 can be a CPU-only part. This gives us the familiar Zen 4 cores found in Genoa, but with up to 96 cores and lots of HBM3 memory. For a sense of scale, we are about to publish our Intel Xeon Max review, covering a part with up to 56 cores and 64GB of onboard HBM. AMD is not talking about this CPU-only part at the event today.
The AMD Instinct MI300A has 24 Zen 4 cores, CDNA3 GPU cores, and 128GB of HBM3. This is the chip being deployed in the El Capitan 2+ exaflop supercomputer.
AMD can also populate these tiles in different ratios, potentially including leaving some sites unpopulated. At the same time, AMD may not productize every combination one can imagine. Still, AMD's modular approach means it has a platform on which it could update tiles asynchronously in the future.
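As a rough illustration of that modularity, here is a small sketch of our own (not AMD tooling; per-stack HBM capacity and cores-per-site are assumptions back-fitted to the figures in this article) modeling the four tile sites and eight HBM stacks:

from dataclasses import dataclass

@dataclass
class MI300Config:
    gpu_sites: int        # tile sites carrying CDNA3 GPU tiles
    cpu_sites: int        # tile sites carrying Zen 4 CPU tiles
    hbm_stacks: int = 8   # up to eight HBM3 stacks on the interposer
    stack_gb: int = 24    # assumed per-stack capacity: 8 x 24GB = 192GB

    def __post_init__(self):
        assert self.gpu_sites + self.cpu_sites <= 4, "only four tile sites"

    @property
    def hbm_gb(self) -> int:
        return self.hbm_stacks * self.stack_gb

    @property
    def zen4_cores(self) -> int:
        # Assumes 24 Zen 4 cores per CPU site, matching the MI300A figure
        return self.cpu_sites * 24

mi300x = MI300Config(gpu_sites=4, cpu_sites=0)               # GPU-only part
mi300a = MI300Config(gpu_sites=3, cpu_sites=1, stack_gb=16)  # APU, 128GB

print(mi300x.hbm_gb, mi300x.zen4_cores)  # 192 0
print(mi300a.hbm_gb, mi300a.zen4_cores)  # 128 24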
AMD also announced today a 1.5TB HBM3 memory solution that takes 8x AMD Instinct MI300X OAM modules (8 x 192GB = 1,536GB) and places them onto an OCP UBB.
We have also seen 4x OAM boards with PCIe slots and MCIO connectors for directly attaching NICs, storage, and even memory.
Something AMD is not focusing on, but that we have heard folks discuss while seeing MI300 platforms live over the last few weeks, is CXL. AMD supports CXL Type-3 devices with its parts. There is a path to getting more memory, and that path uses CXL memory expansion modules. That is huge.
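For context on how that memory could appear to software: on Linux, CXL Type-3 memory expanders typically surface as CPU-less NUMA nodes. A minimal sketch of ours (a heuristic, not an AMD tool) that flags such nodes:

from pathlib import Path

# List NUMA nodes that have memory but no CPUs -- candidates for
# CXL-attached memory. Heuristic only: a node can also be CPU-less
# for other reasons, such as HBM-only or persistent-memory regions.
for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpulist = (node / "cpulist").read_text().strip()
    meminfo = (node / "meminfo").read_text()
    total_kb = int(meminfo.split("MemTotal:")[1].split()[0])
    if not cpulist and total_kb > 0:
        print(f"{node.name}: {total_kb // 1024} MB, no CPUs (possible CXL memory)")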
The big takeaway here is that AMD just launched a massive GPU. AMD knows that a big part of the journey runs through AI software, and that is where NVIDIA has a massive lead. AMD showed it is committed to working with PyTorch, Hugging Face, and more.
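One practical consequence of that PyTorch work: ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda interface, so typical Hugging Face code can run largely unchanged. A minimal sketch, assuming a ROCm PyTorch install and the transformers library (the gpt2 model is just a small example):

import torch
from transformers import pipeline

# On ROCm builds, torch.cuda maps to the AMD GPU, so this code is
# identical to what would run on an NVIDIA part.
device = 0 if torch.cuda.is_available() else -1  # GPU if present, else CPU
generator = pipeline("text-generation", model="gpt2", device=device)

print(torch.cuda.get_device_name(0) if device == 0 else "CPU")
print(generator("The AMD Instinct MI300X has", max_new_tokens=20)[0]["generated_text"])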
Make no mistake, the AMD MI300 is more than just a massive GPU. This is AMD's vision for the next generation of high-performance compute.