Meta works with NVIDIA to build AI research supercomputer

January 24, 2022 – Meta Platforms gave a big thumbs up to NVIDIA, choosing its technologies for what it believes will be its most powerful research system to date.

The newly announced AI Research SuperCluster (RSC) is already training new models to advance AI. Once fully deployed, Meta’s RSC is expected to be one of the largest customer installations of NVIDIA DGX A100 systems.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,” the company says in a blog post.

Training AI models

When RSC is fully built out, later this year, Meta aims to use it to train AI models with more than a trillion parameters. That could advance fields such as natural-language processing for jobs like identifying harmful content in real time. In addition to performance at scale, Meta cited extreme reliability, security, privacy and the flexibility to handle “a wide range of AI models” as its key criteria for RSC.

Meta’s AI Research SuperCluster features hundreds of NVIDIA DGX systems linked on an NVIDIA Quantum InfiniBand network to accelerate the work of its AI research teams.

Under the hood

The new AI supercomputer currently uses 760 NVIDIA DGX A100 systems as its compute nodes. They pack a total of 6,080 NVIDIA A100 GPUs linked on an NVIDIA Quantum 200Gb/s InfiniBand network to deliver 1,895 petaflops of TF32 performance.
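As a quick sanity check on those figures, the short Python sketch below derives the number of GPUs per DGX A100 system and the TF32 throughput each GPU contributes. It uses only the numbers quoted above. The variable names are illustrative, and it assumes the aggregate figure is simply the sum of equal per-GPU contributions.

```python
# Figures quoted in the article.
dgx_systems = 760
total_gpus = 6_080
tf32_petaflops = 1_895

# GPUs per DGX A100 node implied by the two counts.
gpus_per_system = total_gpus // dgx_systems

# Aggregate TF32 throughput spread evenly across all GPUs, in teraflops
# (assumption: the total is a plain sum of equal per-GPU contributions).
tf32_teraflops_per_gpu = tf32_petaflops * 1_000 / total_gpus

print(f"GPUs per DGX A100 system: {gpus_per_system}")
print(f"TF32 teraflops per GPU:   {tf32_teraflops_per_gpu:.1f}")
```

The counts work out to eight GPUs per system and a little under 312 teraflops of TF32 per GPU.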

Despite challenges from COVID-19, RSC took just 18 months to go from an idea on paper to a working AI supercomputer thanks in part to the NVIDIA DGX A100 technology at the foundation of Meta RSC.

Penguin Computing is the NVIDIA Partner Network delivery partner for RSC. In addition to the 760 DGX A100 systems and InfiniBand networking, Penguin provided managed services and AI-optimised infrastructure for Meta, comprising 46 petabytes of cache storage with its Altus systems. Pure Storage FlashBlade and FlashArray//C provide the highly performant and scalable all-flash storage capabilities needed to power RSC.

20x performance gains

It’s the second time Meta has picked NVIDIA technologies as the base for its research infrastructure. In 2017, Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handle 35,000 AI training jobs a day.

Meta’s early benchmarks showed RSC can train large NLP models 3x faster and run computer vision jobs 20x faster than the prior system.

In a second phase later this year, RSC will expand to 16,000 GPUs that Meta believes will deliver a whopping 5 exaflops of mixed-precision AI performance. And Meta aims to expand RSC’s storage system to deliver up to an exabyte of data at 16 terabytes per second.
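To put the phase-two targets in perspective, the sketch below works out the mixed-precision throughput per GPU and how long it would take to stream a full exabyte at the stated bandwidth. It is a back-of-the-envelope calculation, not a benchmark. The variable names are illustrative, and it assumes decimal units (1 exabyte = 1,000,000 terabytes, 1 exaflop = 1,000,000 teraflops) and sustained peak bandwidth.

```python
# Phase-two targets quoted in the article.
phase2_gpus = 16_000
mixed_precision_exaflops = 5
storage_exabytes = 1
bandwidth_tb_per_s = 16

# Assumption: decimal units, so 1 exa-unit = 1,000,000 tera-units.
TERA_PER_EXA = 1_000_000

# Mixed-precision throughput per GPU, in teraflops.
teraflops_per_gpu = mixed_precision_exaflops * TERA_PER_EXA / phase2_gpus

# Time to stream the whole exabyte once at the full stated bandwidth.
seconds_to_stream = storage_exabytes * TERA_PER_EXA / bandwidth_tb_per_s
hours_to_stream = seconds_to_stream / 3_600

print(f"Teraflops per GPU: {teraflops_per_gpu:.1f}")
print(f"Hours to stream one exabyte: {hours_to_stream:.1f}")
```

Under these assumptions, each GPU accounts for 312.5 teraflops, and reading the full exabyte once would take roughly 17 hours.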

A scalable architecture

NVIDIA AI technologies are available to enterprises of any size. NVIDIA DGX, which includes a full stack of NVIDIA AI software, scales easily from a single system to a DGX SuperPOD running on-premises or at a colocation provider. Customers can also rent DGX systems through NVIDIA DGX Foundry.
