Nvidia’s Revolutionary AI Model: Surpassing GPT-4o Performance
Nvidia has quietly released a new AI model, and it outperforms OpenAI’s GPT-4o and other state-of-the-art models on several vision-language benchmarks. The new model is called “NVLM-D-72B,” and it is open-source, which is great news for anyone who wants to put it to work for faster, more efficient results.
NVLM-D-72B is currently free for research and testing, though licensing charges may be introduced in the future. For now, it’s available on Hugging Face.
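If you want to try it yourself, here is a minimal sketch of loading the checkpoint with the Hugging Face `transformers` library. The repo id `nvidia/NVLM-D-72B` matches the public listing, but the exact loading arguments (half precision, device mapping, trusting the repo’s custom code) are assumptions based on how large vision-language checkpoints are typically loaded; check the model card for the recipe Nvidia actually recommends.

```python
# Minimal loading sketch (assumptions flagged in comments).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "nvidia/NVLM-D-72B"  # public Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision; a 72B model needs multiple GPUs
    device_map="auto",           # shard weights across available GPUs (assumption)
    trust_remote_code=True,      # the repo ships its own modeling code
).eval()
```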
And the model is no simple one: it is multimodal, able to read charts and graphs, solve math problems, analyze data, and pick up context from anything between a fun meme and a restaurant menu.
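To make “understands charts and memes” concrete, here is a hypothetical query that continues from the loading sketch above. The `chat()` call and the `preprocess()` helper are assumptions modeled on how similar open vision-language models expose inference; the real interface is documented on the model card.

```python
# Hypothetical multimodal query (the chat() signature is an assumption
# modeled on similar open vision-language models; verify on the model card).
from PIL import Image

image = Image.open("quarterly_revenue_chart.png").convert("RGB")
pixel_values = preprocess(image)  # hypothetical helper: resize + normalize to tensors

question = "<image>\nWhich quarter shows the biggest revenue jump, and why?"
response = model.chat(
    tokenizer,
    pixel_values,
    question,
    dict(max_new_tokens=256, do_sample=False),  # greedy decoding, short answer
)
print(response)
```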
In this short write-up, I want to introduce you to Nvidia’s new model and take a closer look at its performance claims.
NVLM-D-72B: Nvidia’s New Open-Source AI Model
Nvidia has handed developers an important piece of the puzzle, since many of today’s models already help developers with their routine tasks. The company has long led the industry through hardware, and now it breaks the mold with an open-source AI model, something many leaders in the tech industry avoid.
Before diving in, I want to share one highlight on AI helping developers, from one of the leading MAANG companies. Google’s CEO Sundar Pichai said something remarkable that shows AI is there to help you, not to take your place.
Sundar Pichai said:
About 25% of all new code at Google is generated by AI and then reviewed by engineers. He stated that AI’s involvement helps software engineers accomplish more and move faster.
Nvidia, for its part, is helping developers work smarter, and it has now released a 72-billion-parameter AI model, NVLM-D-72B.
In the October 22 paper, Nvidia said:
the family of AI models are “frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2).”
Understanding the AI Performance Claims of Nvidia
Nvidia claims that the model family achieves frontier-class results, with its cross-attention variant standing as the best-in-class cross-attention-based multimodal LLM. The company points out that NVLM-X 1.0 offers faster training and inference than its decoder-only counterpart.
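A quick aside on what “cross-attention-based” versus “decoder-only” means here, since it explains the speed claim. A decoder-only design splices projected image tokens directly into the text sequence, so self-attention has to process a much longer input; a cross-attention design keeps the text sequence short and lets it attend to image features through separate layers. The toy sketch below is my own illustration of that trade-off, not Nvidia’s code; every name and size in it is invented.

```python
# Toy illustration of the two multimodal wiring styles (not Nvidia's code).
import torch
import torch.nn as nn

D_MODEL = 64  # toy hidden size; real models use thousands

# Decoder-only style (NVLM-D): image features become "visual tokens"
# concatenated with the text tokens, so the self-attention sequence
# grows with the number of image tokens.
def decoder_only_inputs(text_emb, img_feats, proj):
    visual_tokens = proj(img_feats)                 # (num_img_tokens, d)
    return torch.cat([visual_tokens, text_emb], 0)  # longer sequence

# Cross-attention style (NVLM-X): text tokens stay as-is and *attend to*
# image features instead, keeping the self-attention sequence short
# (hence the faster training and inference).
class CrossAttnBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.xattn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, text_emb, img_feats):
        fused, _ = self.xattn(query=text_emb, key=img_feats, value=img_feats)
        return text_emb + fused  # residual fusion of visual context

text = torch.randn(1, 10, D_MODEL)    # 10 text tokens
image = torch.randn(1, 256, D_MODEL)  # 256 image patch features

proj = nn.Linear(D_MODEL, D_MODEL)
seq = decoder_only_inputs(text[0], image[0], proj)
print("decoder-only sequence length:", seq.shape[0])        # 266

fused = CrossAttnBlock(D_MODEL)(text, image)
print("cross-attention sequence length:", fused.shape[1])   # still 10
```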
“Open-access multimodal LLMs such as LLaVA-OneVision 72B and InternVL-2-Llama3-76B, show significant performance degradation on text-only tasks after multimodal training.”
Nvidia’s paper states.
“In contrast, our NVLM-1.0 models exhibit even improved text-only performance, thanks to the inclusion of high-quality text-only SFT data.”
The most significant difference from its competition is its multimodal architecture.
“Our NVLM-D-1.0–72B demonstrates versatile capabilities in various multimodal tasks by jointly utilizing OCR, reasoning, localization, common sense, world knowledge, and coding ability,”
the researchers wrote.
Why is Nvidia’s NVLM-D-72B a Big Deal?
By releasing it as an open-source model, Nvidia hopes to attract researchers and developers: access is streamlined, the model is easy to load, and users get full control over the underlying model weights.
The model can handle everything from basic tasks to complex multimodal prompts, and it also integrates with Nvidia’s microservices.
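As an illustration of the microservices angle, Nvidia’s hosted NIM endpoints speak the OpenAI-compatible API. The sketch below imagines NVLM-D-72B being exposed there; the base URL follows Nvidia’s API catalog convention, but the model identifier and its availability on the service are assumptions on my part.

```python
# Hypothetical sketch of calling the model through an NVIDIA NIM
# microservice. NIM endpoints use the OpenAI-compatible API, but
# whether NVLM-D-72B is served there, and under what name, is assumed.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # Nvidia API catalog endpoint
    api_key="NVIDIA_API_KEY_HERE",                   # placeholder credential
)

completion = client.chat.completions.create(
    model="nvidia/nvlm-d-72b",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize this menu for a vegetarian."}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```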
Last Words
While Nvidia’s benchmark numbers are impressive, where the model comes out on top it often does so by small margins; its visual and multimodal abilities are its main strengths.
Because the model is built for scientists and researchers and released openly, it is extremely attractive to the AI community. Note, though, that despite being open-source, Nvidia has restricted its use to research purposes under a non-commercial license; the model may not be modified for resale.
With this new model, Nvidia has made itself a central player in the AI game and is building trust as it ventures further into AI software rather than just hardware, a field the company already dominates worldwide.
Your support means everything!
If you enjoyed my story, I’d love it if you could leave a clap or a comment. It’s how Medium acknowledges our hard work these days, and your feedback would mean the world to me!