Meta introduces V-JEPA 2, an AI world model to power robotics and autonomous systems

Written by Nagendra Tech

The AI community appears to be gearing up for the next frontier in AI: world models. Meta, on Wednesday, June 11, unveiled its new AI model, V-JEPA 2. Dubbed a ‘world model’, V-JEPA 2 is designed to understand the physical world, comprehending the movements of objects, and has the potential to enhance robotics and self-driving cars. 

V-JEPA 2 is an open-source AI model that can understand and predict real-world environments in 3D. It allows AI to build an internal simulation of the real world, essentially helping it reason, plan, and act much like humans do. While a traditional AI model would rely heavily on labelled data, V-JEPA 2 is reportedly trained to identify patterns in unlabelled video clips, using these as its foundation for internal 3D reasoning. 

The world model highlights the tech giant’s increasing focus on more intuitive and intelligent AI systems that can engage with the physical world. Reportedly, this technology could be beneficial in the domains of robotics, augmented reality, and future AI assistants. 


“Today, we’re excited to share V-JEPA 2, the first world model trained on video that enables state-of-the-art understanding and prediction, as well as zero-shot planning and robot control in new environments. As we work toward our goal of achieving advanced machine intelligence (AMI), it will be important that we have AI systems that can learn about the world as humans do, plan how to execute unfamiliar tasks, and efficiently adapt to the ever-changing world around us,” Meta wrote in its official blog.

The latest announcement from Meta comes at a time when the company is facing stiff competition from rivals Google, Microsoft, and OpenAI. According to a recent CNBC report, Meta CEO Mark Zuckerberg has made AI a top priority for the company, which is also planning to invest $14 billion in Scale AI, a company that pioneers data labelling for AI training. 


When it comes to specifications, V-JEPA 2 is a 1.2-billion-parameter model built on Meta’s Joint Embedding Predictive Architecture (JEPA), which was first shared in 2022. V-JEPA, Meta’s first model trained on video, was released in 2024; with the latest V-JEPA 2, the company claims improved action-prediction and world-modelling capabilities that allow robots to interact with unfamiliar objects and environments to accomplish a task.

What are world models?

In simple words, world models are mental simulations that help us predict how the physical world behaves. We humans develop this intuition from a young age: for instance, we know instinctively that a ball thrown in the air will fall back down, and while walking in a crowded space we avoid colliding with others. This inner sense of cause and effect helps us act more effectively in complex situations. 


AI agents need similar capabilities to interact with the real world. According to Meta, to achieve this, world models should be capable of understanding their surroundings and recognising objects, actions, and movements; they should be able to predict how things will change over time, especially in response to actions; and they should plan ahead by simulating possible outcomes and choosing the best course of action. 

To simplify, an AI world model is an internal simulation that helps a machine understand, predict, and plan within a physical environment. Essentially, it helps the AI anticipate how the world will change in response to actions, which could enable more intelligent, goal-driven behaviour. 

What can V-JEPA 2 do?

The V-JEPA 2 model could enhance real-world machines like self-driving cars and robots. For instance, self-driving cars need to understand their surroundings in real time to move about safely. While most AI models depend on massive amounts of labelled data or video footage, V-JEPA 2 reportedly uses a simplified ‘latent’ space to reason about how an object moves or interacts.

According to Meta’s chief AI scientist, Yann LeCun, a world model is an ‘abstract digital twin of reality’ that allows AI to predict what will happen next and plan accordingly — a big leap towards making AI more useful in the physical world. In one of his recent presentations, LeCun stated that helping machines understand the physical world is a different problem from teaching them language. 


World models, a relatively recent development, are gaining attention in the AI research community for opening up dimensions beyond the large language models that power tools like ChatGPT and Google Gemini. 

In September 2024, noted AI researcher Fei-Fei Li raised $230 million for her startup World Labs, which focuses on building large-scale world models. Meanwhile, Google DeepMind is developing its own world model, named Genie, which is capable of simulating 3D environments and games in real time.
