// search
Back to feed

V-JEPA based prediction

I’m pretty enthusiatic about world-models and especially JEPA (and V-JEPA), so I started a small prediction engine, that predicts the next few frames based on the imput stream’s current frame, visualizes the latent space (and its movement) and creates a guess about motion. I’m planning to use this in a car to build up a driving world-model.

JEPA architecture diagram

The idea is that instead of just classifying images, the model should learn the dynamics: how things move, change, and evolve over time. That is what you actually need for robotics, autonomous vehicles, or any system that has to act in the real world. Before doing something, you want to ask: “what will happen if I do X?”.

// comments