Artificial intelligence & robotics

Li Jing

Laying the foundation for AGI to interact with the real world.

Year Honored: 2023

Organization: OpenAI

Region: China

Hails From: China

The advent of large language models has greatly accelerated progress toward artificial general intelligence. The next crucial challenge is developing AI that can reason about the real world. Multimodal technology is expected to bring AI to this level by enabling it to understand and generate natural signals.

At OpenAI, Li’s focus shifted to scalable approaches for building AI models that can understand the real world, particularly generative models and multimodal learning. He is a key contributor to models such as DALLE-3, Sora, and GPT-4o.

DALLE-3 is OpenAI's third-generation text-to-image model, capable of generating accurate and aesthetically pleasing images based on user prompts. It enhances human creativity and is used by millions of people daily. Sora is OpenAI's first text-to-video generation model, capable of producing high-resolution videos up to one minute long. It can generate complex scenes with multiple characters, specific actions, and detailed backgrounds, demonstrating an understanding of the physical world. Using text representations to learn visual world models has proven to be the most scalable and efficient method.

GPT-4o, OpenAI's first natively multimodal large language model, can reason in real time across audio, visual, and text modalities. It changes how people interact with computers, paving the way for general AI to interact with the real world. Li’s contributions to multimodal representation learning are crucial for this transition, enabling image and video generation models to understand text and language models to interpret visual inputs.

MIT Technology Review

Innovators Under 35
