Artificial intelligence & robotics

Li Jing

Laying the foundation for the interaction of AGI in the real world.

Year Honored
2023

Organization
OpenAI

Region
China

Hails From
China
The advent of large language models has greatly accelerated progress toward artificial general intelligence (AGI). The next crucial challenge is developing AI that can reason about the real world. Multimodal technology is expected to elevate AI to this level by enabling it to understand and generate natural signals such as images, video, and audio.

At OpenAI, Li’s focus shifted to scalable approaches for building AI models that can understand the real world, particularly generative models and multimodal learning. He is a key contributor to models such as DALL·E 3, Sora, and GPT-4o.

DALL·E 3 is OpenAI's third-generation text-to-image model, capable of generating accurate and aesthetically pleasing images from user prompts. It enhances human creativity and is used by millions of people daily. Sora is OpenAI's first text-to-video generation model, capable of producing high-resolution videos up to one minute long. It can generate complex scenes with multiple characters, specific actions, and detailed backgrounds, demonstrating an understanding of the physical world. Using text representations to learn visual world models has proven to be the most scalable and efficient method.

GPT-4o, OpenAI's first natively multimodal large language model, can reason in real time across audio, visual, and text modalities. It revolutionizes human-computer interaction, paving the way for general AI to interact with the real world. Li’s contributions to multimodal representation learning are crucial to this transition, enabling image and video generation models to understand text and language models to interpret visual inputs.