Model's outstanding performance provides a nuanced approach to video understanding
SAN FRANCISCO, Feb. 11, 2025 /PRNewswire-PRWeb/ -- TwelveLabs, the video understanding company, today announced the release of its Pegasus 1.2 multimodal foundation model, a significant leap forward in industry-grade video language models. Pegasus 1.2 achieves state-of-the-art performance in long-video understanding, supporting videos up to one hour long with best-in-class accuracy while maintaining low latency and competitive pricing. TwelveLabs' embeddings storage intelligently caches videos, making repeated queries to the same video even faster and cheaper. With its latest advances, Pegasus 1.2 serves as a precision tool that delivers business value through focused, intelligent system design, excelling exactly where production-grade video processing pipelines need it most.
"Video understanding represents one of the most complex challenges in artificial intelligence, requiring sophisticated models that can simultaneously interpret spatial details, temporal dynamics, and contextual nuances," said Aiden Lee CTO of TwelveLabs. "We are thrilled to debut Pegasus-1.2, which is designed to address the fundamental limitations of existing video language models by introducing a novel approach to spatio-temporal comprehension."
TwelveLabs' Pegasus foundation model was built to generate text descriptions about a video, "understanding" its content through analysis of both visual and audio elements. From a prompt, it can produce summaries, highlights, titles, detailed reports, and more, letting users extract meaningful information from video content more efficiently than ever before.
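As a concrete illustration of what that workflow can look like, the minimal sketch below requests a summary for an already-indexed video over the TwelveLabs REST API. The endpoint path, API version, header, parameter names, and response field shown here are assumptions for illustration only; the TwelveLabs API documentation is the authoritative reference.

```python
# Hedged sketch: summarizing an already-indexed video via the
# TwelveLabs API. Endpoint, version, and field names are assumed
# for illustration; consult the official API docs before use.
import os
import requests

API_KEY = os.environ["TWELVE_LABS_API_KEY"]  # assumed env var name
VIDEO_ID = "your-indexed-video-id"           # placeholder ID

resp = requests.post(
    "https://api.twelvelabs.io/v1.3/summarize",  # assumed endpoint/version
    headers={"x-api-key": API_KEY},              # assumed auth header
    json={"video_id": VIDEO_ID, "type": "summary"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json().get("summary"))  # assumed response field
```

The same pattern extends to other prompt-driven outputs (highlights, titles, chapter breakdowns) by changing the request type or prompt.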
Pegasus works in conjunction with TwelveLabs' Marengo model, a state-of-the-art multimodal embedding model, to bring human-like understanding to videos.
Pegasus-1.2 Takes Video Understanding to the Next Level
The core innovation of the new Pegasus-1.2 lies in its ability to dynamically balance computational efficiency with comprehensive understanding of videos across varying lengths and complexities. By implementing an advanced vision-encoding strategy and a sophisticated token reduction method, Pegasus-1.2 can capture fine-grained spatial and temporal features while maintaining computational efficiency. This approach enables Pegasus-1.2 to seamlessly transition between understanding short video clips and analyzing extended sequences up to one hour in length, a capability that significantly expands the practical applications of video AI.
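The release does not detail the token reduction mechanism itself. As a rough, hypothetical sketch of the general idea (not TwelveLabs' actual method), the snippet below merges adjacent near-duplicate frame embeddings, so a long, mostly static shot contributes far fewer tokens than its raw frame count while visually distinct moments are preserved.

```python
# Toy illustration of token reduction for video (NOT TwelveLabs'
# actual method): adjacent frame embeddings whose cosine similarity
# exceeds a threshold are averaged into one token, shrinking the
# sequence length while keeping genuinely new content.
import numpy as np

def reduce_tokens(frame_tokens: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """frame_tokens: (num_frames, dim) array of per-frame embeddings."""
    kept = [frame_tokens[0]]
    for tok in frame_tokens[1:]:
        prev = kept[-1]
        cos = tok @ prev / (np.linalg.norm(tok) * np.linalg.norm(prev) + 1e-8)
        if cos > threshold:
            kept[-1] = (prev + tok) / 2.0  # merge near-duplicate frames
        else:
            kept.append(tok)               # keep a visually distinct frame
    return np.stack(kept)

# A static shot yields many similar frames; most collapse into one token.
tokens = np.random.randn(8, 4)
tokens[1:4] = tokens[0] + 0.01 * np.random.randn(3, 4)  # near-duplicates
print(reduce_tokens(tokens).shape)  # fewer than 8 tokens remain
```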
Through rigorous testing, the model not only excels in low-level perceptual tasks but also demonstrates advanced reasoning skills across different video understanding domains. Importantly, Pegasus-1.2 achieves these capabilities with a compact architecture, challenging the prevailing assumption that superior performance necessitates exponentially larger model sizes. This positions Pegasus-1.2 as a significant advancement in the field of multimodal AI, offering a more efficient and nuanced approach to video language understanding.
To learn more about Pegasus-1.2, its architecture, training, and performance benchmarks, please see the technical blog. For more information on what makes TwelveLabs' technology so unique and advanced, please visit www.twelvelabs.io.
About TwelveLabs
TwelveLabs makes video instantly, intelligently searchable and understandable. TwelveLabs' state-of-the-art video understanding technology enables the accurate and timely discovery of valuable moments within an organization's vast sea of videos so that users can do and learn more. The company is backed by leading venture capitalists, technology companies, AI luminaries, and successful founders. It is headquartered in San Francisco, with an APAC office in Seoul. Learn more at twelvelabs.io.
Media Contact
Amber Moore, Moore Communications, 1-503-943-9381, [email protected]
SOURCE Twelve Labs