# Multimodal AI
Latest news and articles about multimodal AI
Total: 6 articles found

## ByteDance’s Seedream 5.0 Lite Adds Live Web Retrieval to Image Generation — A Step Toward More Up‑to‑Date, Reasoning‑Capable Multimodal AI
ByteDance’s Volcano Engine has launched Seedream 5.0 Lite, a lightweight image‑generation model that, for the first time, supports real‑time web retrieval and chain‑of‑thought reasoning. Available now on the JiMeng creative platform and due for API rollout later in February, the release narrows the gap between static generative systems and live, context‑aware content creation while raising new questions about provenance, copyright and content safety.
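To make the retrieval step concrete, here is a minimal sketch of the retrieve‑then‑generate pattern the summary describes. Everything in it is an assumption for illustration: ByteDance has not yet published the Seedream API, so the function names and the three‑stage split below are stand‑ins for the real retrieval, reasoning and generation backends.

```python
# Hypothetical sketch of a retrieve-then-generate image pipeline.
# None of these names come from ByteDance's (unreleased) API.
from dataclasses import dataclass


@dataclass
class WebSnippet:
    url: str
    text: str


def retrieve_web_context(prompt: str) -> list[WebSnippet]:
    """Stand-in for live web retrieval: fetch snippets relevant to the prompt."""
    # A real system would query a search index here; we return dummy data.
    return [WebSnippet(url="https://example.com", text="this year's trend is ...")]


def plan_image(prompt: str, context: list[WebSnippet]) -> str:
    """Stand-in for chain-of-thought planning: fold retrieved facts into the prompt."""
    facts = " ".join(snippet.text for snippet in context)
    return f"{prompt}. Grounding details: {facts}"


def generate_image(expanded_prompt: str) -> bytes:
    """Stand-in for the image-generation backend."""
    return b"<image bytes>"


if __name__ == "__main__":
    user_prompt = "Poster of this year's trending street fashion"
    snippets = retrieve_web_context(user_prompt)  # live retrieval step
    plan = plan_image(user_prompt, snippets)      # reasoning step
    image = generate_image(plan)                  # generation step
    print(plan)
```

The point of the pattern is that retrieval output is folded into the prompt before generation, which is what would let an image reflect events newer than the model’s training data.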

## ByteDance Rolls Out Seedance 2.0 to Doubao: Short-Form AI Video Goes Live in Limited Test
ByteDance has begun a limited grey test of Seedance 2.0 in its Doubao app, allowing select users to generate short (4–15 s) multimodal videos that take images, audio and text as references. The staged rollout, short-duration caps and quota system signal a cautious path to embedding advanced generative video tools into ByteDance’s creator ecosystem while managing technical and policy risks.

## ByteDance’s Seedance 2.0 Redraws the Map for AI Video — and Puts Platforms in the Driver’s Seat
ByteDance’s Seedance 2.0 marks a step change in AI‑generated video, producing cinema‑grade short films from simple prompts and impressing senior industry figures. The model both democratizes content creation by lowering technical barriers and raises clear risks around deepfakes, prompting ByteDance to impose early safeguards.

## Keling AI’s 3.0 Push: A Chinese Model Suite Aiming to Automate End‑to‑End Video Production
Keling AI has launched a 3.0 series of multimodal models—Video 3.0, Video 3.0 Omni and Image 3.0—positioned as an end‑to‑end solution for image and video generation, editing and post‑production. The suite emphasizes native multimodal I/O and subject consistency, offering speed and integration for creators while raising questions about compute demands, governance and misuse risks.

## SenseTime Open-Sources ‘Sense Nova‑MARS,’ Betting on Agentic Multimodal AI to Drive Execution‑Capable Applications
SenseTime has open‑sourced Sense Nova‑MARS, an agentic multimodal vision‑language model (VLM) available in 8B and 32B parameter sizes that the company says can plan actions, call tools and deeply fuse dynamic visual reasoning with image‑text search. The move democratizes access to execution‑oriented multimodal models, accelerating research and product integration while raising safety and governance questions about agentic AI.
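As an illustration of what “agentic” means operationally, below is a minimal sketch of a plan–call–observe loop. The tool names, the text‑based action format and the hard‑coded model stub are all hypothetical; this is not SenseTime’s published interface, only the generic shape that execution‑capable models tend to follow.

```python
# Minimal sketch of an agentic tool-calling loop (illustrative, not MARS's API).
from typing import Callable

# Registry of callable tools the "agent" may invoke (hypothetical names).
TOOLS: dict[str, Callable[[str], str]] = {
    "image_text_search": lambda q: f"search results for {q!r}",
    "crop_and_zoom": lambda region: f"zoomed view of {region}",
}


def model_step(history: list[str]) -> str:
    """Stand-in for the VLM: decide the next action from the dialogue so far."""
    # A real agentic VLM would emit this decision itself; we hard-code one
    # tool call followed by a final answer purely for illustration.
    if not any(line.startswith("TOOL_RESULT") for line in history):
        return "CALL image_text_search landmark in the photo"
    return "ANSWER The landmark appears to be the Oriental Pearl Tower."


def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"TASK {task}"]
    for _ in range(max_steps):
        action = model_step(history)
        if action.startswith("ANSWER"):
            return action.removeprefix("ANSWER ").strip()
        _, tool_name, arg = action.split(" ", 2)  # parse "CALL <tool> <arg>"
        result = TOOLS[tool_name](arg)            # execute the chosen tool
        history.append(f"TOOL_RESULT {result}")   # fold result back into context
    return "no answer within step budget"


print(run_agent("Identify the landmark in the photo"))
```

In a real agentic VLM the decision in `model_step` is produced by the model itself, often as structured output, and the registry would hold genuine vision and search operations; the loop structure, though, is the core of execution‑capable behaviour.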

## Small, Open and Multimodal: Chinese Startup Releases 10‑Billion‑Parameter Vision‑Language Model Claiming SOTA Performance
Chinese startup Jieyue Xingchen (StepFun) has open‑sourced Step3‑VL‑10B, a 10‑billion‑parameter multimodal model that the team says matches state‑of‑the‑art performance among models of similar scale on vision, reasoning, math and dialogue benchmarks. The release underscores the push for efficient, deployable multimodal models and is likely to invite independent verification and community adoption.