# Multimodal AI
Latest news and articles about multimodal AI
Total: 6 articles found

## ByteDance’s Seedream 5.0 Lite Adds Live Web Retrieval to Image Generation — A Step Toward More Up‑to‑Date, Reasoning‑Capable Multimodal AI
ByteDance’s Volcano Engine has launched Seedream 5.0 Lite, a lightweight image‑generation model that, for the first time, supports real‑time web retrieval and chain‑of‑thought reasoning. Available now on the JiMeng creative platform and due for API rollout later in February, the release narrows the gap between static generative systems and live, context‑aware content creation while raising new questions about provenance, copyright and content safety.
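To make the retrieval step concrete, here is a minimal sketch of the retrieve‑then‑generate pattern the summary describes. Everything in it is an assumption for illustration: ByteDance has not yet published the Seedream API, so the function names and the three‑stage split below are stand‑ins for the real retrieval, reasoning and generation backends.

```python
# Hypothetical sketch of a retrieve-then-generate image pipeline.
# None of these names come from ByteDance's (unreleased) API.
from dataclasses import dataclass


@dataclass
class WebSnippet:
    url: str
    text: str


def retrieve_web_context(prompt: str) -> list[WebSnippet]:
    """Stand-in for live web retrieval: fetch snippets relevant to the prompt."""
    # A real system would query a search index here; we return dummy data.
    return [WebSnippet(url="https://example.com", text="this year's trend is ...")]


def plan_image(prompt: str, context: list[WebSnippet]) -> str:
    """Stand-in for chain-of-thought planning: fold retrieved facts into the prompt."""
    facts = " ".join(snippet.text for snippet in context)
    return f"{prompt}. Grounding details: {facts}"


def generate_image(expanded_prompt: str) -> bytes:
    """Stand-in for the image-generation backend."""
    return b"<image bytes>"


if __name__ == "__main__":
    user_prompt = "Poster of this year's trending street fashion"
    snippets = retrieve_web_context(user_prompt)  # live retrieval step
    plan = plan_image(user_prompt, snippets)      # reasoning step
    image = generate_image(plan)                  # generation step
    print(plan)
```

The point of the pattern is that retrieval output is folded into the prompt before generation, which is what would let an image reflect events newer than the model’s training data.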

## ByteDance Rolls Out Seedance 2.0 to Doubao: Short-Form AI Video Goes Live in Limited Test
ByteDance has begun a limited grey test of Seedance 2.0 in its Doubao app, allowing select users to generate short (4–15 s) multimodal videos that take images, audio and text as references. The staged rollout, short-duration caps and quota system signal a cautious path to embedding advanced generative video tools into ByteDance’s creator ecosystem while managing technical and policy risks.

## ByteDance’s Seedance 2.0 Redraws the Map for AI Video — and Puts Platforms in the Driver’s Seat
ByteDance’s Seedance 2.0 marks a step change in AI‑generated video, producing cinema‑grade short films from simple prompts and impressing senior industry figures. The model both democratizes content creation by lowering technical barriers and raises clear risks around deepfakes, prompting ByteDance to impose early safeguards.

## Keling AI’s 3.0 Push: A Chinese Model Suite Aiming to Automate End‑to‑End Video Production
Keling AI has launched a 3.0 series of multimodal models—Video 3.0, Video 3.0 Omni and Image 3.0—positioned as an end‑to‑end solution for image and video generation, editing and post‑production. The suite emphasizes native multimodal I/O and subject consistency, offering speed and integration for creators while raising questions about compute demands, governance and misuse risks.

## SenseTime Open-Sources ‘Sense Nova‑MARS,’ Betting on Agentic Multimodal AI to Drive Execution‑Capable Applications
SenseTime has open‑sourced Sense Nova‑MARS, an agentic multimodal vision‑language model (VLM) available in 8B and 32B parameter sizes that the company says can plan actions, call tools and deeply fuse dynamic visual reasoning with image‑text search. The move democratizes access to execution‑oriented multimodal models, accelerating research and product integration while raising safety and governance questions about agentic AI.
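As an illustration of what “agentic” means operationally, below is a minimal sketch of a plan–call–observe loop. The tool names, the text‑based action format and the hard‑coded model stub are all hypothetical; this is not SenseTime’s published interface, only the generic shape that execution‑capable models tend to follow.

```python
# Minimal sketch of an agentic tool-calling loop (illustrative, not MARS's API).
from typing import Callable

# Registry of callable tools the "agent" may invoke (hypothetical names).
TOOLS: dict[str, Callable[[str], str]] = {
    "image_text_search": lambda q: f"search results for {q!r}",
    "crop_and_zoom": lambda region: f"zoomed view of {region}",
}


def model_step(history: list[str]) -> str:
    """Stand-in for the VLM: decide the next action from the dialogue so far."""
    # A real agentic VLM would emit this decision itself; we hard-code one
    # tool call followed by a final answer purely for illustration.
    if not any(line.startswith("TOOL_RESULT") for line in history):
        return "CALL image_text_search landmark in the photo"
    return "ANSWER The landmark appears to be the Oriental Pearl Tower."


def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"TASK {task}"]
    for _ in range(max_steps):
        action = model_step(history)
        if action.startswith("ANSWER"):
            return action.removeprefix("ANSWER ").strip()
        _, tool_name, arg = action.split(" ", 2)  # parse "CALL <tool> <arg>"
        result = TOOLS[tool_name](arg)            # execute the chosen tool
        history.append(f"TOOL_RESULT {result}")   # fold result back into context
    return "no answer within step budget"


print(run_agent("Identify the landmark in the photo"))
```

In a real agentic VLM the decision in `model_step` is produced by the model itself, often as structured output, and the registry would hold genuine vision and search operations; the loop structure, though, is the core of execution‑capable behaviour.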

## Small, Open and Multimodal: Chinese Startup Releases 10‑Billion‑Parameter Vision‑Language Model Claiming SOTA Performance
Chinese startup Jieyue Xingchen (StepFun) has open‑sourced Step3‑VL‑10B, a 10‑billion‑parameter multimodal model that the team says matches state‑of‑the‑art performance among models of similar scale on vision, reasoning, math and dialogue benchmarks. The release underscores the push for efficient, deployable multimodal models and is likely to invite independent verification and community adoption.