"UniVideo": difference between revisions

Latest revision as of 20:19, 30 April 2026

Definition

Proper name of a tool that takes a text prompt, interpreted by a large language model (LLM), together with source images, and generates a video that combines those images according to the prompt.

Additional information

Video generation relies on a dual-stream architecture that pairs a multimodal large language model, which interprets the instructions, with a multimodal DiT (MMDiT) model, which generates the video.

French

UniVideo

English

UniVideo

Sources

Source: arXiv

Source: Hugging Face

Source: UniVideo, GitHub.io

@@ Line 1: / Line 1: @@
-== UNDER CONSTRUCTION ==
+== Definition ==
+Proper name of a tool that takes a text prompt, interpreted by a '''[[grand modèle de langues (GML)|large language model (LLM)]]''', together with source images, and '''[[génération automatique d'image|generates a video]]''' that combines those images according to the prompt.
-== Definition ==
+== Additional information ==
-xxxxx
+Video generation relies on a dual-stream architecture that pairs a multimodal large language model, which interprets the instructions, with a '''multimodal DiT (MMDiT)''' model, which generates the video.
 == French ==
@@ Line 8: / Line 9: @@
 == English ==
-'''xxxUniVideoxx '''
+'''UniVideo '''
+<!--Framework for understanding, generation, and editing in the video domain with a dual-stream design, combining a Multimodal Large Language Model (MLLM) for instruction understanding with a Multimodal DiT (MMDiT) for video generation.
- A unified framework that combines video understanding, generation, and editing capabilities within a single model. Unlike existing approaches that handle these tasks separately, UniVideo can interpret complex multimodal instructions and perform diverse video operations through a dual-stream architecture. The system demonstrates strong performance across multiple video tasks while enabling novel capabilities like visual prompt understanding and task composition.
+Multimodal DiT?-->
-UniVideo, a dual-stream framework combining a Multimodal Large Language Model and a Multimodal DiT, extends unified modeling to video generation and editing, achieving state-of-the-art performance and supporting task composition and generalization.
 ==Sources==
-[https://huggingface.co/papers/2510.08377 Sources: Hugging Face]
+[https://arxiv.org/abs/2510.08377 Source: arXiv]
+[https://huggingface.co/papers/2510.08377 Source: Hugging Face]
+[https://congwei1230.github.io/UniVideo/ Source: UniVideo, GitHub.io]
-[[Catégorie:vocabulary]]
+[[Catégorie:GRAND LEXIQUE FRANÇAIS]]
