"OmniVideoBench": difference between revisions

Revision as of 23 February 2026, 15:07

xxxxx

OmniVideoBench

@@ Line 10: / Line 10: @@
 '''OmniVideoBench'''
- A comprehensive benchmark designed to evaluate how well multimodal large language models (MLLMs) can understand and reason across both audio and visual information in videos. The benchmark addresses a critical gap in current evaluation methods, which often focus on single modalities or fail to properly integrate audio-visual reasoning in a logically consistent manner.
+<!--Comprehensive benchmark for evaluating deep audio-visual reasoning across a wide variety of tasks and modalities in multimodal large language models.-->
-OmniVideoBench is a comprehensive benchmark for evaluating audio-visual reasoning in multimodal large language models, addressing modality complementarity and logical consistency.
+==Sources==
+[https://github.com/NJU-LINK/OmniVideoBench Source: GitHub]
-==Sources==
+[https://huggingface.co/papers/2510.10689 Source: huggingface]
-[https://huggingface.co/papers/2510.10689 Sources: huggingface]
+[https://omnivideobench.github.io/omnivideobench_home/ Source: OmniVideoBench]
 [[Catégorie:vocabulary]]