Cross-modal retrieval (French: récupération multimodale)


Revision as of 11:08, 6 November 2024, by Arianne (page created)

Under construction

Definition

XXXXX

See also: text-to-image generation

French

XXXXXX

English

Cross-Modal Retrieval

CMR

Cross-Modal Retrieval (CMR) is the task of retrieving items across different modalities, such as image, text, video, and audio. The core challenge of CMR is the heterogeneity gap: data from different modalities have distinct representations, which makes direct comparison difficult. To address this, most CMR methods learn a shared latent embedding space. Items from each modality are projected into this space, where their similarity can be measured with a distance metric.
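
The shared-space idea described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the projection matrices TEXT_TO_SHARED and IMAGE_TO_SHARED are hand-written toy values (in a real system they would be learned, typically by neural encoders), the feature vectors are invented, and cosine similarity stands in for whichever metric a given method uses.

```python
import math

def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical projections into a 2-dimensional shared space.
# Text features are 3-dimensional and image features 4-dimensional,
# so the raw vectors cannot be compared directly (the heterogeneity gap).
TEXT_TO_SHARED = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
IMAGE_TO_SHARED = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

def retrieve(text_features, image_gallery):
    """Rank gallery images by similarity to a text query in the shared space."""
    query = project(TEXT_TO_SHARED, text_features)
    scored = [
        (cosine_similarity(query, project(IMAGE_TO_SHARED, feats)), name)
        for name, feats in image_gallery.items()
    ]
    # Highest similarity first.
    return [name for _, name in sorted(scored, reverse=True)]

# Toy gallery of image feature vectors (invented values).
gallery = {
    "image_a": [2.0, 2.0, 0.0, 0.0],
    "image_b": [0.0, 0.0, 2.0, 2.0],
    "image_c": [1.0, 1.0, 1.0, 1.0],
}

ranking = retrieve([1.0, 0.0, 0.0], gallery)
print(ranking)
```

With these toy values the text query projects onto the same direction as image_a, so image_a ranks first, followed by image_c and then image_b. The point of the sketch is only that the comparison happens after projection, never on the raw modality-specific features.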

Source

Source: Papers with Code

Source: Medium

Source: arXiv

Contributors: Arianne, wiki