The rise of loosely structured data available as text, images, and other modalities has called for new ways of querying it.
Multimedia Information Retrieval has filled this gap and has witnessed exciting progress in recent years. Tasks such as search and
retrieval of extensive multimedia archives have undergone massive
performance improvements, driven to a large extent by recent developments in multimodal deep learning. However, methods in this field remain limited in the kinds of queries they support and, in particular, are unable to answer database-like queries. For this reason, inspired by recent work on neural databases, we propose
a new framework, which we name Multimodal Neural Databases
(MMNDBs). MMNDBs can answer complex database-like queries
that involve reasoning over different input modalities, such as text
and images, at scale. In this paper, we present the first architecture able to fulfill this set of requirements and evaluate it against several baselines, showing the limitations of currently available models.
The results show the potential of these new techniques to process
unstructured data coming from different modalities, paving the way
for future research in the area.
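As a rough illustration of what a database-like query over multimodal data means here, the minimal Python sketch below pairs a per-item relevance judgment with a classical COUNT aggregation. Everything in it (ImageRecord, toy_relevance, count_query, the 0.5 threshold) is a hypothetical stand-in: a real MMNDB would replace the stub scorer with a learned multimodal model over raw images, and this sketch is not the architecture proposed in the paper.

    # Toy "multimodal database" answering a COUNT-style query over an image
    # collection. The scorer is a stub standing in for a learned multimodal
    # model; names and threshold are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class ImageRecord:
        image_id: str
        tags: List[str]  # stand-in for raw pixel content

    def toy_relevance(record: ImageRecord, predicate: str) -> float:
        """Stub scorer: 1.0 if the predicate word appears in the tags.
        A real system would compute this with a neural model on the image."""
        return 1.0 if predicate in record.tags else 0.0

    def count_query(archive: List[ImageRecord], predicate: str,
                    scorer: Callable[[ImageRecord, str], float],
                    threshold: float = 0.5) -> int:
        """SELECT COUNT(*) FROM archive WHERE scorer(image, predicate) > threshold."""
        return sum(1 for rec in archive if scorer(rec, predicate) > threshold)

    archive = [
        ImageRecord("img_001", ["dog", "park"]),
        ImageRecord("img_002", ["cat", "sofa"]),
        ImageRecord("img_003", ["dog", "beach"]),
    ]
    print(count_query(archive, "dog", toy_relevance))  # -> 2

The shape of the computation is the point: a neural judgment made per item feeds a relational operator such as COUNT, which is what separates these queries from standard top-k retrieval.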
Publication details
2023, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 2619-2628
Multimodal Neural Databases (04b Conference proceedings paper)
Trappolini G., Santilli A., Rodolà E., Halevy A., Silvestri F.
ISBN: 9781450394086
Research group: Theory of Deep Learning