Multimodal representation learning for tourism recommendation with two-tower architecture

Resource type
Journal Article
Authors/contributors
Cui, Y.; Liang, S.; Zhang, Y.
Title
Multimodal representation learning for tourism recommendation with two-tower architecture
Abstract
Personalized recommendation plays an important role in many online services. In tourism recommendation, tourist attractions carry rich context and content information; these implicit features include not only text but also images and videos. To make better use of these features, researchers typically introduce richer feature information or more efficient feature representation methods, but introducing large amounts of feature information without restriction inevitably degrades the performance of the recommendation system. We propose a novel heterogeneous multimodal representation learning method for tourism recommendation. The proposed model is based on a two-tower architecture in which the item tower handles multimodal latent features: a Bidirectional Long Short-Term Memory (Bi-LSTM) network extracts the text features of items, an External Attention Transformer (EANet) extracts their image features, and these feature vectors are concatenated with item IDs to enrich the item representation. To increase the expressiveness of the model, we introduce a deep fully connected stack layer that fuses the multimodal feature vectors and captures the hidden relationships among them. Tested on three different datasets, our model outperforms the baseline models in NDCG and precision.
Publication
PLOS ONE
Volume
19
Issue
2
Date
2024-02-23
Language
en
ISSN
1932-6203
Accessed
11/11/25, 9:02 AM
Library Catalog
dspace.usj.edu.mo
Extra
Publisher: Public Library of Science (PLoS)
Citation
Cui, Y., Liang, S., & Zhang, Y. (2024). Multimodal representation learning for tourism recommendation with two-tower architecture. PLOS ONE, 19(2), e0299370. https://doi.org/10.1371/journal.pone.0299370
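
Note: The abstract describes the item tower's design (Bi-LSTM text features, EANet image features, concatenation with item IDs, and a deep fully connected fusion stack). Below is a minimal PyTorch sketch of that two-tower layout. All layer sizes, the simplified external-attention block, the mean-pooling choices, and the dot-product scoring are illustrative assumptions, not the paper's exact implementation.

# Minimal sketch of the two-tower layout described in the abstract.
# Dimensions, pooling, and scoring are assumptions for illustration.
import torch
import torch.nn as nn


class ExternalAttention(nn.Module):
    # Simplified external attention: two small learnable memories shared
    # across all samples (softmax over patches, then L1 norm over memory).
    def __init__(self, d_model, mem_size=64):
        super().__init__()
        self.mk = nn.Linear(d_model, mem_size, bias=False)
        self.mv = nn.Linear(mem_size, d_model, bias=False)

    def forward(self, x):                              # x: (B, patches, d_model)
        attn = torch.softmax(self.mk(x), dim=1)        # normalize over patches
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)
        return self.mv(attn)                           # (B, patches, d_model)


class ItemTower(nn.Module):
    def __init__(self, n_items, vocab_size, txt_dim=128, img_dim=128,
                 id_dim=64, out_dim=64):
        super().__init__()
        self.id_emb = nn.Embedding(n_items, id_dim)
        self.word_emb = nn.Embedding(vocab_size, txt_dim, padding_idx=0)
        # Bi-LSTM over the item's text description (output dim = txt_dim).
        self.bilstm = nn.LSTM(txt_dim, txt_dim // 2,
                              batch_first=True, bidirectional=True)
        # External-attention block over pre-extracted image patch features.
        self.ea = ExternalAttention(img_dim)
        # Deep fully connected stack fusing ID + text + image vectors.
        self.fuse = nn.Sequential(
            nn.Linear(id_dim + txt_dim + img_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, item_ids, text_tokens, img_patches):
        txt, _ = self.bilstm(self.word_emb(text_tokens))  # (B, T, txt_dim)
        txt = txt.mean(dim=1)                             # pool over tokens
        img = self.ea(img_patches).mean(dim=1)            # pool over patches
        return self.fuse(
            torch.cat([self.id_emb(item_ids), txt, img], dim=-1))


class TwoTowerRec(nn.Module):
    def __init__(self, n_users, n_items, vocab_size, out_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, 128)
        self.user_mlp = nn.Sequential(
            nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, out_dim))
        self.item_tower = ItemTower(n_items, vocab_size, out_dim=out_dim)

    def forward(self, user_ids, item_ids, text_tokens, img_patches):
        u = self.user_mlp(self.user_emb(user_ids))
        v = self.item_tower(item_ids, text_tokens, img_patches)
        return (u * v).sum(dim=-1)  # dot-product relevance score


# Usage with hypothetical sizes: 8 users scored against 8 items, each item
# carrying 20 text tokens and 49 (e.g. 7x7) image patch feature vectors.
model = TwoTowerRec(n_users=1000, n_items=500, vocab_size=5000)
scores = model(torch.randint(0, 1000, (8,)),
               torch.randint(0, 500, (8,)),
               torch.randint(0, 5000, (8, 20)),
               torch.randn(8, 49, 128))

A practical property of this two-tower shape is that the user and item towers are computed independently, so item vectors can be precomputed and indexed for fast retrieval at serving time, with scores over sampled negatives feeding a softmax or pairwise ranking loss during training.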