Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers
Published in Studies in Health Technology and Informatics - Medinfo 2021, 2022
Recommended citation: Buonocore et al. (2022). "Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers" Studies in Health Technology and Informatics - Medinfo 2021. https://pubmed.ncbi.nlm.nih.gov/35673086/
Online forums play an important role in connecting people who have crossed paths with cancer. These communities create networks of mutual support that cover different cancer-related topics and contain an extensive amount of heterogeneous information that can be mined for useful insights. This work presents a case study in which users’ posts from an Italian cancer patient community were classified by combining count-based and prediction-based representations to identify discussion topics, with the aim of improving message reviewing and filtering. We demonstrate that pairing simple bag-of-words representations based on keyword matching with pre-trained contextual embeddings significantly improves the overall quality of the predictions and allows the model to handle ambiguities and misspellings. By using non-English real-world data, we also investigate the reusability of pre-trained multilingual models like BERT in low-data regimes such as those of many local medical institutions.

[Download paper here](https://doi.org/10.3233/SHTI220147)
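The pairing of count-based and prediction-based representations described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword lists, the function names, and in particular `toy_embedding` are hypothetical. `toy_embedding` is a deterministic stand-in for a real contextual encoder such as multilingual BERT, used here only so the example runs without downloading a model.

```python
import math
import re
from typing import Callable, Dict, List

# Hypothetical topic keywords (illustrative only, not taken from the paper).
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "therapy": ["chemioterapia", "radioterapia", "terapia"],
    "nutrition": ["dieta", "alimentazione"],
}


def keyword_counts(text: str, topics: Dict[str, List[str]]) -> List[float]:
    """Count-based part: one feature per topic, counting exact keyword matches."""
    tokens = re.findall(r"\w+", text.lower())
    return [float(sum(tokens.count(kw) for kw in kws)) for kws in topics.values()]


def toy_embedding(text: str, dim: int = 8) -> List[float]:
    """Assumed stand-in for a contextual embedding: hashed character trigrams.

    A real pipeline would call a pre-trained multilingual encoder here.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Deterministic bucket index (the built-in hash() is salted per process).
        vec[sum(ord(c) for c in trigram) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def combined_features(
    text: str,
    topics: Dict[str, List[str]],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Concatenate keyword counts with the dense embedding into one vector."""
    return keyword_counts(text, topics) + embed(text)


post = "Ho iniziato la chemioterapia e una nuova dieta"
features = combined_features(post, TOPIC_KEYWORDS, toy_embedding)
print(len(features), features[:2])
```

A misspelled post such as "chemoterapia" yields all-zero keyword counts, but the dense part of the vector is still populated, which is the intuition behind letting a downstream classifier see both representations.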