Posts by Collection

portfolio

publications

Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers

Published in Studies in Health Technology and Informatics - Medinfo 2021, 2022

Online forums play an important role in connecting people who have crossed paths with cancer. These communities create networks of mutual support that cover different cancer-related topics, containing an extensive amount of heterogeneous information that can be mined for useful insights. This work presents a case study in which users' posts from an Italian cancer patient community are classified by combining count-based and prediction-based representations to identify discussion topics, with the aim of improving message reviewing and filtering. We demonstrate that pairing simple bag-of-words representations based on keyword matching with pre-trained contextual embeddings significantly improves the overall quality of the predictions and allows the model to handle ambiguities and misspellings. By using non-English real-world data, we also investigate the reusability of pre-trained multilingual models like BERT in lower-data regimes, such as those of many local medical institutions.

Recommended citation: Buonocore et al. (2022). "Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers" Studies in Health Technology and Informatics - Medinfo 2021. https://pubmed.ncbi.nlm.nih.gov/35673086/
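A minimal sketch of the core idea, pairing count-based keyword features with dense text embeddings. The lexicon, topic names, and hash-based "embedding" below are illustrative stand-ins for the paper's curated keyword lists and multilingual BERT representations:

```python
# Toy combination of bag-of-words keyword matching with a dense embedding.
# KEYWORD_LEXICON and the MD5-based embed() are illustrative placeholders,
# not the actual resources or encoder used in the paper.
import hashlib

KEYWORD_LEXICON = {
    "treatment": {"chemotherapy", "chemioterapia", "radioterapia"},
    "support": {"together", "insieme", "coraggio"},
}

def keyword_features(tokens):
    """Binary bag-of-words features: 1 if any keyword of a topic matches."""
    return [int(any(t in kws for t in tokens))
            for kws in KEYWORD_LEXICON.values()]

def embed(tokens, dim=8):
    """Toy deterministic embedding: mean of per-token MD5-derived vectors.
    A real pipeline would use a pre-trained contextual encoder instead."""
    vec = [0.0] * dim
    for t in tokens:
        digest = hashlib.md5(t.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]

def featurize(text):
    """Concatenate keyword-match features with the dense embedding."""
    tokens = text.lower().split()
    return keyword_features(tokens) + embed(tokens)
```

The concatenated vector lets a downstream classifier fall back on the dense representation when no keyword matches, which is how the paired approach handles misspellings and ambiguous posts.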

Evaluation of XAI on ALS 6-months mortality prediction

Published in Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, 2022

In this article we present a comparative evaluation of three model-agnostic XAI methods, namely SHAP, LIME, and AraucanaXAI. The prediction task consists of predicting mortality for ALS patients based on observations collected over a 6-month period. The XAI approaches are compared according to four quantitative evaluation metrics: identity, fidelity, separability, and time to compute an explanation. Furthermore, a qualitative comparison of post-hoc generated explanations is carried out on specific scenarios where the ML model correctly predicted the outcome versus scenarios where it did not. Together, the results of the qualitative and quantitative evaluations form the basis for a critical discussion of the properties and desiderata of XAI methods for healthcare applications, advocating for more inclusive and extensive XAI evaluation studies involving human experts.

Recommended citation: Buonocore et al. (2022). "Evaluation of XAI on ALS 6-months mortality prediction" Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum. https://www.researchgate.net/profile/Tommaso-Buonocore/publication/362761796_Evaluation_of_XAI_on_ALS_6-months_mortality_prediction/links/62fe1002eb7b135a0e422dfd/Evaluation-of-XAI-on-ALS-6-months-mortality-prediction.pdf
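To illustrate two of the quantitative metrics mentioned above, here is a toy sketch of identity (identical instances should receive identical explanations) and fidelity (the explanation's surrogate should agree with the black-box model). The model, explainer, surrogate, and data are invented stand-ins, not the actual ALS pipeline:

```python
# Toy versions of the identity and fidelity metrics; all models and data
# below are illustrative placeholders.

def identity(explain, instances):
    """Fraction of instances whose repeated explanations are identical."""
    return sum(explain(x) == explain(x) for x in instances) / len(instances)

def fidelity(model, surrogate, instances):
    """Agreement between the black-box model and the surrogate used to
    generate explanations, measured on the same instances."""
    return sum(model(x) == surrogate(x) for x in instances) / len(instances)

# Toy black-box model: predicts 1 when the feature sum is positive.
model = lambda x: int(sum(x) > 0)
# Toy deterministic explainer: "contributions" are the features themselves.
explain = lambda x: tuple(x)
# Toy surrogate: agrees with the model except on the decision boundary.
surrogate = lambda x: int(sum(x) >= 0)

data = [(1, 2), (-1, -2), (0, 0), (3, -1)]
print(identity(explain, data))        # deterministic explainer -> 1.0
print(fidelity(model, surrogate, data))  # disagree only on (0, 0) -> 0.75
```

Separability and time-to-explanation follow the same pattern: scalar scores computed over a shared set of instances, so different explainers can be ranked on equal footing.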

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Published in ArXiv, 2022

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuned models stemming from broad-coverage checkpoints can largely benefit from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards building biomedical language models that generalize to other less-resourced languages and different domain settings.

Recommended citation: Buonocore et al. (2022). "Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models" ArXiv. https://arxiv.org/abs/2212.10422

talks

AIME Doctoral Consortium talk

Published:

Presentation of the Ph.D. thesis project and discussion with the Academic Panel, which includes prominent academic researchers with substantial experience in the field of Artificial Intelligence in Medicine. The talk focused on how to employ Transformer-based models like BERT in medical information retrieval systems.

Conference Proceeding talk on Local Explainability and Araucana XAI

Published:

Conference talk within the iDPP (Intelligent Disease Progression Prediction) challenge. The goal of iDPP is to design and develop an evaluation infrastructure for AI algorithms, which we assessed from an XAI perspective by comparing LIME, SHAP, and our proposed method AraucanaXAI.

teaching

Advanced Biomedical Machine Learning, A.Y. 2022/23

Master's Degree Course, University of Pavia, Department of Electric, Computer and Biomedical Engineering, 2023

This course explores a variety of advanced machine learning methods for mining clinical data. Much of this exploration is hands-on, with class time devoted to exposing the principles of each method. In this course I give the lectures of the Natural Language Processing module, prepare and co-supervise lab work, and review assignments. Details about the module:

  • Lectures: NLP fundamentals, from frequency counts to word embeddings (6 hrs).
  • NLP lab: hands-on lab where students build a simple embedding-based NLP pipeline for binary classification (4 hrs).
  • Journal club: students read, review, and present an NLP-related paper (2 hrs).
  • Project review: evaluation of students' projects at the end of the course (2 hrs).
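The lab pipeline can be sketched in a few lines: embed each document as the mean of its word vectors, then classify with a nearest-centroid rule. The hash-based toy vectors and the example texts below are illustrative placeholders for the pre-trained embeddings and datasets used in the actual lab:

```python
# Minimal embedding-based binary classification pipeline (toy version).
# word_vec() is a deterministic hash-based stand-in for pre-trained
# word embeddings; all texts and labels are illustrative.
import hashlib

def word_vec(token, dim=16):
    """Deterministic toy word vector derived from an MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def doc_vec(text, dim=16):
    """Document embedding: mean of its word vectors."""
    vecs = [word_vec(t, dim) for t in text.lower().split()] or [[0.0] * dim]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def centroid(docs, dim=16):
    """Class centroid: mean of the class's document embeddings."""
    vecs = [doc_vec(d, dim) for d in docs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def classify(text, pos_centroid, neg_centroid):
    """Nearest-centroid decision in embedding space: 1 = positive class."""
    v = doc_vec(text)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return int(dist(v, pos_centroid) < dist(v, neg_centroid))

pos = centroid(["helpful supportive community"])
neg = centroid(["spam advertisement link"])
print(classify("helpful supportive community", pos, neg))
```

Swapping `word_vec` for real pre-trained embeddings and the centroid rule for a logistic regression turns this toy into the shape of the pipeline built in the lab.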