Patient-trial matching using a pseudo-Siamese network

Abstract

Background: Clinical trials suffer from insufficient patient (pt) recruitment. The availability of electronic health records (EHR) and trial eligibility criteria (EC) makes data-driven pt-trial matching promising. The task is to identify qualified pts given their EHR and the unstructured text of trial EC. Pseudo-Siamese networks, a newer approach in information retrieval, have shown great success in cross-modal retrieval problems such as semantic image-text retrieval (e.g., matching images with their text descriptions). Here we match pts to clinical trials using pseudo-Siamese network-based cross-modal retrieval. Our model addresses three challenges: (1) how to match unstructured EC text with structured EHR, where EC often encode general disease concepts while EHR represent pt conditions with more specific medical codes; (2) how to capture pts' evolving health conditions; and (3) how to explicitly handle the difference between inclusion and exclusion criteria.

Methods: Our matching model addresses these challenges as follows: (1) we augment the medical codes in pts' records with their textual descriptions and hierarchical taxonomies, so that concepts are embedded at both finer and coarser levels for better concept alignment between pt data and EC; (2) we include an attentive dynamic memory network that extracts the best-matching and more recent pt EHR entries to match against EC; and (3) we introduce a composite loss term that maximizes the similarity between pt records and inclusion criteria while minimizing the similarity between pt records and exclusion criteria.

Results: We evaluated our model on a pt-trial matching dataset built from the EC of 590 clinical trials on ClinicalTrials.gov, together with claims data for 83,371 pts extracted from the IQVIA database (2002-2018), where each pt is eligible for at least one trial. Compared with leading pt-trial matching models, our model significantly outperforms the best baseline, with a 24.3% relative improvement in accuracy. We also tested these models on 34 oncology trials covering 25 cancers; these results will be reported.

Conclusions: Pseudo-Siamese networks have been successful in cross-modal information retrieval, so we propose a new pt-trial matching model based on a pseudo-Siamese network. Experiments on real-world datasets demonstrated that our model significantly outperforms existing pt-trial matching methods for oncology trials.
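The code augmentation in Methods step (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes ICD-10-CM-style diagnosis codes, whose hierarchy can be walked by dropping characters after the three-character category, and a small hypothetical description table (`DESCRIPTIONS`). The helper names `ancestors` and `augment_code` are also hypothetical. Each (code, description) pair in the output could then be embedded, giving the model both fine- and coarse-grained views of the same condition.

```python
# Hypothetical excerpt of an ICD-10-CM description table (illustrative only).
DESCRIPTIONS = {
    "C50": "Malignant neoplasm of breast",
    "C50.9": "Malignant neoplasm of breast of unspecified site",
    "C50.91": "Malignant neoplasm of breast of unspecified site, female",
    "C50.911": "Malignant neoplasm of unspecified site of right female breast",
}


def ancestors(code: str) -> list[str]:
    """Return coarser ICD-10-CM-style codes, from most to least specific.

    Assumes each character after the dot refines its parent, so
    "C50.911" -> ["C50.91", "C50.9", "C50"].
    """
    category, _, subcode = code.partition(".")
    # Drop one trailing character at a time, keeping at least one after the dot.
    parents = [f"{category}.{subcode[:i]}" for i in range(len(subcode) - 1, 0, -1)]
    if subcode:
        parents.append(category)  # finally the 3-character category itself
    return parents


def augment_code(code: str) -> list[tuple[str, str]]:
    """Pair the code and each of its ancestors with a text description.

    Codes missing from the (hypothetical) table get an empty description.
    """
    chain = [code] + ancestors(code)
    return [(c, DESCRIPTIONS.get(c, "")) for c in chain]


if __name__ == "__main__":
    for c, text in augment_code("C50.911"):
        print(f"{c:8s} | {text}")
```

For example, `augment_code("C50.911")` yields the specific code plus `C50.91`, `C50.9` and `C50`, so a criterion that mentions breast cancer in general terms can align with the coarse `C50` entry even though the claims record stores only the specific code.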
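Methods steps (2) and (3) can be illustrated together with a simplified, pure-Python sketch. It assumes each pt visit and each criterion have already been embedded as equal-length vectors, with at least one visit ordered oldest to newest. It replaces the attentive dynamic memory network with a single softmax attention read plus a hypothetical linear recency bonus (`recency_weight`). It also uses one plausible form of the composite loss: a mean cosine-distance term for inclusion criteria plus a margin hinge (`margin`) on exclusion criteria. The abstract does not specify the exact architecture or loss, so all function names and parameters here are hypothetical.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def attend(visits: list[Vector], criterion: Vector, recency_weight: float = 0.5) -> Vector:
    """Attention-pool the visit memory for one criterion (simplified, hypothetical).

    Score = relevance (dot product) + a linear recency bonus in [0, recency_weight];
    visits are ordered oldest -> newest, so later visits get a larger bonus.
    """
    n = len(visits)
    scores = [dot(v, criterion) + recency_weight * i / max(n - 1, 1)
              for i, v in enumerate(visits)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of visit vectors, one output dimension at a time.
    return [sum(w * v[d] for w, v in zip(weights, visits)) for d in range(len(criterion))]


def composite_loss(visits: list[Vector], inclusion: list[Vector],
                   exclusion: list[Vector], margin: float = 0.2) -> float:
    """Pull the pt toward inclusion criteria and push it away from exclusion criteria."""
    # Inclusion: penalize low similarity (cosine distance 1 - s).
    inc = [1.0 - cosine(attend(visits, c), c) for c in inclusion]
    # Exclusion: penalize similarity only above the margin (hinge).
    exc = [max(0.0, cosine(attend(visits, c), c) - margin) for c in exclusion]
    inc_term = sum(inc) / len(inc) if inc else 0.0
    exc_term = sum(exc) / len(exc) if exc else 0.0
    return inc_term + exc_term


if __name__ == "__main__":
    inclusion = [[0.0, 1.0]]
    exclusion = [[1.0, 0.0]]
    print(composite_loss([[0.0, 1.0], [0.0, 1.0]], inclusion, exclusion))  # matching pt
    print(composite_loss([[1.0, 0.0], [1.0, 0.0]], inclusion, exclusion))  # mismatched pt
```

A pt whose visits align with the inclusion criterion and not the exclusion criterion receives a lower loss than one whose visits do the opposite. Because of the hinge, exclusion similarity below `margin` is not penalized, which is one way to keep the exclusion term from dominating training.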

Publication
Journal of Clinical Oncology