Cervical cancer is the fourth most frequent cancer in women and the eighth most commonly occurring cancer overall. Despite medical and scientific advances, there is no fully effective treatment for this disease, especially when it is diagnosed at an advanced stage. For this reason, prevention and screening programs play a very important role in the fight against cervical cancer.
Cervical cancer screening follows a standard workflow that includes the following steps: HPV test, cytology test or Pap smear, colposcopy, and biopsy. Several tools have been developed to support this workflow, making it more efficient, more practical, and more affordable. In this context, the CLARE project emerged with the aim of creating a novel decision support system designed for cervical cancer screening. This dissertation is part of the CLARE project and focuses on developing decision support tools for the colposcopy examination, making use of Deep Learning techniques. Colposcopy is a medical exam performed by gynecologists that consists of a visual, endoscopic examination of the cervix through the vagina to assess the risk of cervical cancer. In this dissertation, the decision support system performs the same prediction based on a single cervix image and the patient's clinical data.
To find the classification model that best fits the mentioned task, several methods were applied. The first step explored some segmentation options to analyze the relevance of extracting the region of interest for this problem. After concluding that segmentation adds no value in this case, several approaches were tested integrating Transfer Learning and Multitask Learning techniques. The best models were then modified to test the effect of canonical feature regularization, in which the CNN's training is guided so that the network learns to extract the kind of features that are usually hand-crafted from images in classical Machine Learning approaches. Later, clinical data was introduced into the models, turning them into multimodal algorithms. Finally, to overcome the class imbalance problem, several approaches were implemented, such as SMOTE, Cluster Centroids, SMOTEENN, over-sampling during data augmentation, and ranking algorithms.
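The core idea behind SMOTE, one of the imbalance-handling techniques mentioned above, can be illustrated with a minimal sketch: new minority-class samples are synthesized by interpolating between a real minority sample and one of its nearest minority neighbours. This is an illustrative toy implementation, not the dissertation's actual pipeline (which would typically rely on a library such as imbalanced-learn); the function name and the 2-D toy points are assumptions.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between each chosen sample and one of its k nearest neighbours
    (the basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance,
        # excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class in 2-D feature space (illustration only)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_oversample(minority, n_new=4)
```

Each synthetic point lies on a segment between two real minority samples, so the oversampled class stays within the region the minority data already occupies instead of duplicating exact copies.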
In the end, instead of selecting a single best model, four models were selected, considering two variables: the availability of clinical data and the preferred metric. When clinical data is available, the best model is a multimodal algorithm; otherwise, a unimodal model is selected. Considering the metric variable, AUC and accuracy are preferred to select the best overall model. However, for screening problems, it is desirable to find a model that minimizes the number of false negatives, so sensitivity and NPV are preferred. The best overall multimodal model achieved an AUC of 91.57% and an accuracy of 88.37%, the best overall unimodal model achieved an AUC of 73.86% and an accuracy of 84.86%, the best screening multimodal model obtained a sensitivity of 95.42% and an NPV of 98.62%, and the best screening unimodal model achieved a sensitivity of 49.85% and an NPV of 89.20%.
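The metrics used to compare the models above can be computed directly from a binary confusion matrix. The sketch below shows the standard definitions of accuracy, sensitivity, and NPV; the confusion-matrix counts are hypothetical numbers for illustration, not results from the dissertation.

```python
def screening_metrics(tp, fp, tn, fn):
    """Metrics from a binary confusion matrix:
    accuracy     - overall fraction of correct predictions
    sensitivity  - fraction of actual positives that are caught
                   (low false negatives => high sensitivity)
    NPV          - how trustworthy a negative prediction is
                   (negative predictive value)"""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    npv = tn / (tn + fn)
    return accuracy, sensitivity, npv

# Hypothetical counts: 100 positives (90 caught), 900 negatives (880 caught)
acc, sens, npv = screening_metrics(tp=90, fp=20, tn=880, fn=10)
# acc = 0.97, sens = 0.90, npv ≈ 0.9888
```

The screening-oriented models favour sensitivity and NPV precisely because both metrics degrade as the false-negative count `fn` grows, which is the costliest error in a screening setting.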
Author: Francisca Morgado
Type: MSc thesis
Partners: INESC TEC - Instituto de Engenharia de Sistemas e Computadores; Faculdade de Engenharia da Universidade do Porto