Class Distribution Influence and Evaluation in Deep Learning – Application to Cancer Detection on Histological Images

Cancer is a potentially fatal disease and is considered the second leading cause of death, so any advance in its diagnosis and detection is crucial to saving lives. The analysis of histological images, also known as Whole Slide Images (WSIs), is considered the gold standard in cancer diagnosis and staging. Manual analysis of WSIs by pathologists is still the primary diagnostic process, but it is time-consuming and difficult to evaluate in a reproducible manner. Computer-aided diagnosis techniques can assist pathologists in their workflow.


Machine learning techniques, and in particular deep learning algorithms such as Convolutional Neural Networks (CNNs), are widely used in domains that involve image analysis. However, the success of CNN models depends on several hyper-parameter settings, such as the network architecture, the data used to train the model, and the class distribution of the training data.

To the best of our knowledge, among these hyper-parameters, the class distribution of the training data has not yet been studied in the literature on WSI data, although it may be one of the most important factors governing model performance. One aim of this thesis is to study in depth the impact of class distribution at both the training stage and the test (forecasting) stage.
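The distinction between training-stage and test-stage class distribution can be made concrete with a small sketch. It assumes binary patch-level labels (1 = tumour, 0 = normal) and uses resampling with replacement as one possible way to fix the training class ratio; this is not necessarily the protocol used in the thesis, and `resample_to_ratio` and `precision_at_prevalence` are hypothetical names. The second function uses the standard relation between precision, sensitivity, specificity and prevalence to show that, even for a fixed classifier, the test-set class distribution changes what a metric reports.

```python
import random
from collections import Counter


def resample_to_ratio(labels, pos_fraction, n_total, seed=0):
    """Hypothetical helper: return n_total indices into `labels`, drawn with
    replacement, so that round(pos_fraction * n_total) of them point to
    positive (label 1) patches and the rest to negative (label 0) patches.
    Assumes both classes are present in `labels`."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(pos_fraction * n_total)
    idx = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_total - n_pos)
    rng.shuffle(idx)  # mix the classes so the sample is not ordered by label
    return idx


def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Precision (positive predictive value) of a classifier with fixed
    sensitivity and specificity, evaluated on a test set in which a fraction
    `prevalence` of the samples are positive."""
    tp = sensitivity * prevalence              # expected fraction of true positives
    fp = (1 - specificity) * (1 - prevalence)  # expected fraction of false positives
    return tp / (tp + fp)


if __name__ == "__main__":
    # Toy patch labels: 10% tumour (1), 90% normal (0).
    labels = [1] * 100 + [0] * 900
    idx = resample_to_ratio(labels, pos_fraction=0.5, n_total=1000)
    print(Counter(labels[i] for i in idx))  # class counts of the training sample

    # Same classifier (sensitivity = specificity = 0.9), two test distributions.
    for p in (0.5, 0.1):
        print(p, round(precision_at_prevalence(0.9, 0.9, p), 3))
```

Here the resampled training indices contain 500 positives and 500 negatives, and the same classifier's precision falls from 0.9 on a balanced test set to 0.5 when only 10% of test patches are positive.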

A second aim of this thesis concerns evaluation in a broader sense. We studied ways of evaluating results that better match pathologists' goals and that address three shortcomings of current metrics: they often fail to distinguish between models, they provide little information about false predictions, and they are optimistic on imbalanced data.
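The abstract does not name the alternative metrics the thesis proposes. As a hedged illustration of the three shortcomings listed above, the sketch below uses only standard quantities (accuracy, balanced accuracy, the Matthews correlation coefficient (MCC), and raw false-positive and false-negative counts) on a hypothetical test set in which 5% of patches are positive. The function names are hypothetical, and returning 0 for MCC when its denominator is zero is a common convention assumed here.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Counts of true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Hypothetical helper: a few standard metrics plus the raw error counts."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention: 0 if undefined
    return {"acc": acc, "bal_acc": (sens + spec) / 2, "mcc": mcc, "fp": fp, "fn": fn}


if __name__ == "__main__":
    y_true = [1] * 5 + [0] * 95          # 5 tumour patches, 95 normal patches
    model_a = [0] * 100                  # predicts "normal" everywhere
    model_b = [1] * 10 + [0] * 90        # finds all tumours, 5 false alarms
    print("A", summarize(y_true, model_a))
    print("B", summarize(y_true, model_b))
```

Both toy models reach an accuracy of 0.95, so accuracy alone neither separates them nor reveals that model A misses every tumour. Balanced accuracy (0.5 for A) and MCC (0 for A, about 0.69 for B), together with the explicit false-negative and false-positive counts, expose the difference.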
