Explaining single predictions: a faster method

Machine learning has proven increasingly essential in many fields. Yet a lot of obstacles still hinder its use by non-experts. The lack of trust in the results obtained is foremost among them, and has inspired several explanatory approaches in the literature. In this paper, we are investigating the domain of single prediction explanation. This is performed by providing the user a detailed explanation of the

KD-means: clustering method for massive data based on kd-tree

K-means clustering is a popular unsupervised classification algorithm employed in several domains, e.g., imaging, segmentation, or compression. Nevertheless, the number of clusters k, fixed a priori, strongly affects the clustering quality. Current state-of-the-art k-means implementations can automatically set the number of clusters. However, they result in unreasonable processing time when classifying large volumes of data. In this paper, we propose

Improving on coalitional prediction explanation

Machine learning has proven increasingly essential in many fields, but a lot of obstacles still hinder its use by non-experts. The lack of trust in the results obtained is foremost among them, and has inspired several explanatory approaches in the literature. These approaches provide great insight into the predictions of a model, but at the cost of a long computation

Automatic Classification Rules for Anomaly Detection in Time-series

Anomaly detection in time-series is an important issue in many applications. It is particularly hard to accurately detect multiple anomalies in time-series. Pattern discovery and rule extraction are effective solutions for allowing multiple anomaly detection. In this paper, we define a Composition-based Decision Tree algorithm that automatically discovers and generates human-understandable classification rules for multiple anomaly detection in time-series. To

Getting Insights from a large Corpus of Scientific Papers on Specialised Comprehensive Topics – the Case of COVID-19

COVID-19 is one of the most important topics these days, specifically on search engines and news. While fake news is easily shared, scientific papers are reliable sources where information can be extracted. With about 24,000 scientific publications on COVID-19 and related research on PubMed, automatic computer-assisted analysis is required. In this paper, we develop two methodologies to get insights on

A Zone-Based Data Lake Architecture for IoT, small and Big Data

Data lakes are supposed to enable analysts to perform more efficient and efficacious data analysis by crossing multiple existing data sources, processes and analyses. However, it is impossible to achieve that when a data lake does not have a metadata governance system that progressively capitalizes on all the performed analysis experiments. The objective of this paper is to have an

The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies

Matching learning corresponds to the combination of ontology matching and machine learning techniques. This strategy has gained increasing attention in recent years. However, state-of-the-art approaches implementing matching learning strategies are not well-tailored to deal with imbalanced training sets. In this paper, we address the problem of the imbalanced training sets and their impacts on the performance of the matching learning
