Negative Filtering of CCTV Content – Forensic Video Analysis Framework

This paper presents our work on forensic video analysis, which aims to assist video surveillance operators by reducing the volume of video they must analyze during post-event searches for evidence. This work was conducted in collaboration with the French National Police and is based on requirements defined in a project on video analysis in the context of investigations. Due

Explaining single predictions: a faster method

Machine learning has proven increasingly essential in many fields. Yet many obstacles still hinder its use by non-experts. Foremost among them is the lack of trust in the results obtained, which has inspired several explanatory approaches in the literature. In this paper, we investigate the domain of single-prediction explanation. This is done by providing the user with a detailed explanation of the

KD-means: clustering method for massive data based on kd-tree

K-means clustering is a popular unsupervised classification algorithm employed in several domains, e.g., imaging, segmentation, or compression. However, the number of clusters k, which must be fixed a priori, strongly affects the clustering quality. Current state-of-the-art k-means implementations can set the number of clusters automatically, but they incur unreasonable processing times when classifying large volumes of data. In this paper, we propose
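The dependence on a k fixed in advance is easiest to see in plain Lloyd's k-means. The sketch below is standard k-means in pure Python, not the KD-means method the paper proposes (the abstract is truncated before describing it); the function name `kmeans`, its parameters and the random initialisation are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Standard Lloyd's k-means on a list of equal-length tuples.

    Illustrative sketch only: k must be chosen by the caller beforehand,
    which is the limitation discussed in the abstract.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
        # Converged once assignments stop changing between iterations.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two groups of points. Each iteration compares every point with every centroid, which is the cost that tree-based indexing such as kd-trees is generally used to reduce on large datasets.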

Improving on coalitional prediction explanation

Machine learning has proven increasingly essential in many fields, but many obstacles still hinder its use by non-experts. The lack of trust in the results obtained is foremost among them, and has inspired several explanatory approaches in the literature. These approaches provide great insight into the predictions of a model, but at the cost of a long computation
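To illustrate why coalition-based explanations are expensive, here is a minimal sketch of exact Shapley values for a single prediction, a standard coalitional method used here as a stand-in; it is not the method the paper proposes. The value function (features outside a coalition are replaced by `baseline` values) and all names are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values explaining predict(x) relative to predict(baseline).

    Assumption: a feature absent from a coalition takes its baseline value.
    Every coalition is enumerated, so this makes n * 2**n calls to predict.
    """
    n = len(x)

    def value(coalition):
        # Keep x's value for features in the coalition, baseline elsewhere.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model such as `lambda z: 2 * z[0] + 3 * z[1] - z[2]` with `x = [1, 2, 3]` and an all-zero baseline, each value reduces to the weight times the feature value. The exponential number of model calls as features are added is what makes faster approximations worthwhile.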

Automatic Classification Rules for Anomaly Detection in Time-series

Anomaly detection in time series is an important issue in many applications, and accurately detecting multiple anomalies in a time series is particularly hard. Pattern discovery and rule extraction are effective solutions for detecting multiple anomalies. In this paper, we define a Composition-based Decision Tree algorithm that automatically discovers and generates human-understandable classification rules for multiple anomaly detection in time series. To

Getting Insights from a Large Corpus of Scientific Papers on Specialised Comprehensive Topics – the Case of COVID-19

COVID-19 is one of the most important topics today, particularly on search engines and in the news. While fake news is easily shared, scientific papers are reliable sources from which information can be extracted. With about 24,000 scientific publications on COVID-19 and related research on PubMed, automatic computer-assisted analysis is required. In this paper, we develop two methodologies to gain insights into

A Zone-Based Data Lake Architecture for IoT, Small and Big Data

Data lakes are supposed to enable analysts to perform more efficient and efficacious data analysis by crossing multiple existing data sources, processes and analyses. However, this cannot be achieved when a data lake lacks a metadata governance system that progressively capitalizes on all the analysis experiments performed. The objective of this paper is to have an

Metadata Management for Data Lakes

To prevent data lakes from becoming invisible and inaccessible to users, an efficient metadata management system is necessary. In this paper, we propose such a system based on a generic and extensible classification of metadata. We present a metadata conceptual schema that considers different types (structured, semi-structured and unstructured) of raw or processed data. This schema is implemented in two

Data Lakes: Trends and Perspectives

As a relatively new concept, the data lake has neither a standard definition nor an acknowledged architecture. We therefore study existing work and propose a complete definition and a generic, extensible architecture for data lakes. Moreover, we introduce three future research axes connected with our healthcare Information Technology (IT) activities. They are related to (i) metadata management

The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies

Matching learning corresponds to the combination of ontology matching and machine learning techniques. This strategy has gained increasing attention in recent years. However, state-of-the-art approaches implementing matching learning strategies are not well suited to imbalanced training sets. In this paper, we address the problem of imbalanced training sets and their impact on the performance of the matching learning