The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies

Matching learning combines ontology matching with machine learning techniques. This strategy has gained increasing attention in recent years. However, state-of-the-art approaches implementing matching learning strategies are not well suited to imbalanced training sets. In this paper, we address the problem of imbalanced training sets and their impact on the performance of matching learning
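To make the imbalance concrete: if every cross-ontology pair of entities is a training candidate, true matches are only a small fraction of all pairs. The sketch below uses hypothetical toy ontologies (the entity names and the one-to-one reference alignment are assumptions). It shows random undersampling, one generic rebalancing technique, and not the method proposed in the paper.

```python
import random

def candidate_pairs(source, target):
    """All cross-ontology entity pairs a matcher would have to classify."""
    return [(s, t) for s in source for t in target]

def label_pairs(pairs, reference):
    """Label a pair 1 if it appears in the reference alignment, else 0."""
    return [(p, 1 if p in reference else 0) for p in pairs]

def undersample(labeled, seed=0):
    """Random undersampling: keep every positive and an equal number of negatives."""
    rng = random.Random(seed)
    pos = [x for x in labeled if x[1] == 1]
    neg = [x for x in labeled if x[1] == 0]
    return pos + rng.sample(neg, min(len(pos), len(neg)))

# Hypothetical toy ontologies: 50 entities each, one-to-one reference alignment.
source = [f"s{i}" for i in range(50)]
target = [f"t{i}" for i in range(50)]
reference = {(f"s{i}", f"t{i}") for i in range(50)}

labeled = label_pairs(candidate_pairs(source, target), reference)
positives = sum(y for _, y in labeled)
print(positives, len(labeled) - positives)  # 50 2450
balanced = undersample(labeled)
print(sum(y for _, y in balanced), len(balanced))  # 50 100
```

Undersampling discards most negatives, which balances the classes but also throws away information; this trade-off is part of why imbalance is a problem worth studying.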

DEMOS: a DEsign Method for demOcratic information System

The issue of democracy in society is at the heart of our current concerns. Organizations and their information systems are also affected by this issue. Democracy in organizations requires a debate about the norms, values and language encapsulated in the information system. Participatory design approaches address this issue by democratically empowering users during the design phase of projects. To

Morphologically annotated Amharic text corpora

In information retrieval (IR), documents that match the query are retrieved. Search engines usually conflate word variants into a common stem when indexing documents, because queries and documents need not use exactly the same word variant for a document to be relevant. Stemmers are known to be effective for IR in many languages. However, there are still languages
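The conflation step described above can be sketched with a toy suffix-stripping stemmer feeding an inverted index. The suffix list is a hypothetical set of English endings chosen for illustration; real stemmers (and any Amharic analysis) use far richer, language-specific rules.

```python
# Hypothetical suffix list, longest first; real stemmers use much richer rules.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def build_index(docs):
    """Inverted index from stem to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every (stemmed) query term."""
    results = None
    for token in query.split():
        ids = set(index.get(stem(token), set()))
        results = ids if results is None else results & ids
    return results or set()

docs = {
    1: "indexing documents",
    2: "the indexer stores documents",
    3: "queries and retrieval",
}
idx = build_index(docs)
print(search(idx, "indexes documents"))  # {1, 2}
```

"indexes", "indexing" and "indexer" all reduce to "index", so the query matches documents that use a different variant. The same toy rules map "queries" to "queri" but leave "query" unchanged, so those two variants are not conflated: crude rules miss cases, which is why well-designed, language-specific stemmers matter.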

Application Performance Anomaly Detection with LSTM on Temporal Irregularities in Logs

Performance anomalies are a core problem in modern information systems, affecting the execution of hosted applications. The detection of these anomalies often relies on the analysis of application execution logs. The most effective current approach is to detect samples that differ from a learnt nominal model. However, current methods often focus on detecting sequential anomalies in logs,
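The idea of flagging samples that deviate from a learnt nominal model, applied to timing rather than event order, can be illustrated with a deliberately simple stand-in. The title names an LSTM; this sketch substitutes a mean and standard-deviation model of inter-arrival times. The timestamps and the threshold `k` are hypothetical.

```python
from statistics import mean, stdev

def inter_arrival(timestamps):
    """Gaps between consecutive log entries."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def fit_nominal(timestamps):
    """Learn a nominal model (mean, std) of gaps from anomaly-free logs."""
    gaps = inter_arrival(timestamps)
    return mean(gaps), stdev(gaps)

def detect(timestamps, model, k=3.0):
    """Indices of entries whose preceding gap lies more than k std from the mean."""
    mu, sigma = model
    return [i + 1 for i, g in enumerate(inter_arrival(timestamps))
            if abs(g - mu) > k * sigma]

nominal = [0, 1.0, 2.1, 2.9, 4.0, 5.0, 6.05, 7.0]  # hypothetical training logs
observed = [0, 1, 2, 3, 8, 9, 10]                  # entry 4 arrives late
model = fit_nominal(nominal)
print(detect(observed, model))  # [4]
```

The delayed entry is caught even though the sequence of events is unchanged, which is exactly the kind of temporal irregularity a purely sequential detector would miss.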

Human-Interpretable Rules for Anomaly Detection in Time-series

Human-interpretable rules for anomaly detection refer to abnormal data presented in a format that is intelligible to analysts. Learning these rules is a challenging problem, and only a few works address the different types of anomalies in time series. This paper presents an extended decision tree
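As a minimal illustration of how a learnt split becomes a readable rule, the sketch below fits a one-split decision tree (a decision stump) on a single hypothetical feature, the raw value, and prints it as an IF/THEN rule. The paper's extended decision tree handles several anomaly types and is not reproduced here; the series and labels are invented for the example.

```python
def learn_stump(values, labels):
    """Find threshold t minimizing errors of the rule 'value > t means anomaly'."""
    best = None
    for t in sorted(set(values)):
        errors = sum((v > t) != bool(y) for v, y in zip(values, labels))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

def as_rule(feature, threshold):
    """Render the learnt split as a human-readable rule."""
    return f"IF {feature} > {threshold} THEN anomaly ELSE normal"

series = [10, 11, 10, 12, 30, 11, 10, 29]  # hypothetical time series
labels = [0, 0, 0, 0, 1, 0, 0, 1]          # 1 = labelled anomaly
threshold, errors = learn_stump(series, labels)
print(as_rule("value", threshold))  # IF value > 12 THEN anomaly ELSE normal
```

A full tree chains such splits, and each root-to-leaf path reads as a conjunction of conditions an analyst can check by hand.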

METING: A Robust Log Parser Based on Frequent n-Gram Mining

Execution logs are a pervasive resource for monitoring modern information systems. Because raw log datasets lack structure, log parsing methods are used to automatically retrieve the structure of logs and group logs that share a common template. Parametric log parsers are commonly preferred, since they can tune their behaviour to fit different types of datasets. These methods rely
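A generic frequent n-gram parser can be sketched as follows: tokens covered by at least one n-gram that occurs in enough log lines are treated as template text, and all other tokens become the wildcard `<*>`. The parameters `n` and `min_support`, the boundary markers, and the sample logs are assumptions for illustration; this is not METING's actual algorithm.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # boundary markers so first/last tokens have neighbours

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def parse(logs, n=2, min_support=2):
    """Replace tokens not covered by any frequent n-gram with the wildcard <*>."""
    padded = [[BOS] + line.split() + [EOS] for line in logs]
    # Support = number of lines an n-gram occurs in (counted once per line).
    support = Counter(g for toks in padded for g in set(ngrams(toks, n)))
    frequent = {g for g, c in support.items() if c >= min_support}
    templates = []
    for toks in padded:
        covered = [False] * len(toks)
        for i, g in enumerate(ngrams(toks, n)):
            if g in frequent:
                covered[i:i + n] = [True] * n
        # Drop the boundary markers before rebuilding the template string.
        body = zip(toks[1:-1], covered[1:-1])
        templates.append(" ".join(t if c else "<*>" for t, c in body))
    return templates

logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk /dev/sda1 is full",
    "Disk /dev/sdb2 is full",
]
for template in parse(logs):
    print(template)
```

The output groups the four lines under two templates, "Connection from <*> closed" and "Disk <*> is full". `n` and `min_support` are the knobs that make such a parser parametric; note that this naive version would miss a constant token sandwiched between two variables.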

Outlier detection in multivariate functional data based on a geometric aggregation

The increasing ubiquity of multivariate functional data (MFD) requires methods that can properly detect outliers within such data, where a sample corresponds to p > 1 parameters observed with respect to (w.r.t.) a continuous variable (e.g., time). We improve outlier detection in MFD by adopting a geometric view of the data space and combining the new data representation with state-of-the-art outlier detection algorithms.
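One way to picture the geometric view: each MFD sample traces a curve in R^p, and a geometric quantity of that curve can summarize it as a single number. The sketch assumes arc length as that quantity and a median/MAD robust z-score as a stand-in detector; neither is claimed to be the paper's aggregation or its chosen algorithms, and the samples are hypothetical.

```python
import math
from statistics import median

def arc_length(curve):
    """Length of the polyline traced by a sample given as a list of p-tuples."""
    return sum(math.dist(a, b) for a, b in zip(curve, curve[1:]))

def robust_z(xs):
    """Median/MAD robust z-scores (0.0 everywhere if the MAD is zero)."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    return [(x - med) / mad if mad else 0.0 for x in xs]

# Hypothetical p = 2 samples observed at t = 0..4: straight-line trajectories...
samples = [[(t, s * t) for t in range(5)] for s in (0.9, 0.95, 1.0, 1.05, 1.1)]
# ...plus one zigzag trajectory whose shape, not its level, is abnormal.
samples.append([(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)])

scores = robust_z([arc_length(c) for c in samples])
outliers = [i for i, z in enumerate(scores) if abs(z) > 3.5]
print(outliers)  # [5]
```

Reducing each multivariate curve to a geometric summary turns the problem into ordinary outlier detection on scalars, so any standard detector can be plugged in after this step.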

DECWA: Density-Based Clustering using Wasserstein Distance

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among clustering methods, state-of-the-art density-based methods have proven effective for arbitrarily shaped clusters. Despite their encouraging results, they struggle to find low-density clusters and nearby clusters with similar densities, and to handle high-dimensional data. Our proposals are a new characterization of clusters and a new
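DECWA, as its title states, relies on the Wasserstein distance. How the paper builds the distributions it compares is not shown here. The sketch only computes the 1-D Wasserstein-1 distance between two equal-size empirical samples, for instance hypothetical nearest-neighbour distances from two candidate clusters. For equal sizes, the optimal transport plan matches sorted values, so W1 is the mean absolute difference after sorting.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical distributions."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    # With equal sample sizes, optimal transport pairs the i-th smallest values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

dense = [0.1, 0.2, 0.1, 0.15]  # hypothetical distances inside a dense group
sparse = [1.0, 1.2, 0.9, 1.1]  # hypothetical distances inside a sparse group
print(wasserstein_1d(dense, sparse))
```

Unlike comparing only mean densities, W1 accounts for the whole shape of the distributions, which is a plausible reason to prefer it when separating nearby clusters of similar density.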

Text Simplification for Scientific Information Access: CLEF 2021 SimpleText Workshop

Modern information access systems promise to give users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources because of their complex language and internal vernacular, or because they lack the necessary background knowledge. Text simplification approaches can remove some of these barriers, thereby keeping users from relying on shallow information in

Answering GPSJ queries in a polystore: A dataspace-based approach

The discipline of data science is steering analysts away from traditional data warehousing and towards a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper,
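For readers unfamiliar with the query class, a GPSJ query combines generalized projection (grouping plus aggregation), selection, and join. The sketch shows what such a query computes when its operands live in two different data models: relational-style rows and nested documents. The data, field names and `gpsj` function are hypothetical, and the paper's dataspace-based approach is not reproduced here.

```python
from collections import defaultdict

# Hypothetical relational store: (product_id, year, amount) rows.
sales_rows = [
    ("p1", 2020, 100.0),
    ("p2", 2020, 50.0),
    ("p1", 2021, 70.0),
    ("p3", 2020, 30.0),
]
# Hypothetical document store: product_id -> nested document.
product_docs = {
    "p1": {"name": "Laptop", "category": {"label": "Electronics"}},
    "p2": {"name": "Desk", "category": {"label": "Furniture"}},
    "p3": {"name": "Phone", "category": {"label": "Electronics"}},
}

def gpsj(year):
    """SUM(amount) GROUP BY category label, for sales in the given year."""
    totals = defaultdict(float)
    for pid, y, amount in sales_rows:
        if y != year:                # selection
            continue
        doc = product_docs.get(pid)  # join on product id across the two models
        if doc is None:
            continue
        totals[doc["category"]["label"]] += amount  # generalized projection
    return dict(totals)

print(gpsj(2020))  # {'Electronics': 130.0, 'Furniture': 50.0}
```

Here the join key and the path to the grouping attribute are hard-coded; in a pay-as-you-go setting, such mappings are exactly what the user discovers and adds progressively.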