DEMOS: a DEsign Method for demOcratic information System

The issue of democracy in society is at the heart of our current concerns. Organizations and their information systems are also concerned by this issue. Democracy in organizations requires a debate about the norms, values and language encapsulated in the information system. Participatory design approaches address this issue by proposing democratic empowerment for users during the design phase of projects. To

Morphologically annotated Amharic text corpora

In information retrieval (IR), documents that match the query are retrieved. Search engines usually conflate word variants into a common stem when indexing documents because queries and documents do not need to use exactly the same word variant for the documents to be relevant. Stemmers are known to be effective in many languages for IR. However, there are still languages

Application Performance Anomaly Detection with LSTM on Temporal Irregularities in Logs

Performance anomalies are a core problem in modern information systems, affecting the execution of hosted applications. The detection of these anomalies often relies on the analysis of application execution logs. Currently, the most effective approach is to detect samples that differ from a learnt nominal model. However, current methods often focus on detecting sequential anomalies in logs,

Human-Interpretable Rules for Anomaly Detection in Time-series

Anomaly detection in time series is a widely studied issue in many areas. Anomalies can be detected using rule-based approaches; human-interpretable rules for anomaly detection are rules presented in a format that is intelligible to analysts. Learning these rules is a challenge, and only a few works address the issue of detecting different types of anomalies in time series.

METING: A Robust Log Parser Based on Frequent n-Gram Mining

Execution logs are a pervasive resource for monitoring modern information systems. Due to the lack of structure in raw log datasets, log parsing methods are used to automatically retrieve the structure of logs and group logs that share a common template. Parametric log parsers are commonly preferred since they can modulate their behaviour to fit different types of datasets. These methods rely

Outlier detection in multivariate functional data based on a geometric aggregation

The increasing ubiquity of multivariate functional data (MFD) requires methods that can properly detect outliers within such data, where a sample corresponds to p > 1 parameters observed with respect to (w.r.t.) a continuous variable (e.g., time). We improve outlier detection in MFD by adopting a geometric view of the data space and combining the new data representation with state-of-the-art outlier detection algorithms.

DECWA: Density-Based Clustering using Wasserstein Distance

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they struggle with low-density clusters, nearby clusters of similar density, and high-dimensional data. Our proposals are a new characterization of clusters and a new

Text Simplification for Scientific Information Access: CLEF 2021 SimpleText Workshop

Modern information access systems hold the promise of giving users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources due to their complex language, internal vernacular, or a lack of prior background knowledge. Text simplification approaches can remove some of these barriers, thereby preventing users from relying on shallow information in

Answering GPSJ queries in a polystore: A dataspace-based approach

The discipline of data science is steering analysts away from traditional data warehousing and towards a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper,

Information Nutritional Label and Word Embedding to Estimate Information Check-Worthiness

Automatic fact-checking is an important challenge nowadays since anyone can write about anything and spread it on social media, regardless of the information's quality. In this paper, we revisit the information check-worthiness problem and propose a method that combines the “information nutritional label” features with POS-tags and word-embedding representations. To predict whether a claim is check-worthy, we train a machine learning
