Arnaud Giacometti

University of Tours | UFR · Département d'Informatique

About

92
Publications
8,858
Reads
837
Citations
Citations since 2017
32 Research Items
295 Citations


Publications (92)
Article
This paper presents Versus, which is the first automatic method for generating comparison tables from knowledge bases of the Semantic Web. For this purpose, it introduces the contextual reference level to evaluate whether a feature is relevant to compare a set of entities. This measure relies on contexts that are sets of entities similar to the com...
Preprint
Itemset mining methods are techniques to discover relevant patterns in transactional databases. The first approach, called constraint-based pattern mining, is based on exhaustive pattern mining techniques which consist in returning all itemsets that satisfy a given constraint. The main issues that hinder their efficiency are the pattern explosion...
Article
Full-text available
Many applications rely on distributed databases like sensor networks or the Semantic Web. However, only a few methods exist to extract patterns without centralizing the data by following the exhaustive extraction paradigm. Their principle is to extract a unique large collection of frequent patterns that will be used for all downstream applications. U...
Conference Paper
Full-text available
As more and more knowledge graphs (KG) are published on the Web, there is a need for tools that abstract their content, so that producers can verify their results and consumers can use them. This implies showing the schema-level patterns instantiated in the graph, with the frequency with which they are instantiated. A profile represents thi...
Chapter
Full-text available
Many applications generate data streams where online analysis needs are essential. In this context, pattern mining is a complex task because it requires access to all data observations. To overcome this problem, the state-of-the-art methods maintain a data sample or a compact data structure retaining only recent information on the main patterns. Th...
Chapter
Full-text available
Comparison tables are an efficient tool for comparing a small number of entities for decision making, as they bring out the main similarities and differences. The manual choice of their comparison features remains a complex and tedious task. This paper presents Versus, which is the first automatic method for generating comparison tables from k...
Conference Paper
Full-text available
Comparison tables are useful for comparing entities by bringing out their non-trivial similarities and differences. Manually choosing the comparison features remains a complex and tedious task. This paper presents VERSUS, which is the first automatic method for generating comparison tables from the Semantic Web...
Conference Paper
Full-text available
Analyzing the impact of entities within their domain is fundamental to understanding it. To this end, it is essential to have fine-grained numerical indicators that reflect the specificities of the domain. This paper proposes an approach for automatically discovering impact indicators for ranking entities. Although the approach is...
Chapter
Full-text available
Many applications rely on distributed databases. However, only a few discovery methods exist to extract patterns without centralizing the data. In fact, this centralization is often less expensive than the communication of extracted patterns from the different nodes. To circumvent this difficulty, this paper revisits the problem of pattern mining in...
Preprint
Full-text available
Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variab...
Article
Full-text available
Sequential pattern mining was introduced by Agrawal and Srikant (in: Proceedings of ICDE’95, pp 3–14, 1995) two decades ago, and its usefulness has been widely proved for different mining tasks and application fields such as web usage mining, text mining, bioinformatics, fraud detection and so on. Since 1995, despite numerous optimization propos...
Conference Paper
Full-text available
This paper proposes a new variant of the FPOF (Frequent Pattern Outlier Factor) measure by introducing a maximum-size constraint on the extracted patterns in order to compute the outlier degree of the entities of a Semantic Web knowledge base affected by the curse of the long tail. We will see that this constrained measure...
Chapter
Full-text available
The Semantic Web connects huge knowledge bases whose content has been generated from collaborative platforms and by integration of heterogeneous databases. Naturally, these knowledge bases are incomplete and contain erroneous data. Knowing their data quality is an essential long-term goal to guarantee that querying them returns reliable results. Having...
Conference Paper
Full-text available
Pattern discovery is a useful method for extracting representative features from a dataset for classification purposes. A reasonable number of complementary patterns that describe the dataset well can be obtained by resorting to pattern sampling. This recent technique consists in randomly drawing...
Conference Paper
Full-text available
Unlike exhaustive pattern discovery methods, which return the complete set of interesting patterns, sampling-based extraction methods aim to build representative sets of interesting patterns. In general, they have the advantage of being very fast and of not requiring the tuning of interestingness thresholds...
Conference Paper
Full-text available
Many applications rely on distributed databases. Yet few pattern discovery methods have been proposed to extract patterns without centralizing the data. It must be said that this centralization is often less costly than communicating the extracted patterns. To circumvent this difficulty, this paper adopts...
Chapter
How can one determine whether a data mining method extracts interesting patterns? The paper deals with this core question in the context of unsupervised problems with binary data. We formalize the quality of a data mining method by identifying patterns – the supporters and opponents – which are related to a pattern extracted by a method. We define...
Conference Paper
Full-text available
In recent years, the field of pattern mining has shifted to user-centered methods. In such a context, it is necessary to have a tight coupling between the system and the user where mining techniques provide results at any time or within a short response time of only a few seconds. Pattern sampling is a non-exhaustive method for instantly discovering...
Poster
Full-text available
In recent years, the field of pattern mining has shifted to user-centered methods. In such a context, it is necessary to have a tight coupling between the system and the user where mining techniques provide results at any time or within a short response time of only a few seconds. Pattern sampling is a non-exhaustive method for instantly discovering...
Conference Paper
Full-text available
Pattern sampling is a non-exhaustive method for discovering relevant patterns that ensures good interactivity while offering strong statistical guarantees thanks to its random nature. Curiously, such an approach, already explored for itemsets and subgraphs, has not yet been explored for sequential data...
Chapter
Full-text available
Pattern mining in numerical data remains a challenging task because the pattern search space becomes potentially infinite with real-valued dimensions. Most approaches reluctantly reduce the expressiveness of mined patterns to make extraction possible. Despite this expressiveness loss, they do not provide results within a short response time of...
Chapter
Mining frequent itemsets in large datasets using the MapReduce programming model has received much attention in recent years. For instance, many efficient Frequent Itemset Mining (a.k.a. FIM) algorithms have been parallelized following the MapReduce principle, such as Parallel Apriori, Parallel FP-Growth and Dist-Eclat. However, most approaches focus on job p...
Article
Mining frequent itemsets in large datasets using the MapReduce programming model has received much attention in recent years. For instance, many efficient Frequent Itemset Mining (a.k.a. FIM) algorithms have been parallelized following the MapReduce principle, such as Parallel Apriori, Parallel FP-Growth and Dist-Eclat. However, most approaches focus on job p...
Conference Paper
Full-text available
Many data exploration tasks require a target class. Unfortunately, the data is not always labeled with respect to this desired class. Rather than using unsupervised methods or a labeling pre-processing, this paper proposes an interactive system that discovers this target class and characterizes it at the same time. More precisely, we introduce a ne...
Conference Paper
Full-text available
Mining frequent itemsets in large datasets using the MapReduce programming model has received much attention in recent years. Many well-known FIM algorithms have been parallelized in the MapReduce framework, such as Parallel Apriori, Parallel FP-Growth and Dist-Eclat. However, most works focus on work partitioning and/or load balancing, but they are not extensible...
Article
Full-text available
Outlier detection consists in detecting anomalous observations from data. During the past decade, outlier detection methods were proposed using the concept of frequent patterns. Basically, such methods require mining all frequent patterns to compute the outlier factor of each transaction. This approach remains too expensive despite recent progre...
Conference Paper
Full-text available
Outlier detection consists in detecting anomalous observations from data. During the past decade, pattern-based outlier detection methods have proposed to mine all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in the pattern mining field. In this paper, we provi...
Article
Full-text available
Replacing assumptions about the data model with information measured on the real data is one of the strengths of data mining. This paper studies this fit between data and pattern discovery methods in order to assess their quality and complexity. We formalize this link between data and interestingness measures...
Conference Paper
Full-text available
A main challenge in pattern mining is to focus the discovery on high-quality patterns. One popular solution is to compute a numerical score on how well each discovered pattern describes the data. The best-rated patterns are then the ones most analyzed by the data expert. In this paper, we evaluate the quality of discovered patterns by anticipating of h...
Article
Full-text available
In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of Pattern Mining: "Mining Association Rules between Sets of Items in Large Databases". Beyond the introduction to a new problem, it introduced a new methodology in terms of resolution and evaluation. For two decades, Pattern Mining has been one of the...
Article
For two decades, pattern discovery has been one of the most active fields in data mining. This paper provides a quantitative survey of the literature relying on 1030 publications from five major international conferences. We first measured a severe slowdown of research dedicated to pattern discovery. Then, we quantified the main contributions with...
Chapter
Recommending database queries is an emerging and promising field of research and is of particular interest in the domain of OLAP systems, where the user is left with the tedious process of navigating large datacubes. In this paper, the authors present a framework for a recommender system for OLAP users that leverages former users’ investigations to...
Conference Paper
Full-text available
The emergence of ubiquitous computing technologies in recent years has given rise to a new field of research consisting in incorporating context-aware preference querying facilities in database systems. One important step in this setting is the Preference Elicitation task, which consists in providing the user with ways to indicate his/her choice on pairs of...
Conference Paper
Full-text available
The elegant integration of pattern mining techniques into databases remains an open issue. In particular, no language is able to manipulate data and patterns without introducing opaque operators or loop-like statements. In this paper, we cope with this problem using relational algebra to formulate pattern mining queries. We introduce several operator...
Article
Recommending database queries is an emerging and promising field of research and is of particular interest in the domain of OLAP systems, where the user is left with the tedious process of navigating large datacubes. In this paper, the authors present a framework for a recommender system for OLAP users that leverages former users' investigations to...
Conference Paper
Full-text available
A major problem when dealing with association rules post-processing is the huge amount of extracted rules. Several approaches have been implemented to summarize them. However, the obtained summaries are generally difficult to analyse because they suffer from the lack of navigational tools. In this paper, we propose a novel method for summarizing la...
Article
In this article, we consider a new kind of temporal pattern where both interval and punctual time representation are considered. These patterns, which we call temporal point-interval patterns, aim at capturing how events taking place during different time periods or at different time instants relate to each other. The datasets where these kinds of...
Article
Full-text available
Previous studies on mining sequential patterns have focused on temporal patterns specified by some form of propositional temporal logic. However, there are some interesting sequential patterns whose specification needs a more expressive formalism, the first-order temporal logic. In this paper, we focus on the problem of mining multi-sequential patt...
Conference Paper
Full-text available
Recommending database queries is an emerging and promising field of investigation. This is of particular interest in the domain of OLAP systems where the user is left with the tedious process of navigating large datacubes. In this paper we present a framework for a recommender system for OLAP users, that leverages former users' investigations to en...
Article
Full-text available
Summaries are widely used to represent large sets of patterns, in particular sets of association rules. Generally, the association rule summaries proposed in the literature can only be presented as a list. As a result, they are difficult to analyze. In this work, we propose a function...
Conference Paper
Full-text available
Discovering global models on a dataset (e.g., classifiers, clusterings, summaries) has attracted a lot of attention and many approaches can be found in the literature. However no framework has been proposed yet for describing and comparing these approaches in a uniform manner. In this paper we propose such a framework for pattern-based modeling app...
Conference Paper
Interactive analysis of a datacube, in which a user navigates a cube by launching a sequence of queries, is often tedious since the user may have no idea of what the forthcoming query should be in his current analysis. To better support this process we propose in this paper to apply a Collaborative Work approach that leverages former explorations of t...
Chapter
In this article, we consider a new kind of temporal pattern where both interval and punctual time representation are considered. These patterns, which we call temporal point-interval patterns, aim at capturing how events taking place during different time periods or at different time instants relate to each other. The datasets where these kinds of...
Conference Paper
An OLAP analysis session can be defined as an interactive session during which a user launches queries to navigate within a cube. Very often choosing which part of the cube to navigate further, and thus designing the forthcoming query, is a difficult task. In this paper, we propose to use what the OLAP users did during their former exploration of t...
Conference Paper
Full-text available
Patterns are at the origin of many knowledge discoveries in databases, but their excessive number still very often limits their use. To overcome this difficulty, a collection of patterns can be condensed into an equivalent but smaller representation. Most works focus...
Article
Full-text available
In this article, we consider a new kind of temporal pattern where both interval and punctual time representation are considered. These patterns, which we call temporal point-interval patterns, aim at capturing how events taking place during different time periods or at different time instants relate to each other. The datasets where these kinds of p...
Chapter
Full-text available
In this chapter, we have presented several approaches for treating preferences over objects, sets of objects and sequences of objects. The main contribution is centered in Section 4, which presents a method for preference elicitation and reasoning over sequences of objects. An algorithm for finding the most preferred sequences satisfying a set of tem...
Article
Full-text available
Classification is an important field of data mining problems. Given a set of labeled training examples the classification task constructs a classifier. A classifier is a global model which is used to predict the class label for data objects that are unlabeled. Many approaches have been proposed for the classification problem. Among them, rule-induc...
Conference Paper
Full-text available
Most research on preference elicitation, preference reasoning and preference query language design focuses mainly on preferences over single objects represented by relational tuples. An increasing interest in preferences over more complex structures like sets of objects has arisen in recent papers. However, most recent applications deal with more so...
Conference Paper
Full-text available
Most methods for temporal pattern mining assume that time is represented by points in a straight line starting at some initial instant. Discovering sequential patterns in customer's transactions is a well-known application where such data mining methods have been used successfully. In this paper, we consider a new kind of temporal pattern where...
Conference Paper
Full-text available
Most methods for temporal pattern mining assume that time is represented by points in a straight line starting at some initial instant. In this paper, we consider a new kind of first order temporal pattern, specified in Allen's Temporal Interval Logic, where time is explicitly represented by intervals. We present the algorithm MILPRIT for mining...
Article
Full-text available
An OLAP analysis can be defined as an interactive session during which a user launches queries over a data warehouse. The launched queries are often interdependent, and they can be either newly defined queries or they can be existing ones that are browsed and reused. Moreover, in a collaborative environment, queries may be shared among users. This...
Chapter
Full-text available
Mining frequent queries often requires the repeated execution of some extraction algorithm for different values of the support, as well as for different source datasets. This is an expensive process, even if we use the best existing algorithms. Hence the need for iterative mining, whereby mining results already obtained are re-used to accelerate su...
Article
Full-text available
In the context of OLAP analysis, the concept of navigation has never been formally defined. We show why this gap is detrimental. We then propose a formalization of the concept of navigation, along with a first draft of a language for defining navigations.
Conference Paper
Full-text available
OLAP users heavily rely on visualization of query answers for their interactive analysis of massive amounts of data. Very often, these answers cannot be visualized entirely and the user has to navigate through them to find relevant facts. In this paper, we propose a framework for personalizing OLAP queries. In this framework, the user is asked to gi...
Article
Full-text available
Most knowledge discovery approaches treat extractions independently of one another, even though it is recognized that, in practice, any knowledge extraction process is interactive and iterative. The approach we propose makes it possible to use the results of previous extractions to optimize the computation...
Conference Paper
In this paper, we propose a general framework for condensed representations of sets of mining queries. To this end, we adapt the standard notions of maximal, closed and key patterns introduced in previous works, including those dealing with condensed representations. Whereas these previous works concentrate on condensed representations of the answe...
Article
Full-text available
Most methods for temporal pattern mining assume that time is punctual, that is, represented by points in a straight line beginning at some initial instant. In this paper, we consider a new temporal pattern, where time is explicitly represented by intervals. We present the algorithm MILPRIT for mining this kind of temporal pattern, which uses v...
Article
Full-text available
Proper names represent about 10% of English or French newspaper articles. Their quantity and informational quality are already exploited in different Information Extraction systems. Proper names have been widely studied in the MUC conferences designed to promote research in Information Extraction. We have created our own named entity extraction tool base...
Conference Paper
Full-text available
Association rule mining often requires the repeated execution of some extraction algorithm for different values of the support and confidence thresholds, as well as for different source datasets. This is an expensive process, even if we use the best existing algorithms. Hence the need for incremental mining, whereby mining results already obtained...
Article
We consider the problem of discovering Datalog rules with negation that are significant (i.e. interesting and sufficiently valid) with respect to a given set of positive and negative facts. We propose a formal framework, as well as algorithms, for the discovery of such rules from a set of facts that are stored in the form of a set of tables in a da...
Article
Full-text available
Orstom (the French Institute of Scientific Research for Development in Cooperation) has undertaken the creation of an international computer network: Rio (Réseau Intertropical d'Ordinateurs). Although the network was initially designed for the Institute's own needs, the Institute strives to open access to Rio to its scientific partners (universities, organ...
Article
Full-text available
To optimize queries in relational databases, two categories of optimization techniques have been proposed: the Rule-Based Approach (RBA) and the Cost-Based Approach (CBA). In the RBA, the optimizer applies transformation rules based on relational algebra. In the CBA, the optimizer uses a cost model to estimate the potential cost of each oper...
