A framework for mining and querying summarized data through association rules
DEI - 3B Room
February 11th, 2011
The large number of datasets available in real applications, expressed in different formats such as relational, XML, and RDF, can be difficult for non-expert users to access when they lack sufficient knowledge of the datasets' content and structure. Composing queries, especially in the absence of a schema, and interpreting the answers obtained may also be non-trivial. Data mining techniques, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, offer several interesting solutions for knowledge elicitation. However, the mining process is often guided by the designer, who, having a deep knowledge of the application scenario, decides which portion of a dataset is likely to yield useful patterns.
In this work we describe the TreeRuler tool, which makes it possible for inexperienced users to access huge XML or relational datasets. TreeRuler has two main features: 1) it mines all the frequent association rules from the input documents, in the XML case without any a priori specification of the desired results, and 2) it uses the previously mined knowledge to provide quick, summarized, and thus often approximate, answers to user queries. TreeRuler has been developed within the Odyssey EU project, which deals with information about crimes, for both the relational and the XML data models.
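The two features above can be illustrated with a minimal sketch. It is not TreeRuler's actual algorithm or interface: the function names `mine_rules` and `answer`, the thresholds, and the toy records are all assumptions made for illustration, and the rules are restricted to a single item on each side. It only shows the general idea of mining frequent association rules once and then answering a query approximately from the rules rather than from the data.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.5, min_confidence=0.6):
    """Mine single-item rules x -> y as (x, y, support, confidence) tuples.

    Hypothetical helper: a simplified stand-in for a real rule miner.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        # Try both directions of the rule for each frequent pair.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def answer(rules, item):
    """Approximate answer: values implied by `item`, with rule confidence."""
    return sorted((y, conf) for x, y, _, conf in rules if x == item)


# Invented toy records, not data from the Odyssey project.
records = [
    {"crime=theft", "city=Milan"},
    {"crime=theft", "city=Milan"},
    {"crime=theft", "city=Rome"},
    {"crime=fraud", "city=Rome"},
]
rules = mine_rules(records)
print(answer(rules, "city=Milan"))
```

The answer is read from the mined rules alone, so it is fast and compact but reflects only the correlations that passed the support and confidence thresholds.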
Web, multimedia and databases