Present position: Research assistant at Politecnico di Milano
Thesis title: A data quality based methodology to improve sentiment analyses
Research area: Information Systems
Most companies agree that the Web has become an invaluable source of marketing information: a very large, rich, and constantly updated knowledge base. Monitoring the Web is seen as a real-time alternative to costly paper-based marketing surveys. Furthermore, automated monitoring can provide continuous rather than occasional feedback. However, managers also consider existing tools immature and, because critical decisions would be taken on the basis of Web monitoring information, they remain cautious. In particular, they point to a need for objective evaluations of the quality of the data produced by automated Web reputation analyses and for evidence that these data are dependable enough for decision-making.
This thesis addresses the problem of finding, refining, and analyzing data for Web sentiment analyses from two main points of view: (i) it describes a methodology that can be applied when managing sentiment analysis projects, and (ii) it presents a tool, based on data quality as a support for semantics, that improves the dependability of Web reputation analyses. The tourism scenario is used as a case study to assess the effectiveness of the overall system. Specifically, the thesis discusses the following results:
- a methodology to support the design of sentiment analysis applications meeting users' domain-specific needs;
- data quality algorithms that support the data cleansing needed to improve Web 2.0 data;
- a framework for assessing the reputation of Web information sources, for use in the analysis of heterogeneous unstructured data;
- a tool that can disambiguate, classify, and assign sentiment to text, built on existing open source solutions;
- a mashup-based interface that provides the usability and flexibility users need to compose their own dashboards. This interface is a first step toward exploring the use of sentiment analysis not only as a support for decision makers, but also as a tool for generic Web users.