Data Preprocessing for Big Data
Eugenio Gianniti
DEIB PhD student
DEIB - PT1 Room
April 7th, 2016
4.30 pm
Research Line:
Advanced software architectures and methodologies
Abstract
Data mining techniques require quality data to extract correct and valuable insights. Raw data rarely offers sufficient quality out of the box: it often shows inconsistencies, incompleteness, or similar flaws. This motivates data preprocessing as a means of improving the performance and effectiveness of learning methods. However, the traditional approach to data preprocessing relies heavily on either the researcher's judgment or computationally demanding algorithms. The Big Data paradigm requires tackling overwhelmingly large datasets, which renders the usual methods impractical. Still, Big Data technologies such as Apache Hadoop or Spark make it possible to adapt data preprocessing algorithms to applications that require huge datasets, even though general techniques are still the subject of active research. I will present the main findings and techniques learned at the Second International Winter School on Big Data, BIGDAT 2016, held last February in Bilbao, Spain.