Kubilay Atasu
IBM Research, Zurich
DEIB – Seminar Room (Building 20)
July 12th, 2018
11.00 am
Contacts:
Mariagiovanna Sami
Cristina Silvano
Research Line:
System architectures
Modern data processing systems should be capable of ingesting, storing and searching across a prodigious amount of textual information. Efficient text search encompasses both high quality of the results and the speed of execution. The Word Mover’s Distance (WMD) proposed recently by Kusner et. al., capitalises on vector representations of words to capture the semantic similarity between text documents. However, the computational cost of WMD grows cubically in the number of words in a document. Our work shows that a relaxed version of the WMD problem can be solved in linear time when operating on large sets of documents. In addition, our linear-complexity method maps well onto Graphics Processing Units (GPUs) and can be efficiently distributed across a cluster of GPUs. Our experiments on real-life datasets demonstrate 1) a performance improvement of three to four orders of magnitude with respect to an optimised WMD implementation, and 2) a marginal loss in accuracy due to the relaxation used. These results suggest that we could use the high-quality results of our method to query --- in sub-second time --- at least 100 million documents using only a modest computational infrastructure.
Dr. Kubilay Atasu joined IBM Research - Zurich in 2008 after holding research positions at Swiss Federal Institute of Technology Lausanne (EPFL) and at Imperial College London, where he worked on design automation of application-specific processors. At IBM Research, he initially contributed to the design and development of IBM's PowerEN (Edge-of-Network) processor. Later, he led the design and development of reconfigurable accelerators for high-speed text analytics. Currently, he is developing scalable algorithms and architectures for large-scale graph processing and machine learning. He is a Senior Member of IEEE.