March 16th, 2010
Top-k join operators, also known as rank joins, access data sources and report the k combinations with the highest aggregate score after joining the retrieved data. We have addressed the problem of obtaining the top-k combinations from two or more search services that offer only sorted data access. Our approach is based on HRJN (Hash Rank Join), a non-blocking join operator that is instance optimal in terms of the amount of data accessed. We have revisited HRJN in the context of accessing data from multiple search services in parallel, where the services may have heterogeneous response times. The results show that the proposed approach retrieves the top combinations quickly by accessing data from all services in a controlled parallel way. The method is designed to refrain from accessing data that do not contribute to the result. It thus reduces the unnecessary network traffic and server-side computational load that arise when data is accessed in an uncontrolled parallel way.
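To make the rank-join idea concrete, the following is a minimal sketch of a classical, sequential HRJN-style operator over two inputs. It is not the parallel method described above. It assumes that each service returns (key, score) pairs in descending score order, that the aggregate score is the sum of the two scores, and that the inputs are pulled in simple round-robin order. The names rank_join, left and right are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


def rank_join(left, right, k):
    """Return the top-k (key, total_score) join results.

    Assumptions (illustrative only): `left` and `right` yield (key, score)
    pairs sorted by descending score, and the aggregate is score_l + score_r.
    """
    sources = [iter(left), iter(right)]
    tables = [{}, {}]      # per-input hash tables: key -> list of scores seen
    top = [None, None]     # first (highest) score seen on each input
    last = [None, None]    # most recent (lowest) score seen on each input
    done = [False, False]  # whether each input is exhausted
    candidates = []        # max-heap of join results (scores are negated)
    tie = count()          # tie-breaker so heap entries always compare
    results = []
    turn = 0

    def threshold():
        # Upper bound on the score of any combination not yet produced:
        # such a combination must contain an unseen tuple from some input i.
        bounds = []
        for i in (0, 1):
            j = 1 - i
            if done[i]:
                continue  # input i has no unseen tuples left
            if last[i] is None or top[j] is None:
                return float("inf")  # nothing known yet, no bound available
            bounds.append(last[i] + top[j])
        return max(bounds, default=float("-inf"))

    while len(results) < k:
        # Report buffered results that no unseen combination can beat.
        t = threshold()
        while candidates and len(results) < k and -candidates[0][0] >= t:
            neg_total, _, key = heapq.heappop(candidates)
            results.append((key, -neg_total))
        if len(results) >= k or all(done):
            break

        # Round-robin pulling, skipping an exhausted input.
        i = turn if not done[turn] else 1 - turn
        turn = 1 - i
        item = next(sources[i], None)
        if item is None:
            done[i] = True
            continue

        key, score = item
        if top[i] is None:
            top[i] = score
        last[i] = score
        tables[i].setdefault(key, []).append(score)

        # Probe the other input's hash table for join partners.
        for other_score in tables[1 - i].get(key, []):
            heapq.heappush(candidates, (-(score + other_score), next(tie), key))

    return results


left = [("a", 9), ("b", 8), ("c", 3)]
right = [("b", 10), ("c", 9), ("a", 2)]
print(rank_join(left, right, 2))
```

In this sketch the operator is non-blocking in the sense that a result is reported as soon as its score reaches the threshold, without waiting for either input to be fully read. With k = 1 on the sample data, the best combination is reported after four of the six tuples have been read.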
Web, multimedia and databases