报告题目:大数据集合查询问题研究
时间:2016年3月30日周三14:30
地点:主楼四区305
杨仝,2010-2013,就读于清华大学计算机计算机网络专业。2013-2014年到中科院计算所客座访问。2015年进入北京大学信息学院计算机系网络所,研究方向为网络与大数据。发表多篇著名国际会议论文,包括SIGCOMM、VLDB、ICNP、ICDCS等。
Abstract: Set queries are fundamental operations in computer systems and applications. This paper addresses the fundamental problem of designing a probabilistic data structure that can quickly process set queries using a small amount of memory. We propose a Shifting Bloom Filter (ShBF) framework for representing and querying sets. We demonstrate the effectiveness of ShBF using three types of popular set queries: membership, association, and multiplicity queries. The key novelty of ShBF is on encoding the auxiliary information of a set element in a location offset. In contrast, prior BF based set data structures allocate additional memory to store auxiliary information. We conducted experiments using real-world network traces, and results show that ShBF significantly advances the state-of-the-art on all three types of set queries.
该论文研究大数据处理中最基础的大集合查询问题,主要包括集合包含、多集合、元素出现次数三个关键问题。该论文未发表之前的网络版已经被哈佛大学计算机副系主任引用。