Accelerating Queries on Very Large Datasets

Ekow Otoo and Kesheng Wu


In this book chapter, we explore ways to answer queries on large multi-dimensional data efficiently. Given a large dataset, a user often wants to access only a relatively small number of the records. Such a selection process is typically performed through an SQL query in a database management system (DBMS). In general, the most effective technique to accelerate the query answering process is indexing. For this reason, our primary emphasis is to review indexing techniques for large datasets. Since much of scientific data is not under the management of DBMS systems, our review includes many indexing techniques outside of DBMS systems as well. Among the known indexing methods, bitmap indexes are particularly well suited for answering such queries on large scientific data. Therefore, more details are given on the state of the art of bitmap indexing techniques. This chapter also briefly touches on some emerging data analysis systems that don't yet make use of indexes. We present some evidence that these systems could also benefit from the use of indexes.

full text of the report (PDF)

Closely related
Info about the book including this chapter
FastBit software
More research work by John Wu
Bitmap Index
Connected Component Labeling
Eigenvalue Computation
Inforamtion available elsewhere on the web
Google Scholar
Contact us

John Wu