October 27th, 2016

Project Longview: Querying the Future Now

Summary

Our goal is to develop data management technology that would simplify building predictive analytics applications over large-scale data. Predictive analytics involves analyzing historical and current data to make predictions about future or missing data values, events, and trends, and has a wide range of applications in security, marketing, economics, sociology, genetics and computing. The generic predictive database technology to be developed will make computing with predictions on large-scale data sets or high-rate data streams easier to express and far more efficient than the prevalent application-level solutions that are known to be brittle and unscalable.

The concrete product of the project will be a new type of database system, called Longview, that seamlessly integrates predictive models as first-class primitives by intelligently incorporating them in the process of data management and query optimization. Longview will develop novel algorithms, data structures and interfaces to automatically load, train, select, and execute predictive models. The project will also investigate “white-box” model support, in which knowledge of the semantics and representation of models, if available, will be used to enhance the quality and performance of predictions. We expect that the resulting technology will also allow for a deeper understanding of, and better support for, user-defined functions in database systems.

Participants:

Faculty:

Students:

Publications:

  • M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik, "Learning-based Query Performance Modeling and Prediction," in Proceedings of ICDE’12, 2012. [PDF] [BIBTEX]
    @inproceedings{akdereCRUZ12,
      author = {Mert Akdere and Ugur Cetintemel and Matteo Riondato and Eli Upfal and Stanley Zdonik},
      title = {Learning-based Query Performance Modeling and Prediction},
      booktitle={Proceedings of ICDE'12},
      year = {2012},
      url = {http://www.cs.brown.edu/~matteo/papers/AkdereEtAl-LearninQueryPerfModel-ICDE12.pdf},
     }
  • M. Riondato and E. Upfal, "Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees," in Proceedings of the European Conference on Machine Learning and Principles of Knowledge Discovery in Databases 2012, 2012. [PDF] [BIBTEX]
    @inproceedings{riondato12ecml,
      author = {Riondato, Matteo and Upfal, Eli},
      booktitle = {Proceedings of the European Conference on Machine Learning and Principles of Knowledge Discovery in Databases 2012},
      title = {Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees},
      series = {ECML PKDD '12},
      year = {2012},
      url = {http://arxiv.org/pdf/1111.6937},
     }
  • M. Riondato, J. DeBrabant, R. Fonseca, and E. Upfal, "PARMA: A Parallel Randomized Algorithm for Association Rules Mining in MapReduce," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, October 29 – November 2, 2012, Maui, HI, USA, 2012. [PDF] [BIBTEX]
    @inproceedings{riondato12cikm,
      author = {Riondato, Matteo and DeBrabant, Justin and Fonseca, Rodrigo and Upfal, Eli},
      booktitle = {Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, October 29 - November 2, 2012, Maui, HI, USA},
      title = {{PARMA}: A Parallel Randomized Algorithm for Association Rules Mining in {MapReduce}},
      series = {CIKM '12},
      year = {2012},
      url = {http://www.cs.brown.edu/~matteo/papers/RiondatoEtAl-PARMA.pdf},
     }
  • M. Akdere, U. Çetintemel, M. Riondato, and S. Zdonik, "The Case for Predictive Database Systems: Opportunities and Challenges," in Proceedings of CIDR’11, 2011. [PDF] [BIBTEX]
    @inproceedings{akdere2011case,
      title = {{The Case for Predictive Database Systems: Opportunities and Challenges}},
      author={Akdere, Mert and {\c{C}}etintemel, Ugur and Riondato, Matteo and Zdonik, Stan},
      booktitle={Proceedings of CIDR'11},
      url={http://database.cs.brown.edu/papers/cidr11/predictive_systems.pdf},
      year={2011},
     }
  • J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal, "Performance Prediction for Concurrent Database Workloads," in Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011, pp. 337-348. [PDF] [BIBTEX]
    @inproceedings{duggan2011,
      author = {Duggan, Jennie and Cetintemel, Ugur and Papaemmanouil, Olga and Upfal, Eli},
      title = {{Performance Prediction for Concurrent Database Workloads}},
      booktitle = {Proceedings of the 2011 ACM SIGMOD International Conference on Management of data},
      series = {SIGMOD '11},
      year = {2011},
      pages = {337--348},
      numpages = {12},
      publisher = {ACM},
      url={http://database.cs.brown.edu/papers/sigmod11/concurrency_sigmod.pdf},
     }
  • M. Riondato, M. Akdere, U. Çetintemel, S. B. Zdonik, and E. Upfal, "The VC-Dimension of SQL Queries and Selectivity Estimation through Sampling," in Machine Learning and Knowledge Discovery in Databases – European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II, 2011, pp. 661-676. [PDF] [BIBTEX]
    @inproceedings{RiondatoACZU11,
      author = {Riondato, Matteo and Akdere, Mert and {\c{C}}etintemel, Ugur and Zdonik, Stanley B. and Upfal, Eli},
      booktitle = {Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II},
      pages = {661--676},
      title = {The {VC}-Dimension of {SQL} Queries and Selectivity Estimation through Sampling},
      year = {2011},
      series = {ECML PKDD '11},
      url = {http://arxiv.org/pdf/1101.5805v3},
     }
  • M. Akdere, U. Çetintemel, and E. Upfal, "Database-support for Continuous Prediction Queries over Streaming Data," in Proceedings of the 36th International Conference on Very Large Data Bases, 2010. [PDF] [BIBTEX]
    @inproceedings{akdere2010database,
      title = {{Database-support for Continuous Prediction Queries over Streaming Data}},
      booktitle = {Proceedings of the 36th International Conference on Very Large Data Bases},
      location = {Singapore},
      month = {September},
      publisher = {VLDB Endowment},
      author={Akdere, Mert and {\c{C}}etintemel, U{\u{g}}ur and Upfal, Eli},
      year={2010},
      url={http://database.cs.brown.edu/papers/vldb10/cpq.pdf},
     }

Acknowledgements:

The Longview project is supported by NSF grant IIS-0905553.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.