Welcome to the home page of the Data Management Research Group at Brown University's Department of Computer Science. Our group works on a wide range of problem domains for database management systems, including analytical (OLAP), transactional (OLTP), and scientific workloads.

Latest News

KDD 2015 Accepted Paper

May 13th, 2015

The Brown Data Management Group has the following paper in KDD 2015:

  • Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages
       Matteo Riondato and Eli Upfal

    We present an algorithm to extract a high-quality approximation of the (top-k) frequent itemsets (FIs) from random samples of a transactional dataset. With high probability the approximation is a superset of the FIs, and no itemset with frequency much lower than the threshold is included in it. The algorithm employs progressive sampling, with a stopping condition based on bounds on the empirical Rademacher average, a key concept from statistical learning theory. The computation of the bounds uses characteristic quantities that can be obtained efficiently with a single scan of the sample. Therefore, evaluating the stopping condition is fast and does not require an expensive mining of each sample. Our experimental evaluation confirms the practicality of our approach on real datasets, outperforming approaches based on one-shot static sampling.
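
To make the stopping condition concrete, here is a minimal Python sketch of the progressive-sampling loop, under stated assumptions: the bound shown is a simple Hoeffding/union-bound stand-in rather than the paper's Rademacher-average bound, only single items are mined for brevity, and all names and parameters are illustrative.

    import math
    import random
    from collections import Counter

    def hoeffding_style_bound(sample, n_items=1000, delta=0.05):
        # Stand-in stopping bound (Hoeffding plus a union bound over n_items
        # single items). The paper instead bounds the empirical Rademacher
        # average, which also needs only one scan of the sample but covers
        # all itemsets, not just single items.
        return math.sqrt(math.log(2 * n_items / delta) / (2 * len(sample)))

    def progressive_sample_fis(dataset, theta, eps_target, bound_fn,
                               start=1000, growth=2):
        """Grow a random sample until bound_fn certifies accuracy eps_target,
        then mine the final sample once. Only single items are mined here
        for brevity; real FI mining would enumerate itemsets."""
        size = start
        while True:
            sample = random.choices(dataset, k=size)  # sample with replacement
            eps = bound_fn(sample)      # stopping condition: one scan, no mining
            if eps <= eps_target:
                break
            size *= growth              # progressive sampling: enlarge and retry
        counts = Counter(item for txn in sample for item in txn)
        # Mining at the lowered threshold theta - eps makes the output a
        # superset of the truly frequent items with high probability.
        return {item for item, c in counts.items() if c / size >= theta - eps}

    # Toy usage: "a" and "b" are frequent at theta = 0.5, "c" is not.
    txns = [{"a", "b"}, {"a"}, {"b", "c"}] * 40000
    print(progressive_sample_fis(txns, theta=0.5, eps_target=0.05,
                                 bound_fn=hoeffding_style_bound))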

CIDR 2015 Paper Accepted

April 14th, 2015

The Brown Data Management Group has the following paper in CIDR 2015:

  • Tupleware: "Big" Data, Big Analytics, Small Clusters
       Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur Cetintemel, Stan Zdonik

    There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world: processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to several terabytes in size, perform primarily compute-intensive operations, and operate clusters ranging from only a few to a few dozen nodes. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes our vision for the design of Tupleware, a new system specifically aimed at complex analytics on small clusters. Tupleware's architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis that compiles workflows of user-defined functions into distributed programs. Our preliminary results show performance improvements of up to three orders of magnitude over alternative systems.
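
The compilation idea at the heart of the paper can be illustrated with a toy Python sketch. This is not Tupleware's actual API; it only shows the fusion principle, where a workflow of user-defined functions executes as a single loop over the data rather than one materialized pass per operator.

    def compile_workflow(*udfs):
        # Fuse a chain of per-record UDFs into a single function, so the
        # whole workflow executes as one pass over the data instead of one
        # intermediate-materializing pass per operator. (A toy stand-in for
        # Tupleware's compilation of UDF workflows into distributed programs.)
        def fused(record):
            for udf in udfs:
                record = udf(record)
            return record
        return fused

    # Hypothetical usage: two small UDFs fused into a single loop.
    parse = float                      # str -> float
    scale = lambda x: x * 2.0          # float -> float
    pipeline = compile_workflow(parse, scale)
    print([pipeline(line) for line in ["1.5", "2.0"]])  # [3.0, 4.0]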

PLANET Accepted at SIGMOD 2014

February 13th, 2014

The Brown Data Management Group has the following paper in SIGMOD 2014:

  • PLANET: Making Progress with Commit Processing in Unpredictable Environments
       Tim Kraska

    Latency unpredictability in a database system can come from many factors, such as load spikes in the workload, inter-query interactions from consolidation, or communication costs in cloud computing or geo-replication. High-variance and high-latency environments make developing interactive applications difficult, because transactions may take too long to complete, or fail unexpectedly. We propose Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model and underlying system support to address this issue. The model exposes the internal progress of the transaction, provides opportunities for application callbacks, and incorporates commit likelihood prediction to enable good user experience even in the presence of significant transaction delays. The mechanisms underlying PLANET can be used for admission control, thus improving overall performance in high contention situations. In the SIGMOD 2014 paper “PLANET: Making Progress with Commit Processing in Unpredictable Environments”, we present this new transaction programming model, demonstrate its expressiveness via several use cases, and evaluate its performance using a strongly consistent geo-replicated database across five data centers.
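
The programming model is easiest to see in code. The sketch below is a hypothetical Python rendering, not PLANET's actual interface; class, method, and field names are invented to illustrate the exposed progress, application callbacks, and commit-likelihood estimate described above.

    class PlanetStyleTxn:
        """Toy illustration of a PLANET-like transaction: the application
        can observe the transaction's internal stage, register progress
        callbacks, and read a predicted commit likelihood while the commit
        is in flight. All names here are illustrative, not PLANET's API."""

        def __init__(self):
            self.stage = "pending"     # e.g. pending -> replicating -> committed
            self.callbacks = []

        def on_progress(self, fn):
            self.callbacks.append(fn)  # application callback for progress events

        def commit_likelihood(self):
            # Placeholder: PLANET predicts this from observed commit behavior;
            # here it is a hard-coded stub keyed on the stage.
            return {"pending": 0.60, "replicating": 0.95, "committed": 1.0}[self.stage]

        def advance(self, stage):      # driven by the commit protocol
            self.stage = stage
            for fn in self.callbacks:
                fn(self)

    # Hypothetical usage: respond to the user early once commit looks likely,
    # instead of blocking until the geo-replicated commit fully completes.
    txn = PlanetStyleTxn()
    txn.on_progress(lambda t: print(f"stage={t.stage}, "
                                    f"p(commit)={t.commit_likelihood():.2f}"))
    txn.advance("replicating")         # UI could already show "likely to succeed"
    txn.advance("committed")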