Welcome to the home page of the Data Management Research Group at Brown University's Department of Computer Science. Our research group is focused on a wide-range of problem domains for database management systems, including analytical (OLAP), transactional (OLTP), and scientific workloads.
The Brown Data Management Group has the following paper in KDD 2015:
- Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages
Matteo Riondato and Eli Upfal
We present an algorithm to extract a high-quality approximation of the (top-k) frequent itemsets (FIs) from random samples of a transactional dataset. With high probability the approximation is a superset of the FIs, and no itemset with frequency much lower than the threshold is included in it. The algorithm employs progressive sampling, with a stopping condition based on bounds to the empirical Rademacher average, a key concept from statistical learning theory. The computation of the bounds uses characteristic quantities that can be obtained efficiently with a single scan of the sample. Therefore, evaluating the stopping condition is fast, and does not require an expensive mining of each sample. Our experimental evaluation confirms the practicality of our approach on real datasets, outperforming approaches based on one-shot static sampling.
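The core loop described above can be sketched in a few lines. The sketch below is a simplified illustration, not the paper's algorithm: in particular, `epsilon_bound` is a placeholder with the O(1/sqrt(n)) shape typical of such guarantees, whereas the paper derives its bound from the empirical Rademacher average using quantities computed in a single scan of the sample. All function and parameter names here are hypothetical, and frequent single items stand in for itemsets.

```python
import math
import random

def epsilon_bound(sample_size, delta=0.05):
    # Placeholder accuracy bound that shrinks as the sample grows.
    # The paper instead bounds the empirical Rademacher average;
    # this stand-in only mimics the O(1/sqrt(n)) decay.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))

def approximate_frequent_items(transactions, theta, eps_target,
                               start=200, seed=0):
    # Progressive sampling: grow the sample geometrically until the
    # bound is small enough. Only the bound is evaluated per round;
    # no expensive mining of each intermediate sample is needed.
    rng = random.Random(seed)
    n = start
    while True:
        sample = [rng.choice(transactions) for _ in range(n)]
        eps = epsilon_bound(n)
        if eps <= eps_target:
            break
        n *= 2
    # Report items whose sample frequency clears a lowered threshold
    # (theta - eps), so the true frequent items are included with
    # high probability (a superset, as in the paper's guarantee).
    counts = {}
    for t in sample:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {i for i, c in counts.items() if c / n >= theta - eps}
```

For example, on a dataset where item "a" appears in 80% of transactions and "b" in 10%, calling `approximate_frequent_items(data, theta=0.5, eps_target=0.05)` returns a set containing "a" but not "b".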
The Brown Data Management Group has the following paper in CIDR 2015:
- Tupleware: "Big" Data, Big Analytics, Small Clusters
Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur Cetintemel, Stan Zdonik
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world: processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to several terabytes in size, perform primarily compute-intensive operations, and operate clusters ranging from only a few to a few dozen nodes. Targeting these users fundamentally changes the way we should build analytics systems.
This paper describes our vision for the design of Tupleware, a new system specifically aimed at complex analytics on small clusters. Tupleware's architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis that compiles workflows of user-defined functions into distributed programs. Our preliminary results show performance improvements of up to three orders of magnitude over alternative systems.
The Brown Data Management Group has the following paper in SIGMOD 2014:
- PLANET: making progress with commit processing in unpredictable environments
Gene Pang, Tim Kraska, Michael J. Franklin, Ali Ghodsi
Latency unpredictability in a database system can come from many factors, such as load spikes in the workload, inter-query interactions from consolidation, or communication costs in cloud computing or geo-replication. High-variance and high-latency environments make developing interactive applications difficult, because transactions may take too long to complete, or fail unexpectedly. We propose Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model and underlying system support to address this issue. The model exposes the internal progress of the transaction, provides opportunities for application callbacks, and incorporates commit likelihood prediction to enable good user experience even in the presence of significant transaction delays. The mechanisms underlying PLANET can be used for admission control, thus improving overall performance in high-contention situations. In the SIGMOD 2014 paper "PLANET: Making Progress with Commit Processing in Unpredictable Environments", we present this new transaction programming model, demonstrate its expressiveness via several use cases, and evaluate its performance using a strongly consistent geo-replicated database across five data centers.
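The programming-model idea in the abstract — exposing transaction progress, firing application callbacks, and surfacing a commit-likelihood prediction — can be sketched as follows. This is a hypothetical illustration of the style of interface, not the actual PLANET API; the class, stage names, and likelihood values are all invented for the example.

```python
# Hedged sketch of a PLANET-style transaction handle (hypothetical
# API): the transaction exposes its internal stage, fires application
# callbacks as it advances, and carries a commit-likelihood estimate
# the application can act on before the final acknowledgment arrives.

class PlanetStyleTxn:
    STAGES = ["submitted", "accepted", "replicating", "committed"]

    def __init__(self):
        self._callbacks = {}
        self.stage = "submitted"
        self.commit_likelihood = 0.5

    def on(self, stage, fn):
        # Register an application callback for a progress stage.
        self._callbacks.setdefault(stage, []).append(fn)

    def _advance(self, stage, likelihood):
        # Called by the (simulated) system as the transaction
        # progresses; updates the prediction and notifies the app.
        self.stage = stage
        self.commit_likelihood = likelihood
        for fn in self._callbacks.get(stage, []):
            fn(self)

# Usage: show a tentative "likely committed" UI as soon as the
# predicted commit likelihood is high, instead of blocking the user
# until the final commit acknowledgment.
events = []
txn = PlanetStyleTxn()
txn.on("accepted", lambda t: events.append(
    "show-tentative" if t.commit_likelihood >= 0.9 else "show-pending"))
txn.on("committed", lambda t: events.append("show-final"))

txn._advance("accepted", 0.95)   # prediction: commit very likely
txn._advance("committed", 1.0)
```

The design choice this illustrates is the one the abstract argues for: in high-latency or high-variance environments, an application that can observe progress and a commit-likelihood estimate can respond to the user early rather than failing or stalling on slow transactions.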