Home > Accepted Papers > CIKM 2012 Accepted Paper

CIKM 2012 Accepted Paper

July 17th, 2012

The Brown Data Management Group has the following paper in CIKM for the Longview project:

  • PARMA: A Parallel Randomized Algorithm for Approximate Association Rules Mining in MapReduce
       Matteo Riondato, Justin DeBrabant, Rodrigo Fonseca, Eli Upfal

    We present a novel randomized parallel technique for mining Frequent Itemsets and Association Rules. Our mining algorithm, PARMA, achieves near-linear speedup while avoiding costly replication of data. PARMA does this by creating multiple small random samples of the transactional dataset and running a mining algorithm on the samples independently and in parallel. The resulting collections of Frequent Itemsets or Association Rules from each sample are aggregated and filtered to provide a single collection in output. Because PARMA mines random subsets of the dataset, the final result is an approximation of the exact solution. Our probabilistic analysis shows that PARMA provides tight guarantees on the quality of the approximation. The end user specifies accuracy and confidence parameters and PARMA computes an approximation of the collection of interest that satisfies these parameters. We formulated and implemented the algorithm in the MapReduce parallel computation framework. Our experimental results show that in practice the quality of the approximation is even higher than what can be analytically guaranteed. We demonstrate the correctness and scalability of PARMA by testing it on several synthetic datasets of varying size and complexity. We compare our results to two previously proposed exact parallel mining algorithms in MapReduce.

st ACM International Conference on Information and Knowledge Management (CIKM 2012) will be held from October 29 to November 2, 2012 in Maui, USA.