CIDR 2015 Accepted Paper

April 14th, 2015

The Brown Data Management Group has the following paper in CIDR 2015:

       Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur Cetintemel, Stan Zdonik

    There is a fundamental discrepancy between the targeted and actual
    users of current analytics frameworks. Most systems are designed
    for the challenges of the Googles and Facebooks of the world:
    processing petabytes of data distributed across large cloud deployments
    consisting of thousands of cheap commodity machines. Yet,
    the vast majority of users analyze relatively small datasets of up
    to several terabytes in size, perform primarily compute-intensive
    operations, and operate clusters ranging from only a few to a few
    dozen nodes. Targeting these users fundamentally changes the way
    we should build analytics systems.

    This paper describes our vision for the design of TUPLEWARE,
    a new system specifically aimed at complex analytics on small
    clusters. TUPLEWARE’s architecture brings together ideas from the
    database and compiler communities to create a powerful end-to-end
    solution for data analysis that compiles workflows of user-defined
    functions into distributed programs. Our preliminary results show
    performance improvements of up to three orders of magnitude over
    alternative systems.