Machine learning (ML) and statistical techniques are key to transforming big data into actionable knowledge. Despite the modern primacy of data, the complexity of existing ML algorithms is often overwhelming, and many users do not understand the trade-offs and challenges of parameterizing and choosing between different learning techniques. Furthermore, existing scalable systems that support ML are typically inaccessible to ML researchers who lack a strong background in distributed systems and low-level primitives.
With MLbase, we tackle both of these issues simultaneously, leveraging the aligned incentives between ML researchers and non-expert practitioners to build a single platform for consuming and developing ML. Moreover, MLbase provides an attractive interface between ML and systems researchers, as the efforts of both groups naturally complement one another. MLbase provides (1) a simple declarative way to specify ML tasks, (2) a novel optimizer to select and dynamically adapt the choice of learning algorithm, (3) a set of high-level operators to enable ML researchers to scalably implement a wide range of ML methods without deep systems knowledge, and (4) a run-time optimized for the data-access patterns of these high-level operators.
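To make components (1) and (2) concrete, the following is a minimal, single-machine sketch of the general idea: a user declares what to learn, and an optimizer picks the learner. It is not MLbase's actual interface or optimizer. The names `Task`, `CANDIDATES`, and `optimize`, the two toy learners, and the alternating train/validation split are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]          # (feature x, target y)
Model = Callable[[float], float]     # maps x to a predicted y


@dataclass(frozen=True)
class Task:
    """Hypothetical declarative spec: states the goal, not the algorithm."""
    kind: str
    data: Sequence[Point]


def fit_mean(train: List[Point]) -> Model:
    """Baseline learner: always predict the mean training target."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_line(train: List[Point]) -> Model:
    """Ordinary least-squares fit of a line y = a + b * x."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    var = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


# Hypothetical search space the optimizer chooses from.
CANDIDATES: Dict[str, Callable[[List[Point]], Model]] = {
    "mean_baseline": fit_mean,
    "least_squares_line": fit_line,
}


def mse(model: Model, points: List[Point]) -> float:
    """Mean squared error of a model on a set of points."""
    return mean((model(x) - y) ** 2 for x, y in points)


def optimize(task: Task) -> Tuple[str, Model]:
    """Fit every candidate and return the one with the lowest held-out error."""
    if task.kind != "regression":
        raise ValueError(f"unsupported task kind: {task.kind}")
    points = list(task.data)
    train, valid = points[0::2], points[1::2]   # assumed alternating split
    fitted = {name: fit(train) for name, fit in CANDIDATES.items()}
    best = min(fitted, key=lambda name: mse(fitted[name], valid))
    return best, fitted[best]


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    name, model = optimize(Task(kind="regression", data=data))
    print(name, model(10.0))
```

The caller only constructs a `Task`; which learner runs is decided by `optimize`. A system of the kind described above would additionally adapt this choice dynamically and execute it on a distributed run-time, which this sketch does not attempt.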