Why Spark is spiking in the cloud

What's driving enterprise investment in Apache Spark?


In the last month, several A-list names in cloud and business computing have declared interest (and made investments) in the Apache Spark data analysis project. What got them fired up?

Some of this is legitimate excitement over a promising technology with broad applications. But it is also about monetization: Spark is yet another project that can be wrapped in convenience and offered at scale in the cloud.

The allure of Spark

Companies that have expressed their devotion to Spark in recent months include:

  • IBM. Aside from adding Spark support to its Bluemix PaaS, IBM is also preparing to contribute its SystemML machine learning algorithm construction technology to Spark.
  • Microsoft. The company is adding Spark support to Azure HDInsight (its cloud-hosted version of Hadoop).
  • Amazon. Its Elastic MapReduce service will be able to run Spark apps developed not only in Scala, but also Python and Java.
  • Huawei. The Chinese networking giant recently unveiled a project called Astro that combines Spark, Spark SQL, and HBase into a single product. Spark is already used in Huawei's Hadoop-based FusionInsight product, offered as a service by way of Huawei's burgeoning cloud platform.

Spark is attractive mainly because it provides a powerful in-memory data-processing component within Hadoop that handles both real-time and batch processing. At Yahoo, where Hadoop originally sprang up, Spark has become a cornerstone of analytics operations.