This blog has moved to a new address: http://www.trongkhoanguyen.com.
As you can see, the spark-core module is the foundation for all the others. It provides the implementation of the Spark computing engine: RDDs, scheduling, deployment, executors, storage, shuffle, …
The spark-sql module, together with spark-hive and spark-catalyst, lets you query structured data as a distributed dataset using SQL. The spark-hive module provides the ability to interact with Hive, and the spark-catalyst module serves as the query optimization framework for Spark.
The spark-mllib module (MLlib) is a scalable machine learning library that leverages Spark's computing power. It can also run on streaming data, or use SQL queries to extract the data it works on.
The spark-streaming and spark-graphx modules make it easy to build scalable, fault-tolerant streaming applications and graph-parallel computations, respectively.
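To make the layering concrete, here is a small sketch that models the modules as a dependency graph and derives an order in which they could be built. It is only a toy model, not Spark's actual build definition. The edges are a simplified reading of the description above, and the exact edges are an assumption: in particular spark-hive building on spark-sql, and the machine learning module (named spark-mllib in the Spark build) building on the SQL, streaming and graph modules. The names `DEPENDS_ON`, `build_order` and `transitive_deps` are made up for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each module maps to the modules it builds on directly.
# Assumption: a simplified picture based on the text, not Spark's real POM files.
DEPENDS_ON = {
    "spark-core": set(),
    "spark-catalyst": {"spark-core"},
    "spark-sql": {"spark-core", "spark-catalyst"},
    "spark-hive": {"spark-core", "spark-sql"},
    "spark-streaming": {"spark-core"},
    "spark-graphx": {"spark-core"},
    "spark-mllib": {"spark-core", "spark-sql", "spark-streaming", "spark-graphx"},
}


def build_order(depends_on):
    """Return the modules so that every module comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def transitive_deps(module, depends_on):
    """Return every module that `module` relies on, directly or indirectly."""
    seen = set()
    stack = [module]
    while stack:
        for dep in depends_on[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    # spark-core is the only module without dependencies, so it comes first.
    print(build_order(DEPENDS_ON))
    print(sorted(transitive_deps("spark-hive", DEPENDS_ON)))
```

Whatever order the sorter picks for the remaining modules, spark-core always comes first, which is just another way of saying that it is the foundation for everything else.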