Apache Spark modules and their dependencies

This blog has moved to a new address: http://www.trongkhoanguyen.com.

Apache Spark Modules

[Figure: Apache Spark Modules]

As you can see, the spark-core module is the foundation for all the others. It implements Spark's computing engine: rdd, scheduler, deploy, executor, storage, shuffle, …
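
To make the dependency structure concrete, here is a small Python sketch that models the modules as a graph and checks that each of them ultimately depends on spark-core. The edges in MODULE_DEPS are an assumption, a simplified reading of the diagram and of this post rather than Spark's real Maven build, and the helpers transitive_deps and build_order are hypothetical names used only for illustration.

```python
# Hypothetical, simplified module graph based on this post's overview.
# It is NOT Spark's actual build graph (each module's pom.xml defines that).
MODULE_DEPS: dict[str, tuple[str, ...]] = {
    "spark-core": (),
    "spark-catalyst": ("spark-core",),
    "spark-sql": ("spark-core", "spark-catalyst"),
    "spark-hive": ("spark-sql",),
    "spark-streaming": ("spark-core",),
    "spark-mllib": ("spark-core", "spark-sql", "spark-streaming"),
    "spark-graphx": ("spark-core",),
}


def transitive_deps(module: str) -> set[str]:
    """Return every module that `module` depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = list(MODULE_DEPS[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(MODULE_DEPS[dep])
    return seen


def build_order() -> list[str]:
    """Topologically sort the modules so each one follows its dependencies.

    Assumes the graph is acyclic, which holds for the edges above.
    """
    order: list[str] = []
    done: set[str] = set()

    def visit(module: str) -> None:
        if module in done:
            return
        for dep in MODULE_DEPS[module]:
            visit(dep)
        done.add(module)
        order.append(module)

    for module in MODULE_DEPS:
        visit(module)
    return order


# Every module other than spark-core reaches spark-core.
for name in MODULE_DEPS:
    if name != "spark-core":
        assert "spark-core" in transitive_deps(name)

print(build_order())
# ['spark-core', 'spark-catalyst', 'spark-sql', 'spark-hive', 'spark-streaming', 'spark-mllib', 'spark-graphx']
```

A topological order like this one is the order in which the modules would have to be built: spark-core always comes first because everything else sits on top of it.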

The spark-sql module, together with spark-hive and spark-catalyst, lets you query structured data as a distributed dataset using SQL. The spark-hive module adds the ability to interact with Apache Hive, and spark-catalyst is Spark's query optimization framework.
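
Catalyst represents queries as trees and optimizes them by repeatedly applying rewrite rules until the tree stops changing. The toy Python sketch below illustrates that idea with a single constant-folding rule. The classes Literal, Column, and Add and the functions transform_up, constant_folding, and optimize are hypothetical stand-ins for illustration, not Catalyst's actual Scala API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical, minimal expression tree (not Catalyst's real classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


Expr = Union[Literal, Column, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` bottom-up: rewrite the children first, then the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_folding(expr: Expr) -> Expr:
    """Replace the sum of two literals with a single precomputed literal."""
    if isinstance(expr, Add) and isinstance(expr.left, Literal) and isinstance(expr.right, Literal):
        return Literal(expr.left.value + expr.right.value)
    return expr


def optimize(expr: Expr, rules: list[Rule]) -> Expr:
    """Run all rules repeatedly until the tree reaches a fixed point."""
    while True:
        new = expr
        for rule in rules:
            new = transform_up(new, rule)
        if new == expr:
            return new
        expr = new


# Models the expression in "SELECT x + (1 + 2)": the literal subtree folds away.
plan = Add(Column("x"), Add(Literal(1), Literal(2)))
print(optimize(plan, [constant_folding]))
# Add(left=Column(name='x'), right=Literal(value=3))
```

Keeping each rule small and running the whole set to a fixed point is what makes this style easy to extend: a new optimization is just another function added to the rules list.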

The spark-mllib module (MLlib) is a scalable machine learning library that builds on Spark's computing power. MLlib can even run on streaming data or use SQL queries to extract the data it works on.

The spark-streaming and spark-graphx modules make it easy to build scalable, fault-tolerant streaming applications and graph-parallel computations, respectively.
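
Graph-parallel computation in GraphX is typically written vertex by vertex, in the style of Pregel: in each superstep, vertices send messages to their neighbours and update their own state from the messages they receive. The single-machine Python sketch below imitates that pattern to label connected components. It is a toy illustration under that assumption; connected_components is a hypothetical helper, not GraphX's API.

```python
from collections import defaultdict


def connected_components(edges: list[tuple[int, int]]) -> dict[int, int]:
    """Label each vertex with the smallest vertex id in its component.

    Toy, single-machine imitation of a Pregel-style (vertex-centric) program:
    active vertices send their label to neighbours, and a vertex that receives
    a smaller label adopts it and stays active for the next superstep.
    """
    neighbours: dict[int, set[int]] = defaultdict(set)
    for src, dst in edges:
        # Treat edges as undirected, as connected components requires.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    labels = {v: v for v in neighbours}  # each vertex starts as its own component
    active = set(labels)                 # every vertex is active in the first superstep

    while active:
        # Message phase: keep only the minimum label sent to each vertex.
        inbox: dict[int, int] = {}
        for v in active:
            for n in neighbours[v]:
                inbox[n] = min(inbox.get(n, labels[v]), labels[v])
        # Vertex program: adopt a smaller label; only changed vertices stay active.
        active = set()
        for v, msg in inbox.items():
            if msg < labels[v]:
                labels[v] = msg
                active.add(v)
    return labels


print(connected_components([(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Because labels only ever decrease, the loop is guaranteed to stop; the number of supersteps grows with the longest path a label has to travel through a component.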

This entry was posted in Architecture, Spark.
