Apache Spark

Parallel Processing framawork
Primerily address machine learning limitations of Hadoop
Developed in University of Berkely in 2014
Developed using SCALA

Scala

High Level Programming Language
Support both functional and Object Oriented Programing

Functional Programming

int a = 10;
a++;
print(a)

Object a will be created and value of a is saved as 10. State of original variable is changed with a++.

int a = 10;
int a1 = a+1;
print(a)

As we are storing increamented value in a1 , state of original variable a is preserved .

When function is called with variable , original variables are preserved and new variables are created. This help in avoiding race condition when multi-threaded operations are executed as state of object does not change.

This helps to enable parallelism

Example of Functional Programming languages - LISP , SCALA,Haskel

Spark vs Hadoop

Spark has similar architecture as Hadoop including master-slave and master node- data node.
Spark outperforms Hadoop with 3X-10X improvements
Spark is popular for ML workload due to Iterative Workload Support , Hadoop is not suitable for iterative workload due to Disk-IO operation
Spark also relay on Disk-IO operation however it is controllable programittically.

PreviousSphinx NextHadoop

Last updated 1 year ago

hashtagScala

hashtagFunctional Programming

hashtagSpark vs Hadoop

Scala

Functional Programming

Spark vs Hadoop