
Message: Spark

Explain the concept of Resilient Distributed Dataset (RDD).

RDD stands for Resilient Distributed Dataset. An RDD is a fault-tolerant collection of elements that can be operated on in parallel. The partitioned data in an RDD is immutable and distributed across the cluster. There are fundamentally two types of RDD:

Parallelized Collections: Here, an existing collection in the driver program is split into partitions so that its elements can be processed in parallel.

Hadoop Datasets: Here, the RDD is built from files in HDFS or another storage system, and functions are applied to each file record.
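
To make the idea of a parallelized collection more concrete, here is a minimal pure-Python sketch. It is not Spark code: `split_into_partitions` and `map_partitions` are hypothetical helpers that only imitate, conceptually, what happens when a driver-side collection is cut into slices and each slice is processed independently. The exact slicing Spark uses may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_slices):
    """Hypothetical helper: cut a list into num_slices contiguous chunks."""
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]


def map_partitions(partitions, func):
    """Apply func to every element, one worker task per partition."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the partitions
        return list(pool.map(lambda part: [func(x) for x in part], partitions))


# Seven elements spread over three partitions, then squared per partition
partitions = split_into_partitions([1, 2, 3, 4, 5, 6, 7], 3)
squared = map_partitions(partitions, lambda x: x * x)
```

In Spark itself, the corresponding entry points are `SparkContext.parallelize` for a local collection and `SparkContext.textFile` for files in HDFS or another storage system.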

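RDDs are essentially partitions of data that are stored in memory and distributed across many nodes. RDDs are lazily evaluated in Spark: a transformation is only recorded when it is declared, and nothing is computed until an action asks for a result. This lazy evaluation contributes to Spark's speed, because Spark sees the whole chain of operations before running it and can avoid unnecessary work. spark-interview-questions/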
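
Lazy evaluation is easier to see in a small example. The sketch below is a toy model in plain Python, not Spark's implementation: `LazyDataset` is a hypothetical class that, like an RDD, never modifies its input and only records transformations until an action is called. The method names `map`, `filter`, and `collect` are chosen to mirror the RDD operations of the same names.

```python
class LazyDataset:
    """Toy stand-in for an RDD: immutable, records work instead of doing it."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # input data is never modified
        self._lineage = tuple(lineage)  # recorded transformations, in order

    def map(self, func):
        # Transformation: returns a new dataset and computes nothing yet
        return LazyDataset(self._source, self._lineage + (("map", func),))

    def filter(self, pred):
        # Transformation: also only recorded
        return LazyDataset(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Action: replay the recorded lineage over the source data
        items = iter(self._source)
        for kind, func in self._lineage:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []  # records every element the mapping function actually touches


def traced_double(x):
    calls.append(x)
    return x * 2


numbers = LazyDataset(range(1, 6))
result = numbers.map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # 0: declaring transformations ran nothing
collected = result.collect()      # the action triggers the computation
```

Because each dataset keeps its lineage, calling `collect()` again recomputes the result from the source data. Spark relies on the same idea for fault tolerance: a lost partition can be rebuilt by replaying the transformations that produced it.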

Feb 07, 2019 01:14AM