Message: Spark

Explain the concept of Resilient Distributed Dataset (RDD).

RDD stands for Resilient Distributed Dataset. An RDD is a fault-tolerant collection of elements that can be operated on in parallel. The partitioned data in an RDD is immutable and distributed across the cluster. There are two main types of RDD:

Parallelized Collections: These are created from an existing collection in the driver program. The elements are split into partitions so they can be processed in parallel.

Hadoop Datasets: These are created from files in HDFS or other storage systems, and they apply functions to each file record.
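To make the idea of a parallelized collection more concrete, here is a minimal pure-Python sketch. It is not Spark code. It splits a local list into partitions and processes each partition independently, which is roughly what happens when a driver-side collection is handed to Spark with `SparkContext.parallelize`. The helper names `split_into_partitions` and `map_partitions` are made up for illustration, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a local list into roughly equal contiguous chunks (hypothetical helper)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element
        end = start + size + (1 if i < extra else 0)
        partitions.append(tuple(data[start:end]))  # tuples: partitions are read-only
        start = end
    return partitions


def map_partitions(partitions, func):
    """Apply func to every element, one worker per partition (threads stand in for nodes)."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map preserves partition order, so results line up with the input
        return list(pool.map(lambda part: tuple(func(x) for x in part), partitions))


numbers = [1, 2, 3, 4, 5, 6, 7]
parts = split_into_partitions(numbers, 3)         # [(1, 2, 3), (4, 5), (6, 7)]
squared = map_partitions(parts, lambda x: x * x)  # [(1, 4, 9), (16, 25), (36, 49)]
```

The original list is never modified. Each step produces new data, which mirrors the immutability of RDDs.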

RDDs are the basic units of data in Spark, stored in memory and distributed across many nodes. RDDs are lazily evaluated: transformations are only recorded, and nothing is computed until an action asks for a result. This lazy evaluation contributes to Spark's speed. spark-interview-questions/
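Lazy evaluation is easier to see in a small example. The following is a toy model in plain Python, not Spark's implementation. The class `LazyDataset` and the function `traced_double` are hypothetical names used only here. Transformations (`map`, `filter`) just record what should be done, and the work runs only when the action `collect` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source  # original data, never modified
        self._steps = steps    # tuple of pending (kind, function) pairs

    def map(self, func):
        # Transformation: returns a NEW dataset with one more recorded step
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, pred):
        # Transformation: also only recorded, not executed
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded pipeline executed
        result = list(self._source)
        for kind, func in self._steps:
            if kind == "map":
                result = [func(x) for x in result]
            else:
                result = [x for x in result if func(x)]
        return result


calls = []


def traced_double(x):
    calls.append(x)  # record that real work happened
    return x * 2


base = LazyDataset([1, 2, 3, 4])
pipeline = base.map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # 0: nothing has run yet
output = pipeline.collect()       # [6, 8]
```

Because the whole chain is known before anything runs, an engine built this way can plan the work as a unit and skip steps whose results are never requested. That is the general reason lazy evaluation helps performance.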


Feb 07, 2019 01:14AM