Spark

Questionsgems

posted on Feb 07, 2019 01:13AM

Explain the concept of Resilient Distributed Dataset (RDD).

RDD stands for Resilient Distributed Dataset. An RDD is a fault-tolerant collection of elements that can be operated on in parallel. The partitioned data in an RDD is immutable and distributed in nature. There are broadly two kinds of RDD, distinguished by how they are created:

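Parallelized Collections: Here, an existing collection in the driver program is split into partitions and distributed across the cluster, so its elements can be processed in parallel.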

Hadoop Datasets: These apply functions to each file record in HDFS or other storage systems.
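The two kinds of RDD described above can be illustrated with a small sketch. This is plain Python, not Spark: the helper names parallelize and map_partitions are hypothetical stand-ins written for this example, and the multi-line string only imitates a file that would normally live in HDFS. In Spark itself the corresponding entry points are SparkContext.parallelize and SparkContext.textFile.

```python
from concurrent.futures import ThreadPoolExecutor


def parallelize(data, num_slices):
    """Split a local collection into num_slices roughly equal partitions.

    Hypothetical helper for illustration; it only mimics the idea behind
    Spark's parallelized collections.
    """
    data = list(data)
    size, extra = divmod(len(data), num_slices)
    partitions, start = [], 0
    for i in range(num_slices):
        end = start + size + (1 if i < extra else 0)
        # Tuples are used so that a partition cannot be modified in place.
        partitions.append(tuple(data[start:end]))
        start = end
    return partitions


def map_partitions(partitions, func):
    """Apply func to every element, using one worker thread per partition."""
    def run(partition):
        return tuple(func(x) for x in partition)

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(run, partitions))


# 1) Parallelized collection: an existing in-memory collection is partitioned.
numbers = parallelize(range(1, 11), num_slices=3)
squares = map_partitions(numbers, lambda x: x * x)
total = sum(sum(p) for p in squares)

# 2) File-based dataset: each line of a file is one record.
#    The string below is an assumption standing in for a file in HDFS.
file_contents = "spark\nrdd\nhdfs\n"
records = parallelize(file_contents.splitlines(), num_slices=2)
lengths = map_partitions(records, len)

print(numbers)
print(total)
print(lengths)
```

The point of the sketch is only that both kinds end up as the same thing: a set of partitions whose records are processed independently, which is what lets the work spread across nodes.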

RDDs are essentially partitions of data that are held in memory and distributed across many nodes. RDDs are lazily evaluated in Spark: transformations are only recorded, and nothing is computed until an action asks for a result. This lazy evaluation is part of what contributes to Spark's speed. spark-interview-questions/
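Lazy evaluation and immutability are easier to see in code. The following is a toy model in plain Python, not Spark's implementation: the class ToyRDD and the helper double are hypothetical names invented for this sketch. Transformations (map, filter) only record a step and return a new object; the work happens when the action collect is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable source data plus a recorded lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)  # never modified after creation
        self._lineage = lineage       # tuple of (kind, function) steps

    def map(self, func):
        # Transformation: record the step and return a NEW ToyRDD.
        return ToyRDD(self._source, self._lineage + (("map", func),))

    def filter(self, predicate):
        # Transformation: record the step and return a NEW ToyRDD.
        return ToyRDD(self._source, self._lineage + (("filter", predicate),))

    def collect(self):
        # Action: replay the recorded lineage over the source, one element
        # at a time, without building intermediate collections.
        result = []
        for item in self._source:
            keep = True
            for kind, func in self._lineage:
                if kind == "map":
                    item = func(item)
                elif not func(item):
                    keep = False
                    break
            if keep:
                result.append(item)
        return result


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


base = ToyRDD(range(5))
evens_doubled = base.filter(lambda x: x % 2 == 0).map(double)

calls_before_action = len(calls)  # nothing has run yet
result = evens_doubled.collect()  # the action triggers the computation

print(calls_before_action)
print(result)
print(base.collect())
```

Because each ToyRDD keeps its source and its lineage, a result can always be rebuilt by replaying the recorded steps. Real RDDs rely on the same idea to recompute lost partitions, which is where the "resilient" in the name comes from.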
