大數(shù)據(jù)中Spark任務(wù)和集群?jiǎn)?dòng)流程是什么樣的-創(chuàng)新互聯(lián)

這篇文章將為大家詳細(xì)講解有關(guān)大數(shù)據(jù)中Spark任務(wù)和集群?jiǎn)?dòng)流程是什么樣的,文章內(nèi)容質(zhì)量較高,因此小編分享給大家做個(gè)參考,希望大家閱讀完這篇文章后對(duì)相關(guān)知識(shí)有一定的了解。

超過(guò)10多年行業(yè)經(jīng)驗(yàn),技術(shù)領(lǐng)先,服務(wù)至上的經(jīng)營(yíng)模式,全靠網(wǎng)絡(luò)和口碑獲得客戶(hù),為自己降低成本,也就是為客戶(hù)降低成本。到目前業(yè)務(wù)范圍包括了:網(wǎng)站設(shè)計(jì)、做網(wǎng)站,成都網(wǎng)站推廣,成都網(wǎng)站優(yōu)化,整體網(wǎng)絡(luò)托管,小程序定制開(kāi)發(fā),微信開(kāi)發(fā),重慶App定制開(kāi)發(fā),同時(shí)也可以讓客戶(hù)的網(wǎng)站和網(wǎng)絡(luò)營(yíng)銷(xiāo)和我們一樣獲得訂單和生意!

大數(shù)據(jù)分享Spark任務(wù)和集群?jiǎn)?dòng)流程

大數(shù)據(jù)分享Spark任務(wù)和集群?jiǎn)?dòng)流程,Spark集群?jiǎn)?dòng)流程

1.調(diào)用start-all.sh腳本,開(kāi)始啟動(dòng)Master

2.Master啟動(dòng)以后,preStart方法調(diào)用了一個(gè)定時(shí)器,定時(shí)檢查超時(shí)的Worker后刪除

3.啟動(dòng)腳本會(huì)解析slaves配置文件,找到啟動(dòng)Worker的相應(yīng)節(jié)點(diǎn).開(kāi)始啟動(dòng)Worker

4.Worker服務(wù)啟動(dòng)后開(kāi)始調(diào)用preStart方法開(kāi)始向所有的Master進(jìn)行注冊(cè)

5.Master接收到Worker發(fā)送過(guò)來(lái)的注冊(cè)信息,Master開(kāi)始保存注冊(cè)信息并把自己的URL響應(yīng)給Worker

6.Worker接收到Master的URL后并更新,開(kāi)始調(diào)用一個(gè)定時(shí)器,定時(shí)的向Master發(fā)送心跳信息

任務(wù)提交流程

1.Driver端會(huì)通過(guò)spark-submit腳本啟動(dòng)SaparkSubmit進(jìn)程,此時(shí)創(chuàng)建了一個(gè)非常重要的對(duì)象(SparkContext),開(kāi)始向Master發(fā)送消息

2.Master接收到發(fā)送過(guò)來(lái)的信息后開(kāi)始生成任務(wù)信息,并把任務(wù)信息放到一個(gè)對(duì)列里

3.Master把所有有效的Worker過(guò)濾出來(lái),按照空閑的資源進(jìn)行排序

4.Master開(kāi)始向有效的Worker通知拿取任務(wù)信息并啟動(dòng)相應(yīng)的Executor

5.Worker啟動(dòng)Executor并向Driver反向注冊(cè)

6.Driver開(kāi)始把生成的task發(fā)送給相應(yīng)的Executor,Executor開(kāi)始執(zhí)行任務(wù)

集群?jiǎn)?dòng)流程

1.首先創(chuàng)建Master類(lèi)

import akka.actor.{Actor, ActorSystem, Props}

import com.typesafe.config.{Config, ConfigFactory}

import scala.collection.mutable

import scala.concurrent.duration._

class Master(val masterHost: String, val masterPort: Int) extends Actor{

// 用來(lái)存儲(chǔ)Worker的注冊(cè)信息

val idToWorker = new mutable.HashMap[String, WorkerInfo]()

// 用來(lái)存儲(chǔ)Worker的信息

val workers = new mutable.HashSet[WorkerInfo]()

// Worker的超時(shí)時(shí)間間隔

val checkInterval: Long = 15000

// 生命周期方法,在構(gòu)造器之后,receive方法之前只調(diào)用一次

override def preStart(): Unit = {

// 啟動(dòng)一個(gè)定時(shí)器,用來(lái)定時(shí)檢查超時(shí)的Worker

import context.dispatcher

context.system.scheduler.schedule(0 millis, checkInterval millis, self, CheckTimeOutWorker)

}

// 在preStart方法之后,不斷的重復(fù)調(diào)用

override def receive: Receive = {

// Worker -> Master

case RegisterWorker(id, host, port, memory, cores) => {

if (!idToWorker.contains(id)){

val workerInfo = new WorkerInfo(id, host, port, memory, cores)

idToWorker += (id -> workerInfo)

workers += workerInfo

println("a worker registered")

sender ! RegisteredWorker(s"akka.tcp://${Master.MASTER_SYSTEM}" +

s"@${masterHost}:${masterPort}/user/${Master.MASTER_ACTOR}")

}

}

case HeartBeat(workerId) => {

// 通過(guò)傳過(guò)來(lái)的workerId獲取對(duì)應(yīng)的WorkerInfo

val workerInfo: WorkerInfo = idToWorker(workerId)

// 獲取當(dāng)前時(shí)間

val currentTime = System.currentTimeMillis()

// 更新最后一次心跳時(shí)間

workerInfo.lastHeartbeatTime = currentTime

}

case CheckTimeOutWorker => {

val currentTime = System.currentTimeMillis()

val toRemove: mutable.HashSet[WorkerInfo] =

workers.filter(w => currentTime - w.lastHeartbeatTime > checkInterval)

// 將超時(shí)的Worker從idToWorker和workers中移除

toRemove.foreach(deadWorker => {

idToWorker -= deadWorker.id

workers -= deadWorker

})

println(s"num of workers: ${workers.size}")

}

}

}

object Master{

val MASTER_SYSTEM = "MasterSystem"

val MASTER_ACTOR = "Master"

def main(args: Array[String]): Unit = {

val host = args(0)

val port = args(1).toInt

val configStr =

s"""

|akka.actor.provider = "akka.remote.RemoteActorRefProvider"

|akka.remote.netty.tcp.hostname = "$host"

|akka.remote.netty.tcp.port = "$port"

""".stripMargin

// 配置創(chuàng)建Actor需要的配置信息

val config: Config = ConfigFactory.parseString(configStr)

// 創(chuàng)建ActorSystem

val actorSystem: ActorSystem = ActorSystem(MASTER_SYSTEM, config)

// 用actorSystem實(shí)例創(chuàng)建Actor

actorSystem.actorOf(Props(new Master(host, port)), MASTER_ACTOR)

actorSystem.awaitTermination()

}

}

2.創(chuàng)建RemoteMsg特質(zhì)

trait RemoteMsg extends Serializable{

}

// Master -> self(Master)

case object CheckTimeOutWorker

// Worker -> Master

case class RegisterWorker(id: String, host: String,

port: Int, memory: Int, cores: Int) extends RemoteMsg

// Master -> Worker

case class RegisteredWorker(masterUrl: String) extends RemoteMsg

// Worker -> self

case object SendHeartBeat

// Worker -> Master(HeartBeat)

case class HeartBeat(workerId: String) extends RemoteMsg

3.創(chuàng)建Worker類(lèi)

import java.util.UUID

import akka.actor.{Actor, ActorRef, ActorSelection, ActorSystem, Props}

import com.typesafe.config.{Config, ConfigFactory}

import scala.concurrent.duration._

class Worker(val host: String, val port: Int, val masterHost: String,

val masterPort: Int, val memory: Int, val cores: Int) extends Actor{

// 生成一個(gè)Worker ID

val workerId = UUID.randomUUID().toString

// 用來(lái)存儲(chǔ)MasterURL

var masterUrl: String = _

// 心跳時(shí)間間隔

val heartBeat_interval: Long = 10000

// master的Actor

var master: ActorSelection = _

override def preStart(){

// 獲取Master的Actor

master = context.actorSelection(s"akka.tcp://${Master.MASTER_SYSTEM}" +

s"@${masterHost}:${masterPort}/user/${Master.MASTER_ACTOR}")

master ! RegisterWorker(workerId, host, port, memory, cores)

}

override def receive: Receive = {

// Worker接收到Master發(fā)送過(guò)來(lái)的注冊(cè)成功的信息(masterUrl)

case RegisteredWorker(masterUrl) => {

this.masterUrl = masterUrl

// 啟動(dòng)一個(gè)定時(shí)器,定時(shí)給Master發(fā)送心跳

import context.dispatcher

context.system.scheduler.schedule(0 millis, heartBeat_interval millis, self, SendHeartBeat)

}

case SendHeartBeat => {

// 向Master發(fā)送心跳

master ! HeartBeat(workerId)

}

}

}

object Worker{

val WORKER_SYSTEM = "WorkerSystem"

val WORKER_ACTOR = "Worker"

def main(args: Array[String]): Unit = {

val host = args(0)

val port = args(1).toInt

val masterHost = args(2)

val masterPort = args(3).toInt

val memory = args(4).toInt

val cores = args(5).toInt

val configStr =

s"""

|akka.actor.provider = "akka.remote.RemoteActorRefProvider"

|akka.remote.netty.tcp.hostname = "$host"

|akka.remote.netty.tcp.port = "$port"

""".stripMargin

// 配置創(chuàng)建Actor需要的配置信息

val config: Config = ConfigFactory.parseString(configStr)

// 創(chuàng)建ActorSystem

val actorSystem: ActorSystem = ActorSystem(WORKER_SYSTEM, config)

// 用actorSystem實(shí)例創(chuàng)建Actor

val worker: ActorRef = actorSystem.actorOf(

Props(new Worker(host, port, masterHost, masterPort, memory, cores)), WORKER_ACTOR)

actorSystem.awaitTermination()

}

}

4.創(chuàng)建初始化類(lèi)

class WorkerInfo(val id: String, val host: String, val port: Int,

val memory: Int, val cores: Int) {

// 初始化最后一次心跳的時(shí)間

var lastHeartbeatTime: Long = _

}

5.本地測(cè)試需要傳入?yún)?shù):

 大數(shù)據(jù)中Spark任務(wù)和集群?jiǎn)?dòng)流程是什么樣的

關(guān)于大數(shù)據(jù)中Spark任務(wù)和集群?jiǎn)?dòng)流程是什么樣的就分享到這里了,希望以上內(nèi)容可以對(duì)大家有一定的幫助,可以學(xué)到更多知識(shí)。如果覺(jué)得文章不錯(cuò),可以把它分享出去讓更多的人看到。

本文名稱(chēng):大數(shù)據(jù)中Spark任務(wù)和集群?jiǎn)?dòng)流程是什么樣的-創(chuàng)新互聯(lián)
網(wǎng)址分享:http://muchs.cn/article16/djjsdg.html

成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供軟件開(kāi)發(fā)、電子商務(wù)、微信公眾號(hào)、企業(yè)建站、網(wǎng)頁(yè)設(shè)計(jì)公司、移動(dòng)網(wǎng)站建設(shè)

廣告

聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶(hù)投稿、用戶(hù)轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話(huà):028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來(lái)源: 創(chuàng)新互聯(lián)

成都app開(kāi)發(fā)公司