Using the SparkSubmit Class in the deploy Directory



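The scripts covered earlier, spark-submit and spark-class, as well as the launcher project, mainly prepare the runtime environment: environment settings, dependency jars, JVM options, and so on. The actual submission is handled by the SparkSubmit class under deploy in the Spark core module.

As mentioned earlier, the main entry method of the SparkSubmit class in the deploy directory is runMain.

Let's look at the other methods first.

1. prepareSubmitEnvironment

This method prepares the environment and parameters for the submission.

It first determines the cluster manager (YARN, Mesos, Kubernetes, or standalone) and the deploy mode (client or cluster).

This information is then used to select the appropriate backend and wrapper classes.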
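As a rough illustration of this first step, the sketch below classifies a master URL by its prefix and resolves the deploy mode, defaulting to client when none is given. It is a simplified stand-in, not the code in SparkSubmit: the object name SubmitModeSketch, the types, and the error messages are all assumptions made for this example.

```scala
object SubmitModeSketch {
  sealed trait ClusterManager
  case object Yarn extends ClusterManager
  case object Standalone extends ClusterManager
  case object Mesos extends ClusterManager
  case object Kubernetes extends ClusterManager
  case object Local extends ClusterManager

  sealed trait DeployMode
  case object Client extends DeployMode
  case object Cluster extends DeployMode

  // Classify the master URL by its prefix (hypothetical helper, simplified).
  def clusterManager(master: String): Either[String, ClusterManager] = master match {
    case "yarn"                        => Right(Yarn)
    case m if m.startsWith("spark://") => Right(Standalone)
    case m if m.startsWith("mesos://") => Right(Mesos)
    case m if m.startsWith("k8s://")   => Right(Kubernetes)
    case m if m.startsWith("local")    => Right(Local)
    case other                         => Left(s"Unrecognized master URL: $other")
  }

  // A missing deploy mode is treated as client mode.
  def deployMode(mode: Option[String]): Either[String, DeployMode] = mode match {
    case None | Some("client") => Right(Client)
    case Some("cluster")       => Right(Cluster)
    case Some(other)           => Left(s"Unrecognized deploy mode: $other")
  }
}
```

The pair returned by these two functions is what the rest of the method branches on, for example "YARN plus cluster" or "standalone plus cluster".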

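The submission modes are hard to explain in one pass, because the method covers so many deployment environments, each with its own specifics. It takes time to read through.

For cluster mode we only look at two cases: YARN cluster and standalone cluster. Once YARN and standalone are understood, the others are easy to follow.

The method returns a 4-tuple: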

@return a 4-tuple:
* (1) the arguments for the child process,
* (2) a list of classpath entries for the child,
* (3) a map of system properties, and
* (4) the main class for the child

Core code:

    if (deployMode == CLIENT) {
      childMainClass = args.mainClass
      if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
        childClasspath += localPrimaryResource
      }
      if (localJars != null) { childClasspath ++= localJars.split(",") }
    }
    // Add the main application jar and any added jars to classpath in case YARN client
    // requires these jars.
    // This assumes both primaryResource and user jars are local jars, or already downloaded
    // to local by configuring "spark.yarn.dist.forceDownloadSchemes", otherwise it will not be
    // added to the classpath of YARN client.
    if (isYarnCluster) {
      if (isUserJar(args.primaryResource)) {
        childClasspath += args.primaryResource
      }
      if (args.jars != null) { childClasspath ++= args.jars.split(",") }
    }

    if (deployMode == CLIENT) {
      if (args.childArgs != null) { childArgs ++= args.childArgs }
    }

    if (args.isStandaloneCluster) {
      if (args.useRest) {
        childMainClass = REST_CLUSTER_SUBMIT_CLASS
        childArgs += (args.primaryResource, args.mainClass)
      } else {
        // In legacy standalone cluster mode, use Client as a wrapper around the user class
        childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
        if (args.supervise) { childArgs += "--supervise" }
        Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
        Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
        childArgs += "launch"
        childArgs += (args.master, args.primaryResource, args.mainClass)
      }
      if (args.childArgs != null) {
        childArgs ++= args.childArgs
      }
    }

    // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
    if (isYarnCluster) {
      childMainClass = YARN_CLUSTER_SUBMIT_CLASS
      if (args.isPython) {
        childArgs += ("--primary-py-file", args.primaryResource)
        childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
      } else if (args.isR) {
        val mainFile = new Path(args.primaryResource).getName
        childArgs += ("--primary-r-file", mainFile)
        childArgs += ("--class", "org.apache.spark.deploy.RRunner")
      } else {
        if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
          childArgs += ("--jar", args.primaryResource)
        }
        childArgs += ("--class", args.mainClass)
      }
      if (args.childArgs != null) {
        args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
      }
    }

The code above is central. It defines, for each combination of cluster manager and deploy mode, which class wraps our Spark program so that it fits the submission flow of that cluster environment.

Let's spend a little more time analyzing it.

First, childMainClass:

Standalone cluster mode with REST: REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()

YARN cluster mode: YARN_CLUSTER_SUBMIT_CLASS = org.apache.spark.deploy.yarn.YarnClusterApplication

Standalone cluster mode without REST (legacy): STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
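The selection logic can be condensed into a small sketch that, for a given mode, returns the child main class together with the child arguments. This is only an approximation of the excerpt above: Submission and Mode are hypothetical types invented for the example, the fully qualified names of RestSubmissionClientApp and ClientApp are written out on the assumption that they live in org.apache.spark.deploy.rest and org.apache.spark.deploy, and the --supervise, --memory, --cores, Python, and R branches are left out.

```scala
object ChildLaunchSketch {
  sealed trait Mode
  case object ClientMode extends Mode
  case object StandaloneClusterRest extends Mode
  case object StandaloneClusterLegacy extends Mode
  case object YarnCluster extends Mode

  // Hypothetical, trimmed-down stand-in for SparkSubmitArguments.
  final case class Submission(
      master: String,
      primaryResource: String,
      mainClass: String,
      appArgs: Seq[String])

  // Wrapper class names (package names are an assumption of this sketch).
  val RestClusterSubmitClass = "org.apache.spark.deploy.rest.RestSubmissionClientApp"
  val StandaloneClusterSubmitClass = "org.apache.spark.deploy.ClientApp"
  val YarnClusterSubmitClass = "org.apache.spark.deploy.yarn.YarnClusterApplication"

  // Returns (childMainClass, childArgs) for the given mode.
  def childLaunch(mode: Mode, s: Submission): (String, Seq[String]) = mode match {
    case ClientMode =>
      // Client mode runs the user class directly with the user's own arguments.
      (s.mainClass, s.appArgs)
    case StandaloneClusterRest =>
      (RestClusterSubmitClass, Seq(s.primaryResource, s.mainClass) ++ s.appArgs)
    case StandaloneClusterLegacy =>
      (StandaloneClusterSubmitClass,
        Seq("launch", s.master, s.primaryResource, s.mainClass) ++ s.appArgs)
    case YarnCluster =>
      // Each user argument is passed to the YARN client as its own --arg pair.
      val wrapped = s.appArgs.flatMap(arg => Seq("--arg", arg))
      (YarnClusterSubmitClass,
        Seq("--jar", s.primaryResource, "--class", s.mainClass) ++ wrapped)
  }
}
```

Note how only client mode keeps the user's class as the child main class. In every cluster case the user class becomes an argument to a wrapper, which is why the argument lists differ so much between modes.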

2. runMain

Once the previous step has produced the 4-tuple, the runMain flow begins.

Core code first:

  private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
    val loader = getSubmitClassLoader(sparkConf)
    for (jar <- childClasspath) {
      addJarToClasspath(jar, loader)
    }
    var mainClass: Class[_] = null
    try {
      mainClass = Utils.classForName(childMainClass)
    } catch {
      // Handlers for ClassNotFoundException and NoClassDefFoundError are omitted in this excerpt.
    }
    val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
      mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
    } else {
      new JavaMainApplication(mainClass)
    }

    try {
      app.start(childArgs.toArray, sparkConf)
    } catch {
      case t: Throwable =>
        throw findCause(t)
    }
  }

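With prepareSubmitEnvironment understood, runMain is simple: it loads childMainClass, instantiates it directly if it implements SparkApplication, and then calls its start method.

In client mode rather than cluster mode, childMainClass is args.mainClass. This is worth noting: an ordinary user class does not implement SparkApplication, so it is wrapped in JavaMainApplication:

    new JavaMainApplication(mainClass)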
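The dispatch between the two branches can be mimicked outside Spark with plain reflection. The following is a minimal sketch, assuming a simplified trait SubmitApp in place of SparkApplication and a Map in place of SparkConf. MainMethodApp plays the role of JavaMainApplication, and EchoApp is a made-up class that exists only to exercise the first branch.

```scala
object RunMainSketch {
  // Simplified stand-in for Spark's SparkApplication trait.
  trait SubmitApp {
    def start(args: Array[String], conf: Map[String, String]): Unit
  }

  // Wraps a class that exposes a static main(String[]) method,
  // in the spirit of JavaMainApplication.
  final class MainMethodApp(klass: Class[_]) extends SubmitApp {
    override def start(args: Array[String], conf: Map[String, String]): Unit = {
      val mainMethod = klass.getMethod("main", classOf[Array[String]])
      require(java.lang.reflect.Modifier.isStatic(mainMethod.getModifiers),
        "The main method must be static")
      // Expose the configuration to the user program as system properties.
      conf.foreach { case (k, v) => sys.props(k) = v }
      mainMethod.invoke(null, args)
    }
  }

  // Hypothetical application that implements the trait directly.
  final class EchoApp extends SubmitApp {
    override def start(args: Array[String], conf: Map[String, String]): Unit =
      println(s"EchoApp started with: ${args.mkString(" ")}")
  }

  // Instantiate the class directly if it implements SubmitApp, otherwise wrap it.
  def instantiate(className: String): SubmitApp = {
    val klass = Class.forName(className)
    if (classOf[SubmitApp].isAssignableFrom(klass)) {
      klass.getConstructor().newInstance().asInstanceOf[SubmitApp]
    } else {
      new MainMethodApp(klass)
    }
  }
}
```

With this shape, the cluster-mode wrappers and a plain user class are started through the same start call, and the caller never needs to know which branch produced the instance.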

What remains is to look at how RestSubmissionClientApp and org.apache.spark.deploy.yarn.YarnClusterApplication are implemented.

