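This article walks through the source code of org.apache.spark.deploy.yarn.ApplicationMasterArguments. The original notes give the Spark version as 2.11; since Spark has no 2.11 release, this most likely refers to the Scala version the build targets.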

Overview

The org.apache.spark.deploy.yarn.ApplicationMasterArguments class parses the command-line arguments passed to the ApplicationMaster.

Analysis of the main methods

parseArgs

This method does the actual argument parsing. It is defined as follows:

private def parseArgs(inputArgs: List[String]): Unit = {
  val userArgsBuffer = new ArrayBuffer[String]()
  var args = inputArgs
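  // The match below shows which options are accepted; any other option makes the program exit with an error.
  // --jar               the application's JAR file
  // --class             the application's main class
  // --primary-py-file   an application written in Python
  // --primary-r-file    an application written in R
  // --arg               an extra argument; repeat the flag to pass several, e.g. --arg 1 --arg "name"
  // --properties-file   a properties (configuration) file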
  while (!args.isEmpty) {
    // --num-workers, --worker-memory, and --worker-cores are deprecated since 1.0,
    // the properties with executor in their names are preferred.
    // Start parsing. A pattern such as case ("--jar") :: value :: tail extracts the option name
    // and its value, and binds the remaining arguments to tail.
    args match {
      case ("--jar") :: value :: tail =>
        userJar = value
        args = tail
      case ("--class") :: value :: tail =>
        userClass = value
        args = tail
      case ("--primary-py-file") :: value :: tail =>
        primaryPyFile = value
        args = tail
      case ("--primary-r-file") :: value :: tail =>
        primaryRFile = value
        args = tail
      case ("--arg") :: value :: tail =>
        userArgsBuffer += value
        args = tail
      case ("--properties-file") :: value :: tail =>
        propertiesFile = value
        args = tail
      case _ =>
        printUsageAndExit(1, args)
    }
  }
  if (primaryPyFile != null && primaryRFile != null) {
    // scalastyle:off println
    System.err.println("Cannot have primary-py-file and primary-r-file at the same time")
    // scalastyle:on println
    System.exit(-1)
  }
  userArgs = userArgsBuffer.toList
}

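This method parses the ApplicationMaster arguments. The match ... case expression shows that ApplicationMaster accepts only 6 options. Any other option name makes the program print its usage and exit with code 1, and --primary-py-file and --primary-r-file must not appear together. The match ... case expression itself is discussed at the end of this article.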
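To make the accept/reject behavior easier to try out, here is a minimal sketch that mirrors the matching order of parseArgs. It is not the Spark implementation: ParsedArgs, ParseArgsSketch and parse are hypothetical names introduced for illustration, and the sketch returns a Left with a message where the real method calls System.exit.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for the fields that ApplicationMasterArguments assigns.
final case class ParsedArgs(
    userJar: Option[String] = None,
    userClass: Option[String] = None,
    primaryPyFile: Option[String] = None,
    primaryRFile: Option[String] = None,
    propertiesFile: Option[String] = None,
    userArgs: List[String] = Nil)

object ParseArgsSketch {
  // Consumes the list two elements at a time, in the same case order as parseArgs.
  // Returns Left(message) in the situations where the real method would exit.
  @tailrec
  def parse(args: List[String], acc: ParsedArgs = ParsedArgs()): Either[String, ParsedArgs] =
    args match {
      case Nil =>
        // Same final check as parseArgs: Python and R primaries are mutually exclusive.
        if (acc.primaryPyFile.isDefined && acc.primaryRFile.isDefined)
          Left("Cannot have primary-py-file and primary-r-file at the same time")
        else Right(acc)
      case "--jar" :: value :: tail => parse(tail, acc.copy(userJar = Some(value)))
      case "--class" :: value :: tail => parse(tail, acc.copy(userClass = Some(value)))
      case "--primary-py-file" :: value :: tail => parse(tail, acc.copy(primaryPyFile = Some(value)))
      case "--primary-r-file" :: value :: tail => parse(tail, acc.copy(primaryRFile = Some(value)))
      case "--arg" :: value :: tail => parse(tail, acc.copy(userArgs = acc.userArgs :+ value))
      case "--properties-file" :: value :: tail => parse(tail, acc.copy(propertiesFile = Some(value)))
      // Unknown option names, and known options with no value after them, end up here.
      case unknown => Left("Unknown/unsupported param " + unknown)
    }

  def main(argv: Array[String]): Unit = {
    println(parse(List("--class", "com.example.Main", "--arg", "1", "--arg", "name")))
    println(parse(List("--verbose")))
    println(parse(List("--primary-py-file", "a.py", "--primary-r-file", "b.R")))
  }
}
```

The first call yields a Right with userClass set and userArgs equal to List("1", "name"); the other two yield a Left, corresponding to the two exit paths in the original method.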

printUsageAndExit

This method prints the usage information for ApplicationMaster and then exits with the given exit code. Its definition:

def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
  // scalastyle:off println
  if (unknownParam != null) {
    System.err.println("Unknown/unsupported param " + unknownParam)
  }
  System.err.println("""
    |Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
    |Options:
    |  --jar JAR_PATH       Path to your application's JAR file
    |  --class CLASS_NAME   Name of your application's main class
    |  --primary-py-file    A main Python file
    |  --primary-r-file     A main R file
    |  --arg ARG            Argument to be passed to your application's main class.
    |                       Multiple invocations are possible, each will be passed in order.
    |  --properties-file FILE Path to a custom Spark properties file.
    """.stripMargin)
  // scalastyle:on println
  System.exit(exitCode)
}
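The usage text above relies on a triple-quoted string combined with stripMargin, which removes the leading whitespace and the | marker from each line. The following small sketch shows the effect in isolation; StripMarginSketch and the shortened usage text are made up for illustration and are not part of Spark.

```scala
object StripMarginSketch {
  // Each line keeps only what follows the '|' margin character.
  val usage: String =
    """
      |Usage: demo [options]
      |Options:
      |  --jar JAR_PATH   Path to a JAR file
    """.stripMargin

  def main(args: Array[String]): Unit = {
    // Usage text conventionally goes to stderr, as in printUsageAndExit.
    System.err.println(usage)
  }
}
```

The indentation used to align the string with the surrounding code does not appear in the output, while the two spaces after the | in the option line are kept.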

Discussion

The match ... case used to check the arguments

First, the code:

args match {
  case ("--jar") :: value :: tail =>
    userJar = value
    args = tail
  case ("--class") :: value :: tail =>
    userClass = value
    args = tail
  // ... remaining cases omitted
}

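Each case is a pattern. For example, ("--jar") :: value :: tail matches when args starts with "--jar": if the first element of args is "--jar", the second element is bound to value and the rest of the list is bound to tail. Note that the pattern is anchored at the first element of args, so if "--jar" is the second element, this case does not match. The pattern also needs at least two elements, so a trailing "--jar" with no value after it does not match either.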
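The head-anchored behavior can be checked with a small standalone sketch. HeadMatchSketch and jarAtHead are hypothetical names used only for this illustration.

```scala
object HeadMatchSketch {
  // Returns the value following "--jar" only when "--jar" is the first element of the list.
  def jarAtHead(args: List[String]): Option[String] = args match {
    case "--jar" :: value :: _ => Some(value)
    case _ => None
  }

  def main(argv: Array[String]): Unit = {
    println(jarAtHead(List("--jar", "app.jar", "--class", "Main"))) // Some(app.jar)
    println(jarAtHead(List("--class", "Main", "--jar", "app.jar"))) // None: "--jar" is not at the head
    println(jarAtHead(List("--jar")))                               // None: no value follows the flag
  }
}
```

This is why parseArgs wraps the match in a while loop and reassigns args = tail after each case: every iteration strips one option from the front, so the next option becomes the new head of the list.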