In this blog we will see how to create a standalone Java application that runs on a Spark cluster, and we will also learn how to use the spark-submit tool.
We will load a text file from a given path and count the number of records. You can create an RDD by parallelizing a collection or by loading an external file. You can also create an RDD by transforming an existing RDD.
// inside main(String[] args) of your application class
SparkConf conf = new SparkConf().setAppName("RDDCreationExample");
JavaSparkContext sc = new JavaSparkContext(conf);
// create an RDD by parallelizing an array (sample data for illustration)
JavaRDD<String> parNames = sc.parallelize(Arrays.asList("alice", "bob", "carol"));
// create an RDD by loading the text file whose path is passed as the first argument
System.out.println("Loading text file from " + args[0]);
JavaRDD<String> namesFromFile = sc.textFile(args[0]);
System.out.println("parNames Count " + parNames.count());
System.out.println("namesFromFile Count " + namesFromFile.count());
// close the context
sc.close();
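The third way of creating an RDD mentioned above, transforming an existing RDD, is not shown in the snippet, so here is a minimal sketch. The variable names and the filter predicate are illustrative, not from the original post; note that transformations are lazy and nothing runs until an action like count() is called.

```java
// derive new RDDs from namesFromFile by applying transformations;
// each transformation returns a new RDD without touching the data yet
JavaRDD<String> upperNames = namesFromFile.map(String::toUpperCase);
JavaRDD<String> longNames = upperNames.filter(name -> name.length() > 5);
// count() is an action: it triggers the actual computation
System.out.println("longNames Count " + longNames.count());
```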
You can get the pom file from here.
To submit the jar, run the following command:
spark-submit --class net.icircuit.spark.RDDCreation.App RDDCreation-0.0.1-SNAPSHOT.jar file:///home/anshumanthesniper7722/test.txt
By default spark-submit runs the job in local mode; to run the job on YARN you need to add the --master flag:
spark-submit --master yarn RDDCreation-0.0.1-SNAPSHOT.jar /path/to/file/on/hdfs
spark-submit is a single tool for submitting jobs to a Spark cluster.
If you run a script or jar file without any options, Spark runs the job locally.
To run the job on a cluster, you need to add the --master flag. The general form of the command is:
spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  <application-jar> \
  [application-arguments]
--class: The entry point for your application
--master: The master URL for the cluster (e.g. spark://18.104.22.168:7077, or just yarn in case you want to use YARN)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
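Putting these flags together, a fully spelled-out submission for the example application might look like the following. The class name and jar match the example above; the deploy mode, memory setting, and input path are illustrative and should be adapted to your cluster.

```shell
# submit the example app to YARN in cluster mode with an explicit executor memory
spark-submit \
  --class net.icircuit.spark.RDDCreation.App \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.executor.memory=2g" \
  RDDCreation-0.0.1-SNAPSHOT.jar \
  /path/to/file/on/hdfs
```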
For more information on the spark-submit tool, check this page.