Apache Spark Quick Standalone Installation Guide for Windows
Nov 4, 2020
In this article, we will go through the steps to install Apache Spark on Windows and test a word-count program in IntelliJ/Eclipse using Maven dependencies. The steps are as follows:
- Download spark-2.3.4-bin-hadoop2.7.tgz (Spark 2.3.4) from this link. It is pre-built for Hadoop 2.7, but don’t install Hadoop (we will use Spark in standalone cluster mode)!
- Create a new directory C:\Spark\ on partition C:
- Unpack the downloaded archive and move all files and subdirectories under the archive’s root directory into C:\Spark\ (so that README.md, bin\, examples\, etc. are directly under C:\Spark\)
- Create a new directory C:\winutils and a subdirectory C:\winutils\bin
- Download winutils.exe from this link (right-click the link and save the file)
- Move winutils.exe into C:\winutils\bin
- Set HADOOP_HOME to C:\winutils\ in the system environment variables. Make sure to add it under “System variables” (not “User variables”!), then restart your computer.
- Also, make sure JAVA_HOME exists and points to an installed JDK 8 (e.g. C:\Program Files\Java\jdk-8). It must point to a JDK, not a JRE.
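If you prefer the command line, the two variables above can also be set with setx from an elevated Command Prompt (a sketch only; the JDK path is an example and must be adjusted to match your installation):

```shell
rem Set machine-wide ("System variables") environment variables; requires an elevated prompt
setx /M HADOOP_HOME "C:\winutils"
rem Point JAVA_HOME at your JDK 8 installation directory (example path)
setx /M JAVA_HOME "C:\Program Files\Java\jdk-8"
```

Note that setx only affects newly opened shells, so you still need to open a fresh Command Prompt (or restart) afterwards.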
- Open your Windows shell (Command Prompt) and change the directory to C:\Spark
- Enter .\bin\run-example SparkPi 10 to test your Spark installation. Somewhere in the output, you should see the line “Pi is roughly 3.14…”.
- Open IntelliJ/Eclipse and add the following Maven dependencies to your pom.xml:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.1</version>
  </dependency>
</dependencies>
- Now create a class WordCount.java in src/main/java of your project, and specify the correct location of the file whose words you want to count.
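A minimal word-count sketch using the Spark Java RDD API is shown below. It runs Spark in local mode (so the standalone installation above is all you need), and the input path is a placeholder; C:\Spark\README.md is just a convenient file that the steps above put in place.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // Run Spark locally, using all available cores
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Replace this with the path to the file you want to count words in
        JavaRDD<String> lines = sc.textFile("C:\\Spark\\README.md");

        JavaPairRDD<String, Integer> counts = lines
                // Split each line into words on whitespace
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                // Pair each word with an initial count of 1
                .mapToPair(word -> new Tuple2<>(word, 1))
                // Sum the counts per word
                .reduceByKey(Integer::sum);

        counts.collect().forEach(t -> System.out.println(t._1 + ": " + t._2));
        sc.close();
    }
}
```

Run it directly from IntelliJ/Eclipse; the local[*] master means no cluster or spark-submit is required.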
References: Spark Installation guide by Dr. Matthias Nickles