Apache Spark quick standalone installation guide for Windows


In this article, we will walk through the steps to install Apache Spark on Windows and test a word-count program in IntelliJ/Eclipse using Maven dependencies. The steps are as follows:

  1. Download spark-2.3.4-bin-hadoop2.7.tgz (Spark 2.3.4) from this link.
    This package is pre-built for Hadoop 2.7, but do not install Hadoop; we will run Spark in standalone mode.
  2. Create a new directory C:\Spark\ on partition C:
  3. Unpack the downloaded archive and move all files and subdirectories from the archive's root directory into C:\Spark\ (so that README.md, bin\, examples\, etc. are directly under C:\Spark\).
  4. Create a new directory C:\winutils and a subdirectory C:\winutils\bin
  5. Download winutils.exe from this link (right-click the link and save the file)
  6. Move winutils.exe into C:\winutils\bin
  7. Set HADOOP_HOME to C:\winutils\ in the system environment variables. Make sure you use “System variables” (not “User variables”!). Then restart your computer.
  8. Also, make sure JAVA_HOME exists and points to an existing JDK 8 installation (e.g. C:\Program Files\Java\jdk-8). It must point to a JDK, not a JRE.
  9. Open your Windows shell (Command Prompt or PowerShell) and change the directory to C:\Spark
  10. Enter .\bin\run-example SparkPi 10 to test your Spark installation. Somewhere in the output, you should see the line “Pi is roughly 3.14…”.
  11. Open IntelliJ/Eclipse and add the following Maven dependency to your pom.xml:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.1</version>
  </dependency>
</dependencies>

  12. Now create a class WordCount.java under src/main/java of your project, and point it at the file whose words you want to count.
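A minimal sketch of such a WordCount class, assuming Spark's own README.md as the input file (the input path is illustrative; replace it with the file you actually want to count words in):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // Illustrative input path; change this to your own file.
        String inputPath = "C:\\Spark\\README.md";

        // local[*] runs Spark inside this JVM using all available cores,
        // so no cluster is needed (HADOOP_HOME/winutils must still be set on Windows).
        SparkConf conf = new SparkConf()
                .setAppName("WordCount")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(inputPath);

            JavaPairRDD<String, Integer> counts = lines
                    // Split each line on whitespace into individual words.
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    // Pair every word with the count 1 ...
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    // ... and sum the counts per distinct word.
                    .reduceByKey(Integer::sum);

            // Bring the results back to the driver and print them.
            counts.collect().forEach(t -> System.out.println(t._1 + ": " + t._2));
        }
    }
}
```

Run main() directly from the IDE; because the master is set to local[*], Spark executes in-process and the Maven dependency above is all the project needs.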

References: Spark Installation guide by Dr. Matthias Nickles


Prakhar Gurawa

Data Scientist@ KPMG Applied Intelligence, Dublin| Learner | Caricaturist | Omnivorous | DC Fanboy