Apache Spark quick standalone installation guide for Windows


In this article, we will walk through the steps to install Apache Spark on Windows and test a word-count program in IntelliJ/Eclipse using Maven dependencies. The steps are as follows:

  1. Download spark-2.3.4-bin-hadoop2.7.tgz (Spark 2.3.4) from this link.
    This package is pre-built for Hadoop 2.7, but do not install Hadoop; we will run Spark in standalone mode.
  2. Create a new directory C:\Spark\ on partition C:
  3. Unpack the downloaded archive and move all files and subdirectories from the archive's root directory into C:\Spark\ (so that README.md, bin\, examples\, etc. are directly under C:\Spark\).
  4. Create a new directory C:\winutils and a subdirectory C:\winutils\bin
  5. Download winutils.exe from this link (right-click the link and save the file)
  6. Move winutils.exe into C:\winutils\bin
  7. Set HADOOP_HOME to C:\winutils\ in the system environment variables. Make sure you use “System variables” (not “User variables”!). Then restart your computer.
  8. Also, make sure JAVA_HOME exists and points to an existing JDK 8 installation (e.g. C:\Program Files\Java\jdk-8). It must point to a JDK, not a JRE.
  9. Open your Windows shell (Command Prompt or PowerShell) and change the directory to C:\Spark
  10. Enter .\bin\run-example SparkPi 10 to test your Spark installation. Somewhere in the output, you should see the line “Pi is roughly 3.14…”.
  11. Open IntelliJ/Eclipse and add the following Maven dependency to your pom.xml:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.1</version>
  </dependency>
</dependencies>

  12. Now create a class WordCount.java under src/main/java of your project, and point it at the file whose words you want to count.
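A minimal sketch of such a WordCount class, assuming Spark's own README.md as the input file (the input path is illustrative; replace it with the file you actually want to count words in):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // Illustrative input path; change this to your own file.
        String inputPath = "C:\\Spark\\README.md";

        // local[*] runs Spark inside this JVM using all available cores,
        // so no cluster is needed (HADOOP_HOME/winutils must still be set on Windows).
        SparkConf conf = new SparkConf()
                .setAppName("WordCount")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(inputPath);

            JavaPairRDD<String, Integer> counts = lines
                    // Split each line on whitespace into individual words.
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    // Pair every word with the count 1 ...
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    // ... and sum the counts per distinct word.
                    .reduceByKey(Integer::sum);

            // Bring the results back to the driver and print them.
            counts.collect().forEach(t -> System.out.println(t._1 + ": " + t._2));
        }
    }
}
```

Run main() directly from the IDE; because the master is set to local[*], Spark executes in-process and the Maven dependency above is all the project needs.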

References: Spark Installation guide by Dr. Matthias Nickles


Prakhar Gurawa

Data Scientist@ KPMG Applied Intelligence, Dublin| Learner | Caricaturist | Omnivorous | DC Fanboy