Spark Word Count with Unit Tests

This project shows how to write a word count program in Apache Spark, along with unit test cases. The related blog post can be found at https://www.barrelsofdata.com/spark-word-count-with-unit-tests
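
The project's Spark code lives under src and is not reproduced in this README. As a rough illustration of what a word count computes, here is a minimal plain-Java sketch that does not use Spark. The class name `WordCountSketch`, the method `countWords`, and the choices to split on whitespace and ignore case are assumptions made for this example and may differ from the project's actual implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is not the project's Spark code.
final class WordCountSketch {

    // Splits each line on whitespace, lower-cases every token, and counts
    // how often each word occurs. A TreeMap keeps the words sorted.
    static Map<String, Long> countWords(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty()) // blank lines yield one empty token
                .map(word -> word.toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, test=1, to=3}
        System.out.println(countWords("to be or not to be", "To test"));
    }
}
```

A Spark job would typically express the same split, group, and count steps over a distributed dataset, which is also what makes the logic straightforward to cover with unit tests on small in-memory inputs.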

Build instructions

From the root of the project, run the following commands:

  • To clear all compiled classes and the build and log directories
./gradlew clean
  • To run tests
./gradlew test
  • To build jar
./gradlew shadowJar
  • All combined
./gradlew clean test shadowJar

Run

spark-submit --master yarn --deploy-mode cluster build/libs/spark-wordcount-1.0.jar hdfs://path/to/input/file.txt hdfs://path/to/output/directory