[![Tests](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/spark-wordcount/tests)](https://git.barrelsofdata.com/barrelsofdata/spark-wordcount/actions?workflow=workflow.yaml)
[![Build](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/spark-wordcount/build)](https://git.barrelsofdata.com/barrelsofdata/spark-wordcount/actions?workflow=workflow.yaml)
# Spark Word Count with Unit Tests
This project shows how to write a word count program in Apache Spark, along with unit test cases. The related blog post is at [https://barrelsofdata.com/spark-word-count-with-unit-tests](https://barrelsofdata.com/spark-word-count-with-unit-tests).
## Build instructions
From the root of the project, run the following commands:
- To clear all compiled classes, build and log directories
```shell script
./gradlew clean
```
- To run tests
```shell script
./gradlew test
```
- To build jar
```shell script
./gradlew build
```
## Run
```shell script
spark-submit --master yarn --deploy-mode cluster build/libs/spark-wordcount-1.0.0.jar hdfs://path/to/input/file.txt hdfs://path/to/output/directory
```
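
To sanity-check the job on a small sample, it can help to compute the expected counts locally first. The sketch below is not part of this project: `local_wordcount` is a hypothetical helper, and it assumes words are split on whitespace and counted case-sensitively, which may differ from the tokenization the Spark job actually uses.

```shell
# Hypothetical helper, not part of this project.
# Reads text on stdin and prints one "<word> <count>" line per distinct word,
# sorted by word. Assumes whitespace-separated, case-sensitive words.
local_wordcount() {
  tr -s '[:space:]' '\n' |          # put each word on its own line
    grep -v '^$' |                  # drop the empty line left by leading whitespace
    LC_ALL=C sort |                 # group identical words together
    uniq -c |                       # count each group
    awk '{ print $2, $1 }'          # reorder to "<word> <count>"
}
```

For example, `local_wordcount < sample.txt` prints the counts for a local `sample.txt` (a placeholder file name). The output format of the Spark job may differ, so compare the counts rather than the exact text.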