[![Tests](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/tests)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
[![Integration tests](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/integration-tests)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
[![Build](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/build)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
# Apache Beam Examples
This project holds all the Apache Beam examples that are detailed on my blog at [https://barrelsofdata.com](https://barrelsofdata.com)
## Build instructions
This is a multi-module project and uses Gradle version catalogs. The dependencies can be viewed at [./gradle/libs.versions.toml](./gradle/libs.versions.toml)
To build the project, execute the following commands from the project root
- To clear all compiled classes, build and log directories
```shell script
./gradlew clean
```
- To run unit tests
```shell script
./gradlew test
```
- To run integration tests
```shell script
./gradlew integrationTest
```
- To build jar
```shell script
./gradlew build
```
## Run
Ensure you have Kafka running ([Kafka tutorial](https://barrelsofdata.com/apache-kafka-setup)) and a PostgreSQL database running.
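If you still need a topic for the job to read from, you can create one with Kafka's CLI tools. This is a minimal sketch assuming the Kafka scripts are on your PATH and a single broker listens on localhost:9092; the topic name and partition counts are only examples.
```shell script
# Create the input topic (broker address and topic name are examples)
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic sensor-events \
--partitions 3 \
--replication-factor 1
```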
- Create table
```sql
CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE PRECISION);
```
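For example, the statement can be run in one shot with psql; the placeholders below mirror the ones used in the run command that follows:
```shell script
# Create the table via psql (placeholders as in the run command below)
psql -h <HOST> -U <USERNAME> -d <DATABASE> \
-c "CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE PRECISION);"
```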
- Start streaming job on direct runner
```shell script
java -jar streaming/build/libs/streaming.jar \
--runner=DirectRunner \
--kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
--kafkaTopic=<KAFKA_TOPIC> \
--kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
--sqlDriver=org.postgresql.Driver \
--jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
--table=<TABLE_NAME> \
--username=<USERNAME> \
--password=<PASSWORD> \
--resetToEarliest
```
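For reference, a fully filled-in local run might look like the following; every value here is an illustrative assumption for a developer machine, not a requirement:
```shell script
# Example local invocation on the direct runner (all values illustrative)
java -jar streaming/build/libs/streaming.jar \
--runner=DirectRunner \
--kafkaBrokers=localhost:9092 \
--kafkaTopic=sensor-events \
--kafkaConsumerGroupId=beam-streaming-example \
--sqlDriver=org.postgresql.Driver \
--jdbcUrl=jdbc:postgresql://localhost:5432/sensors \
--table=sensor_computed \
--username=postgres \
--password=postgres \
--resetToEarliest
```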
- You can feed simulated data to the Kafka topic, with the key being a string (used as the unique ID of a sensor) and the value being a JSON string of the following schema (see the producer sketch after this list)
```json
{"ts":< EPOCH_MILLIS > ,"value":< DOUBLE_VALUE > }
```
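For example, records in that shape can be sent with Kafka's console producer. This sketch assumes a broker on localhost:9092 and uses | as the key separator so the colons inside the JSON are left untouched:
```shell script
# Produce one simulated sensor reading (broker address is an assumption)
echo 'sensor-1|{"ts":1700000000000,"value":23.5}' | \
kafka-console-producer.sh \
--bootstrap-server localhost:9092 \
--topic <KAFKA_TOPIC> \
--property parse.key=true \
--property 'key.separator=|'
```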