[![Tests](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/tests)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
[![Integration tests](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/integration-tests)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
[![Build](https://barrelsofdata.com/api/v1/git/action/status/fetch/barrelsofdata/apache-beam-examples/build)](https://git.barrelsofdata.com/barrelsofdata/apache-beam-examples/actions?workflow=workflow.yaml)
# Apache Beam Examples
This project holds all the Apache Beam examples detailed on my blog at [https://barrelsofdata.com](https://barrelsofdata.com).
## Build instructions
This is a multi-module project that uses Gradle version catalogs. The dependencies can be viewed at [./gradle/libs.versions.toml](./gradle/libs.versions.toml)
To build the project, execute the following commands from the project root
- To remove all compiled classes and the build and log directories
```shell script
./gradlew clean
```
- To run unit tests
```shell script
./gradlew test
```
- To run integration tests
```shell script
./gradlew integrationTest
```
- To build jar
```shell script
./gradlew build
```
## Run
Ensure you have Kafka running ([kafka tutorial](https://barrelsofdata.com/apache-kafka-setup)) and a PostgreSQL database running.
- Create table
```sql
CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE PRECISION);
```
- Start streaming job on direct runner
```shell script
java -jar streaming/build/libs/streaming.jar \
--runner=DirectRunner \
--kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
--kafkaTopic=<KAFKA_TOPIC> \
--kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
--sqlDriver=org.postgresql.Driver \
--jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
--table=<TABLE_NAME> \
--username=<USERNAME> \
--password=<PASSWORD> \
--resetToEarliest
```
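Once the job is running, you can verify that results are landing in the table with a quick query (using whatever table name you created above):

```sql
SELECT id, ts, computed FROM <TABLE_NAME> ORDER BY ts DESC LIMIT 10;
```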
- You can feed simulated data to the Kafka topic, with the key being a string (used as the unique id of a sensor) and the value being a JSON string with the following schema
```json
{"ts":<EPOCH_MILLIS>,"value":<DOUBLE_VALUE>}
```
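As one way to produce such records, here is a small hypothetical Python helper (not part of this repository) that prints `key:value` lines matching the schema above; its output can be piped into Kafka's bundled `kafka-console-producer` with `--property parse.key=true --property key.separator=:`:

```python
import json
import random
import time


def simulated_event() -> str:
    """Build one simulated sensor reading: epoch millis plus a random value."""
    return json.dumps({
        "ts": int(time.time() * 1000),
        "value": round(random.uniform(0.0, 100.0), 2),
    })


if __name__ == "__main__":
    # Emit key:value lines for five simulated sensors; the key (sensor id)
    # precedes the first ':' so the console producer can split on it.
    for i in range(5):
        print(f"sensor-{i}:{simulated_event()}")
```

The sensor ids contain no `:`, so the console producer's split on the first separator cleanly recovers the key and the JSON value.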