49 lines
1.8 KiB
Markdown
49 lines
1.8 KiB
Markdown
|
# Apache Beam Examples
|
||
|
This project holds all the examples of apache beam that are detailed on my blog at [https://barrelsofdata.com](https://barrelsofdata.com)
|
||
|
|
||
|
## Build instructions
|
||
|
This is a multi-module project and uses version catalogs. The dependencies can we viewed at [/barrelsofdata/apache-beam-examples/src/branch/main/gradle/libs.versions.toml](./gradle/libs.versions.toml)
|
||
|
|
||
|
To build project, execute the following commands from the project root
|
||
|
- To clear all compiled classes, build and log directories
|
||
|
```shell script
|
||
|
./gradlew clean
|
||
|
```
|
||
|
- To run unit tests
|
||
|
```shell script
|
||
|
./gradlew test
|
||
|
```
|
||
|
- To run integration tests
|
||
|
```shell script
|
||
|
./gradlew integrationTest
|
||
|
```
|
||
|
- To build jar
|
||
|
```shell script
|
||
|
./gradlew build
|
||
|
```
|
||
|
|
||
|
## Run
|
||
|
Ensure you have kafka running ([kafka tutorial](https://barrelsofdata.com/apache-kafka-setup)) and a postgres sql database running.
|
||
|
|
||
|
- Create table
|
||
|
```sql
|
||
|
CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE);
|
||
|
```
|
||
|
- Start streaming job on direct runner
|
||
|
```shell script
|
||
|
java -jar streaming/build/libs/streaming.jar \
|
||
|
--runner=DirectRunner \
|
||
|
--kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
|
||
|
--kafkaTopic=<KAFKA_TOPIC> \
|
||
|
--kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
|
||
|
--sqlDriver=org.postgresql.Driver \
|
||
|
--jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
|
||
|
--table=<TABLE_NAME> \
|
||
|
--username=<USERNAME> \
|
||
|
--password=<PASSWORD> \
|
||
|
--resetToEarliest
|
||
|
```
|
||
|
- You can feed simulated data to the kafka topic with the key being a string (used as unique id of a sensor) and value being a json string of the following schema
|
||
|
```json
|
||
|
{"ts":<EPOCH_MILLIS>,"value":<DOUBLE_VALUE>}
|
||
|
```
|