# Apache Beam Examples

This project holds all the Apache Beam examples that are detailed on my blog at https://barrelsofdata.com

## Build instructions

This is a multi-module project and uses Gradle version catalogs. The dependencies can be viewed in `./gradle/libs.versions.toml`.
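As an illustration, an entry in a version catalog file typically looks like the sketch below; the alias and version shown here are hypothetical, not the project's actual entries:

```toml
[versions]
beam = "2.52.0"   # hypothetical version, check the actual catalog

[libraries]
beam-core = { module = "org.apache.beam:beam-sdks-java-core", version.ref = "beam" }
```

A library declared this way is consumed in `build.gradle.kts` as `implementation(libs.beam.core)`.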

To build the project, execute the following commands from the project root:

- To clear all compiled classes, build and log directories: `./gradlew clean`
- To run unit tests: `./gradlew test`
- To run integration tests: `./gradlew integrationTest`
- To build the jar: `./gradlew build`

## Run

Ensure you have Kafka (see the kafka tutorial) and a PostgreSQL database running.

- Create the table:

      CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE PRECISION);

- Start the streaming job on the direct runner:

      java -jar streaming/build/libs/streaming.jar \
          --runner=DirectRunner \
          --kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
          --kafkaTopic=<KAFKA_TOPIC> \
          --kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
          --sqlDriver=org.postgresql.Driver \
          --jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
          --table=<TABLE_NAME> \
          --username=<USERNAME> \
          --password=<PASSWORD> \
          --resetToEarliest

- You can feed simulated data to the Kafka topic, with the key being a string (used as the unique id of a sensor) and the value being a JSON string of the following schema:

      {"ts":<EPOCH_MILLIS>,"value":<DOUBLE_VALUE>}
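A record matching the schema above can be generated from the shell. In this sketch, the sensor id `sensor-1` and the `|` key separator are arbitrary choices (any separator that does not occur in the key or the JSON works), and millisecond timestamps assume GNU `date`:

```shell
# Build one simulated reading as key|value (key = sensor id).
ts=$(date +%s%3N)                                              # current epoch millis (GNU date)
value=$(awk 'BEGIN { srand(); printf "%.2f", rand() * 100 }')  # random sensor reading
record="sensor-1|{\"ts\":${ts},\"value\":${value}}"
echo "$record"
```

Records in this shape can then be piped to `kafka-console-producer.sh --bootstrap-server <KAFKA_BROKER:PORT> --topic <KAFKA_TOPIC> --property parse.key=true --property key.separator='|'`.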