
Apache Beam Examples

This project contains all the Apache Beam examples that are detailed on my blog at https://barrelsofdata.com

Build instructions

This is a multi-module project and uses Gradle version catalogs. The dependencies can be viewed at /barrelsofdata/apache-beam-examples/src/branch/main/gradle/libs.versions.toml
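
For reference, modules consume catalog entries through type-safe aliases in their build scripts. A minimal sketch, assuming a catalog entry such as beam-sdks-java-core (the aliases below are illustrative and not the actual contents of this repository's catalog):

// Fragment of a module build.gradle.kts (illustrative only)
// libs.beam.sdks.java.core is assumed to map to org.apache.beam:beam-sdks-java-core
// declared in gradle/libs.versions.toml
dependencies {
    implementation(libs.beam.sdks.java.core)
    testImplementation(libs.junit.jupiter)
}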

To build the project, execute the following commands from the project root:

  • To clear all compiled classes, build and log directories
./gradlew clean
  • To run unit tests
./gradlew test
  • To run integration tests
./gradlew integrationTest
  • To build jar
./gradlew build

Run

Ensure you have Kafka running (see the Kafka tutorial) and a PostgreSQL database running.

  • Create table
CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE PRECISION);
  • Start streaming job on direct runner
java -jar streaming/build/libs/streaming.jar \
                --runner=DirectRunner \
                --kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
                --kafkaTopic=<KAFKA_TOPIC> \
                --kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
                --sqlDriver=org.postgresql.Driver \
                --jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
                --table=<TABLE_NAME> \
                --username=<USERNAME> \
                --password=<PASSWORD> \
                --resetToEarliest
  • You can feed simulated data to the Kafka topic, with the key being a string (used as the unique ID of a sensor) and the value being a JSON string with the following schema (see the producer sketch below)
{"ts":<EPOCH_MILLIS>,"value":<DOUBLE_VALUE>}