Examples of apache beam projects and testing
.gitea/workflows | ||
buildSrc | ||
gradle | ||
streaming | ||
.gitignore | ||
build.gradle.kts | ||
gradle.properties | ||
gradlew | ||
gradlew.bat | ||
README.md | ||
settings.gradle.kts |
Apache Beam Examples
This project holds all the examples of apache beam that are detailed on my blog at https://barrelsofdata.com
Build instructions
This is a multi-module project and uses version catalogs. The dependencies can we viewed at ./gradle/libs.versions.toml
To build project, execute the following commands from the project root
- To clear all compiled classes, build and log directories
./gradlew clean
- To run unit tests
./gradlew test
- To run integration tests
./gradlew integrationTest
- To build jar
./gradlew build
Run
Ensure you have kafka running (kafka tutorial) and a postgres sql database running.
- Create table
CREATE TABLE <TABLE_NAME> (id VARCHAR(255), ts TIMESTAMP, computed DOUBLE);
- Start streaming job on direct runner
java -jar streaming/build/libs/streaming.jar \
--runner=DirectRunner \
--kafkaBrokers=<KAFKA_BROKER:PORT_COMMA_SEPARATED> \
--kafkaTopic=<KAFKA_TOPIC> \
--kafkaConsumerGroupId=<KAFKA_CONSUMER_GROUP> \
--sqlDriver=org.postgresql.Driver \
--jdbcUrl=jdbc:postgresql://<HOST[:PORT]>/<DATABASE> \
--table=<TABLE_NAME> \
--username=<USERNAME> \
--password=<PASSWORD> \
--resetToEarliest
- You can feed simulated data to the kafka topic with the key being a string (used as unique id of a sensor) and value being a json string of the following schema
{"ts":<EPOCH_MILLIS>,"value":<DOUBLE_VALUE>}