Streaming Annotated Monthly – March 2021
Receive the latest news and interesting articles about streaming platforms and processing frameworks in your mailbox
Here we go with another month of the Streaming Annotated Newsletter! I continue with my personal goal of finding out how to make this more collaborative. The newsletter numbers are solid, but I miss having more interaction.
So let’s try something different: a Telegram group. I already participate in a couple of them (mainly Java related and StreamingHispano) and they are great. You can read from your phone and configure notifications by group. I will post the 🌶️🌶️🌶️ articles there, and we can comment on them or discuss anything you want related to streaming.
Does it sound interesting? Join the Streaming Annotated Telegram group!
Architecture and design
Why You Need To Set SLAs for Your Data Pipelines: an unusual topic, but a very important one. 🌶️🌶️
Kafka
Examining Apache Kafka Performance Metrics ft. Alok Nikhil: podcast.
Automatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1: there are several options for geographic redundancy; Multi-Region Clusters is one of the commercial options. 🌶️
42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka: one of the most exciting changes coming to Kafka this year.
Pragmatic Guide to Apache Kafka’s Exactly Once Semantics: excellent video from Gwen Shapira covering the typical gotchas and misunderstandings about exactly-once semantics in Kafka. 🌶️🌶️
Lessons Learned from Running Apache Kafka at Scale at Pinterest: highly recommended article. I liked the part covering Kafka upgrades and balancing of partitions. 🌶️🌶️🌶️
Microcks 1.2.0 release: it now supports Kafka and Avro. A guide, Kafka, Avro and Schema Registry, is now available. 🌶️
Visualizing Kafka: very good introductory article to Kafka.
Preview: Apache Kafka Log4j2 Support (KIP-653): this is a bit low-level, but I appreciate learning more about logging in Kafka and the CVE associated with it.
Streaming microservices with ZIO and Kafka: I didn’t know ZIO Streams. It’s interesting. 🌶️
Kafka on Kubernetes
Pulsar
StreamNative’s 2020 Year in Review: it’s vendor-specific but also a good overview of the state of the art in Pulsar. 🌶️
Pulsar Office Hour - 02/17/2021: video. I had the opportunity to participate with some questions. Great format and answers. 🌶️🌶️🌶️
Apache Kafka vs Apache Pulsar: video.
Intro to Apache Pulsar 101: video.
Kafka Streams, Kafka Connect, etc.
Testing Kafka Streams applications using TopologyTestDriver: video. TopologyTestDriver is super useful when building Kafka Streams pipelines.
How to Write a Connector for Kafka Connect – Deep Dive into Configuration Handling: good tips here. 🌶️🌶️
Apache Kafka and SAP Integration with the Kafka Connect ODP Source Connector.
ksqlDB HOWTO - A mini video series: videos.
Loading delimited data into Kafka - quick & dirty (but effective).
Flink
How to natively deploy Flink on Kubernetes with High-Availability (HA): a lot of improvements for Flink on Kubernetes. 🌶️
Designing Next-Gen Event-Driven application powered by Stateful Functions. Part I: Stateful Functions is something I would like to learn more about and see working in a real use case. This is an excellent introduction. 🌶️🌶️
Ten Flink Gotchas we wish we had known: this is a superb explanation and collection of lessons learnt with Flink. 🌶️🌶️🌶️
How to build and debug a Flink pipeline based in Event Time: self-hype 😇
Spark Structured Streaming
Using Cloud Storage for Checkpoint Location in Spark Structured Streaming on Google Kubernetes Engine: demo. 🌶️🌶️🌶️
Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure: two top SaaS working together.
Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints: schema evolution is the next big thing for streaming. 🌶️🌶️
Apache Beam
Gotchas of Stream Processing: Data Skewness: good overview and tips, most of the content applies to other frameworks. 🌶️🌶️
Beam College: free Apache Beam training! 🌶️🌶️🌶️
Change Data Capture (CDC)
STREAMS Explained: Snowflake Change Data Capture Using Streams with a Snowpipe (Revised): what Snowflake is building is very interesting. It seems big news is coming.
Oracle CDC Source Premium Connector is Now Generally Available.
Oracle to Kafka — Playing with Confluent’s new Oracle CDC Source Connector in Docker.
https://www.infoq.com/articles/saga-orchestration-outbox/: another great post by Gunnar Morling. His Java articles covering new features are also superb. 🌶️🌶️🌶️
Google Cloud
Orchestration with Workflows: codelab with Pub/Sub.
Introducing real-time data integration for BigQuery with Cloud Data Fusion.
Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS.
We had an incident, and it was great: article describing the experience with a “poison pills” incident on Dataflow. 🌶️🌶️🌶️
How Spotify Optimized the Largest Dataflow Job Ever for Wrapped 2020: posts at this technical level are highly appreciated. In this case, it covers Sort Merge Bucket, an optimization that reduces shuffle by doing work up front on the producer side. 🌶️🌶️🌶️
Monitoring your Dataflow pipelines: an overview: it’s always good to see metrics and dashboards for data pipelines. 🌶️🌶️
An Apache Spark connector is now available for Pub/Sub Lite.
Architect your data lake on Google Cloud with Data Fusion and Composer.
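A side note on the Sort Merge Bucket idea mentioned in the Spotify item above: here is a minimal, in-memory Python sketch of the general technique as I understand it, not Spotify's or Scio's actual implementation. The bucket count, function names and sample data are all made up for illustration. The producer hashes each key into a bucket and sorts every bucket by key, so the consumer can join matching buckets with a linear merge and no shuffle.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical value; both datasets must use the same one


def bucket_of(key: str) -> int:
    # Stable hash, so every producer sends a given key to the same bucket.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def write_bucketed(records):
    """Producer side: split (key, value) pairs into buckets sorted by key."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, value in records:
        buckets[bucket_of(key)].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


def merge_join(left, right):
    """Consumer side: join two key-sorted buckets in one linear pass."""
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of records sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    yield lkey, lval, rval
            i, j = i_end, j_end


def smb_join(left_buckets, right_buckets):
    # Bucket n on one side can only match bucket n on the other side.
    for left, right in zip(left_buckets, right_buckets):
        yield from merge_join(left, right)


users = write_bucketed([("u2", "Bob"), ("u1", "Ann"), ("u3", "Cy")])
plays = write_bucketed([("u1", "song-a"), ("u3", "song-b"), ("u1", "song-c")])
joined = sorted(smb_join(users, plays))
```

The sorting cost is paid once when the data is written, which is the "work up front on the producer side"; every later join over the same key only reads and merges.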
AWS
AWS re:Invent 2020: How Disney+ uses fast data ubiquity to improve the customer experience: video.
How to setup Kafka cluster for 15K events per second on AWS using Docker. 🌶️
Azure
RocksDB
It isn’t exactly streaming but it’s quite relevant for Flink, Kafka Streams, etc.
Tools
Streams Explorer: it allows examining Apache Kafka data pipelines.
3rd party command line tools for Apache Kafka: this is a solid compilation of Kafka CLI tools.
Kafka Connect FileSystem Connector: amazing work by Mario Molina.
klustr: An open source monitoring tool for Apache Kafka.
kafka-connect-transform-kryptonite: Kafka Connect SMT to do field-level encryption/decryption of records.
That’s all! Comments? Drop me a message on Twitter or join the Streaming Annotated Telegram group!