Streaming Annotated Monthly – March 2021
Receive the latest news and interesting articles about streaming platforms and processing frameworks in your mailbox
Here we go with another month of the Streaming Annotated Newsletter! I am still pursuing my personal goal of making this more collaborative. The newsletter numbers are solid, but I miss having more interaction.
So let’s try something different: a Telegram group. I already participate in a couple of them (mainly Java related and StreamingHispano) and they are great. You can read from your phone and configure notifications per group. I will post the articles rated 🌶️🌶️🌶️ there, and we can comment on them or discuss anything related to streaming.
Does it sound interesting? Join the Streaming Annotated Telegram group!
Architecture and design
Why You Need To Set SLAs for Your Data Pipelines: an unusual topic, but a very important one. 🌶️🌶️
Automatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1: there are several options for geographic redundancy; Multi-Region Clusters is one of the commercial options. 🌶️
42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka: one of the most exciting changes coming to Kafka this year.
Pragmatic Guide to Apache Kafka’s Exactly Once Semantics: excellent video from Gwen Shapira covering the typical gotchas and misunderstandings around exactly-once semantics in Kafka. 🌶️🌶️
Lessons Learned from Running Apache Kafka at Scale at Pinterest: highly recommended article. I liked the part covering Kafka upgrades and partition balancing. 🌶️🌶️🌶️
Visualizing Kafka: very good introductory article to Kafka.
Preview: Apache Kafka Log4j2 Support (KIP-653): this is a bit low-level, but I appreciate learning more about logging in Kafka and the CVE associated with it.
Streaming microservices with ZIO and Kafka: I didn’t know about ZIO Streams. It’s interesting. 🌶️
Kafka on Kubernetes
StreamNative’s 2020 Year in Review: it’s vendor-specific but also a good overview of the state of the art in Pulsar. 🌶️
Pulsar Office Hour - 02/17/2021: video. I had the opportunity to participate with some questions. Great format and answers. 🌶️🌶️🌶️
Apache Kafka vs Apache Pulsar: video.
Intro to Apache Pulsar 101: video.
Kafka Streams, Kafka Connect, etc.
Testing Kafka Streams applications using TopologyTestDriver: video. TopologyTestDriver is super useful when building Kafka Streams pipelines.
How to Write a Connector for Kafka Connect – Deep Dive into Configuration Handling: good tips here. 🌶️🌶️
ksqlDB HOWTO - A mini video series: videos.
How to natively deploy Flink on Kubernetes with High-Availability (HA): a lot of improvements for Flink on Kubernetes. 🌶️
Designing Next-Gen Event-Driven application powered by Stateful Functions. Part I: Stateful Functions is something I would like to learn more about and see working in a real use case. This is an excellent introduction. 🌶️🌶️
Ten Flink Gotchas we wish we had known: this is a superb explanation and collection of lessons learnt with Flink. 🌶️🌶️🌶️
Spark Structured Streaming
Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure: two top SaaS offerings working together.
Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints: schema evolution is the next big thing for streaming. 🌶️🌶️
Gotchas of Stream Processing: Data Skewness: good overview and tips; most of the content also applies to other frameworks. 🌶️🌶️
Beam College: free Apache Beam training! 🌶️🌶️🌶️
Change Data Capture (CDC)
STREAMS Explained: Snowflake Change Data Capture Using Streams with a Snowpipe (Revised): it’s very interesting what Snowflake is building. It seems big news is coming.
https://www.infoq.com/articles/saga-orchestration-outbox/: another great post by Gunnar Morling. His Java articles covering new features are also superb. 🌶️🌶️🌶️
Orchestration with Workflows: codelab with Pub/Sub.
We had an incident, and it was great: article describing the experience of a “poison pill” incident on Dataflow. 🌶️🌶️🌶️
How Spotify Optimized the Largest Dataflow Job Ever for Wrapped 2020: posts at this technical level are highly appreciated. In this case, it covers Sort Merge Bucket, an optimization that reduces shuffle by doing work up front on the producer side. 🌶️🌶️🌶️
Monitoring your Dataflow pipelines: an overview: it’s always good to see metrics and dashboards for data pipelines. 🌶️🌶️
It isn’t exactly streaming but it’s quite relevant for Flink, Kafka Streams, etc.
Streams Explorer: a tool for exploring Apache Kafka data pipelines.
3rd party command line tools for Apache Kafka: this is a solid compilation of Kafka CLI tools.
klustr: an open source monitoring tool for Apache Kafka.
kafka-connect-transform-kryptonite: Kafka Connect SMT to do field-level encryption/decryption of records.