Streaming Annotated Newsletter

Streaming Annotated Monthly – March 2021

Receive the latest news and interesting articles about streaming platforms and processing frameworks in your mailbox

Antón (antonmry)
Mar 13, 2021

Here we go with another month of the Streaming Annotated Newsletter! I'm continuing with my personal goal of finding out how to make this more collaborative. The newsletter numbers are solid, but I miss having more interaction.

So let’s try something different: a Telegram group. I already participate in a couple of them (mainly Java-related ones and StreamingHispano) and they are great. You can read from your phone and configure notifications per group. I will post the articles rated 🌶️🌶️🌶️ there, and we can comment on them or discuss anything related to streaming.

Does it sound interesting? Join the Streaming Annotated Telegram group!


Architecture and design

  • Making Sense of Unbounded Data.

  • Why You Need To Set SLAs for Your Data Pipelines: unusual topic but very important. 🌶️🌶️

  • Scaling Reporting at Reddit. 🌶️

Kafka

  • Examining Apache Kafka Performance Metrics ft. Alok Nikhil: podcast.

  • Real-time monitoring of Formula 1 telemetry data on Kubernetes with Grafana, Apache Kafka, and Strimzi.

  • Announcing the Confluent Community Forum.

  • User authentication and authorization in Apache Kafka.

  • Docker free Kafka integration tests.

  • Introducing Confluent Platform 6.1.

  • Automatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1: there are several options for geographic redundancy, and Multi-Region Clusters is one of the commercial ones. 🌶️

  • 42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka: one of the most exciting changes coming to Kafka this year.

  • Pragmatic Guide to Apache Kafka’s Exactly Once Semantics: excellent video from Gwen Shapira covering the typical gotchas and misunderstandings with Exactly Once semantics in Kafka. 🌶️🌶️

  • Lessons Learned from Running Apache Kafka at Scale at Pinterest: highly recommended article. I liked the part covering Kafka upgrades and partition balancing. 🌶️🌶️🌶️

  • Twitter thread about Kafka monitoring and observability. 🌶️

  • Microcks 1.2.0 release: it now supports Kafka and Avro, and the guide Kafka, Avro and Schema Registry is now available. 🌶️

  • Visualizing Kafka: very good introductory article to Kafka.

  • Preview: Apache Kafka Log4j2 Support (KIP-653): this is a bit low-level, but I appreciate learning more about logging in Kafka and the CVEs associated with it.

  • Kafka Monthly Digest – February 2021. 🌶️🌶️🌶️

  • Streaming microservices with ZIO and Kafka: I didn’t know about ZIO Streams. It’s interesting. 🌶️

Kafka on Kubernetes

  • How to integrate Kafka with Istio on OpenShift.

  • Getting Started with AsyncAPI Mocking on Minikube: video.

  • Sending Alerts From Strimzi to Slack.

Pulsar

  • Migrate to Serverless with Pulsar Functions.

  • StreamNative’s 2020 Year in Review: it’s vendor-specific but also a good overview of the state of the art in Pulsar. 🌶️

  • Pulsar Office Hour - 02/17/2021: video. I had the opportunity to participate with some questions. Great format and answers. 🌶️🌶️🌶️

  • TGIP 019: February Updates on Apache Pulsar: video.

  • Apache Kafka vs Apache Pulsar: video.

  • Intro to Apache Pulsar 101: video.

  • How to choose Pulsar vs Kafka?

Kafka Streams, Kafka Connect, etc.

  • How to create sliding windows.

  • Window Final Result with Loïc Divad: video.

  • How To Split a Stream of Events into Substreams: video.

  • Testing Kafka Streams applications using TopologyTestDriver: video. TopologyTestDriver is super useful when building Kafka Streams pipelines.

  • How to Write a Connector for Kafka Connect – Deep Dive into Configuration Handling: good tips here. 🌶️🌶️

  • An Overview About the Different Kafka Connect Plugins.

  • Kafka Connect management with GitOps.

  • Apache Kafka and SAP Integration with the Kafka Connect ODP Source Connector.

  • ksqlDB HOWTO - A mini video series: videos.

  • Announcing ksqlDB 0.15.

  • Keys in ksqlDB, Unlocked.

  • Loading delimited data into Kafka - quick & dirty (but effective).

Flink

  • How to natively deploy Flink on Kubernetes with High-Availability (HA): a lot of improvements for Flink on Kubernetes. 🌶️

  • Autoscaling Apache Flink with Ververica Platform Autopilot.

  • How to join streams in Apache Flink.

  • Designing Next-Gen Event-Driven application powered by Stateful Functions. Part I: Stateful Functions is something I would like to learn more about and see working in a real use case. This is an excellent introduction. 🌶️🌶️

  • Ten Flink Gotchas we wish we had known: this is a superb explanation and collection of lessons learnt with Flink. 🌶️🌶️🌶️

  • How to build and debug a Flink pipeline based in Event Time: self-hype 😇

Spark Structured Streaming

  • Using Cloud Storage for Checkpoint Location in Spark Structured Streaming on Google Kubernetes Engine: demo. 🌶️🌶️🌶️

  • Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure: two top SaaS working together.

  • Spark Release 3.0.2.

  • Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints: schema evolution is the next big thing for streaming. 🌶️🌶️

Apache Beam

  • Gotchas of Stream Processing: Data Skewness: good overview and tips, most of the content applies to other frameworks. 🌶️🌶️

  • Apache Beam 2.28.0.

  • Beam College: free Apache Beam training! 🌶️🌶️🌶️

Change Data Capture (CDC)

  • STREAMS Explained: Snowflake Change Data Capture Using Streams with a Snowpipe (Revised): what Snowflake is building is very interesting. It seems big news is coming.

  • Change Data Capture in Postgres With Debezium.

  • A Gentle Introduction to Event-driven Change Data Capture.

  • Debezium 1.5.0.Alpha1 Released.

  • Oracle CDC Source Premium Connector is Now Generally Available.

  • Oracle to Kafka — Playing with Confluent’s new Oracle CDC Source Connector in Docker.

  • From Oracle to Google Big Query by Kafka.

  • https://www.infoq.com/articles/saga-orchestration-outbox/: another great post by Gunnar Morling. His Java articles covering new features are also superb. 🌶️🌶️🌶️

Google Cloud

  • Orchestration with Workflows: codelab with Pub/Sub.

  • Introducing real-time data integration for BigQuery with Cloud Data Fusion.

  • Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS.

  • We had an incident, and it was great: article describing the experience with a “poison pill” incident on Dataflow. 🌶️🌶️🌶️

  • How Spotify Optimized the Largest Dataflow Job Ever for Wrapped 2020: posts at this technical level are highly appreciated. In this case, it covers Sort Merge Bucket, an optimization that reduces shuffle by doing work up front on the producer side. 🌶️🌶️🌶️

  • Announcing the Launch of Databricks on Google Cloud.

  • Google Cloud Functions Sink Connector for Confluent Cloud.

  • Monitoring your Dataflow pipelines: an overview: it’s always good to see metrics and dashboards for data pipelines. 🌶️🌶️

  • An Apache Spark connector is now available for Pub/Sub Lite.

  • Architect your data lake on Google Cloud with Data Fusion and Composer.

AWS

  • AWS re:Invent 2020: How Disney+ uses fast data ubiquity to improve the customer experience: video.

  • How to setup Kafka cluster for 15K events per second on AWS using Docker. 🌶️

  • Run Spark Applications on AWS Fargate.

Azure

  • Automatically forwarding Azure Monitor Autoscale events to Azure Event Grid.

  • Event Driven Databricks ETL with Azure Data Factory.

RocksDB

It isn’t exactly streaming but it’s quite relevant for Flink, Kafka Streams, etc.

  • The effect of switching to TCMalloc on RocksDB memory use. 🌶️

  • Why RocksDB rocks?

Tools

  • Streams Explorer: it allows examining Apache Kafka data pipelines.

  • 3rd party command line tools for Apache Kafka: this is a solid compilation of Kafka CLI tools.

  • Kafka Connect FileSystem Connector: amazing work by Mario Molina.

  • klustr: An open source monitoring tool for Apache Kafka.

  • kafka-connect-transform-kryptonite: Kafka Connect SMT to do field-level encryption/decryption of records.


That’s all! Comments? Drop me a message on Twitter or join the Streaming Annotated Telegram group!
