Streaming Annotated Monthly – May 2021

Open community to discuss the latest news and articles about streaming platforms and processing frameworks

May 20, 2021

Here we go with another newsletter! 70 new articles, 7 tools and a new surprise. From April to June we have a lot of conferences, starting with the Kafka Summit (don’t miss the Kafka Summit Europe 2021 Recap) in April, the Data + AI Summit (Spark) in May and the Pulsar Summit in June. There are also a lot of conferences for cloud & software. Do we need another event?

Yes!

One of the comments in my last article was about getting together to share horror stories about Kafka deployments/outages so let’s do that! My idea is to use a Telegram Audio Chat (similar to Clubhouse) in the Telegram group but I’m open to other ideas.

If you are interested, sign up here!

Design and architecture

What is a streaming database?
Powering Messaging Enabledness with Yelp's Data Infrastructure
Confluent and Elastic Partner to Deliver Optimized Search and Real-Time Analytics: it’s interesting to see OSS companies working together, in this case, Elastic and Confluent.
Actor Model and Event Sourcing
Event Streams Are Nothing Without Action: I didn’t know Camunda.
Events should be as small as possible, right? Right.
Internal consistency in streaming systems: this is a good deep dive. I would be cautious with the conclusions, since the author isn’t an expert in every framework analysed, but he did great research 🌶️🌶️🌶️
Hybrid Multi-Cloud Event Mesh Architectural Design: we are going to see more and more articles about “Event Mesh” (including my talk at the Kafka Summit 😇).
How Airbnb Achieved Metric Consistency at Scale: I’m biased here (observability!) but this is a great article to understand the challenges of processing metrics in big corporations 🌶️🌶️🌶️
Adopting RocksDB within Manhattan: it’s always interesting to see RocksDB optimization, even if it isn’t related to streaming, and this article is a great source by Twitter 🌶️🌶️🌶️
4 Business Benefits of an Event-Driven Architecture (EDA)
Data Platform as a Service: building platforms is more than a technical challenge, and Miguel provides a lot of insights based on his experience 🌶️🌶️
Achieving Insights and Savings with Cost Data: saving costs in the cloud is super important when designing or operating pipelines. Good insights by Airbnb 🌶️🌶️🌶️
Automating Merchant Live Monitoring with Real-Time Analytics: Charon: Pinot & Kafka, a powerful combination 🌶️🌶️

Apache Kafka

Event-Driven Architecture with Apache Kafka for .NET Developers Part 2 - Event Consumer
The Best Books to Learn Apache Kafka: I 100% agree with the list and comments. I would add Designing Event-Driven Systems. It’s a good complement to understand some of the business trends around Kafka 🌶️🌶️
Kafka Migration and Lessons Learned: very good and honest article by Honeycomb. If you are managing Kafka clusters, don’t miss this one 🌶️🌶️🌶️
How We Process One Billion Events Per Day With Kafka
Apache Kafka™ and Kafka Streams Workshop: I’m a big fan of Jacek 🌶️🌶️🌶️
Debuting a Modern C++ API for Apache Kafka: not sure if this is something good or not.
Spring for Apache Kafka 2.7.0 Available
Getting started with Kafka and Rust: Part 1 and Part 2: I would love to learn Rust.
What’s New in Apache Kafka 2.8.0: Apache blog. It’s great to see the preview of Kafka without ZooKeeper 🌶️🌶️🌶️
What’s New in Apache Kafka 2.8 (Confluent)
Release Notes - Kafka - Version 2.6.2
Leader election and Sharding Practices at Wix microservices: this is a great example of different Kafka patterns 🌶️🌶️🌶️
Introducing Red Hat OpenShift Streams for Apache Kafka: Red Hat launches its own managed Kafka service. A lot of competition!
How to Survive an Apache Kafka Outage: I always enjoy Jakub’s articles and this one is no different 🌶️🌶️🌶️
Kafka weather station: cool idea by Igor!
Monitoring Your Event Streams: Tutorial for Observability Into Apache Kafka Clients: I always miss more content about observability and streaming and this article by Allison is a hidden gem 🌶️🌶️🌶️
Kafka Monthly Digest – April 2021 🌶️🌶️🌶️
Three simple ideas to make your life easier with Kafka: self-hype 😇. I really appreciate the feedback I got on Twitter. Thanks!!

Kafka on Kubernetes

Beyond the Quickstart: Running Apache Kafka as a Service on Kubernetes: this is an unusual article given the sponsors of the article and the service but I liked it 🌶️
Connect AMQ Streams to your Red Hat OpenShift 4 monitoring stack.
Path to CRD v1: some big changes coming to Strimzi.

Apache Pulsar

Indestructible Storage in the Cloud with Apache Bookkeeper: this is a superb article. It isn’t exactly about Pulsar but about BookKeeper, one of the main pieces of Pulsar. I never thought of using BookKeeper independently, and it opens up an endless world of possibilities 🌶️🌶️🌶️
Announcing AMQP 1.0 Connector for Apache Pulsar
Updates on Apache Pulsar April Video.
Flink SQL on StreamNative Cloud

Kafka Connect / KSQL / Kafka Streams

Apache Flink

Running Apache Flink on Kubernetes: good and concise guide 🌶️🌶️
Stateful Functions 3.0.0: Remote Functions Front and Center
Apache Flink 1.12.3 Released
Custom Traits in Apache Calcite: I’m not as familiar with Calcite as I would like but it’s a good deep-dive 🌶️🌶️🌶️

Apache Spark

What's new in Apache Spark 3.1 - nodes decommissioning 🌶️
What's new in Apache Spark 3.1 - Kubernetes Generally Available! 🌶️
What’s New in Apache Spark™ 3.1 Release for Structured Streaming 🌶️🌶️🌶️

AsyncAPI and Schema Management

Change Data Capture

Google Cloud

Cloud SQL for SQL Server enables you to perform change data capture (CDC)
Real-time Crypto Price Anomaly Detection with Deep Learning and Band Protocol: I’m not a fan of the crypto thing but the article is a good example of how to integrate streaming and machine learning 🌶️
GCP Dataflow by an Apache Spark guy: Bartosz is always contributing great content to the newsletter 🌶️🌶️🌶️
Using TFX inference with Dataflow for large scale ML inference patterns: ML and streaming is cool but hard; Reza did a great job explaining it in this article 🌶️🌶️
Apache Beam 2.29.0
An Offline to Online Data Pipeline at WePay 🌶️
Track changes in SQL Server on Google Cloud using Change Data Capture

Azure

Event-Driven Architecture with Apache Kafka for .NET Developers Part 3 - Azure Event Hubs
Setting Up Secure Networking in Confluent with Azure Private Link: Private Links are difficult at the beginning; this is a good way to get started with them 🌶️
Easily build real-time apps with WebSockets and Azure Web PubSub—now in preview: this is a great idea, or at least it seems to be. A typical use case solved in a cloud way.

Amazon AWS

Tools

kafka-s3-backed-serde: this is pretty cool for working with big files in Kafka.
kc-etcd: an example Kafka Connect source connector, ingesting changes from etcd
Schema Registry Maven Plugin: register your Avro Schemas in Production.
SoftwareMill Kafka Visualization: really cool way to see how Kafka works! Highly recommended.
DashBuilder: an Apache Licensed business reporting and monitoring tool.
Kafka Streams dashboards: very good work by Neil
Learn Kafka and Event Streams with fun: use an LED strip to learn stream processing. Cool idea!

That’s all! Comments?

Drop me a message on Twitter
Join the Streaming Annotated Telegram group
Join the Streaming Annotated Goodreads group
