Streaming Annotated Monthly – May 2021
Open community to discuss the latest news and articles about streaming platforms and processing frameworks
Here we go with another newsletter! 70 new articles, 7 tools and a new surprise. From April to June we have a lot of conferences, starting with the Kafka Summit (don’t miss the Kafka Summit Europe 2021 Recap) in April, then the Data + AI Summit (Spark) in May and the Pulsar Summit in June. There are also a lot of conferences for cloud & software. Do we need another event?
Yes!
One of the comments in my last article was about getting together to share horror stories about Kafka deployments/outages so let’s do that! My idea is to use a Telegram Audio Chat (similar to Clubhouse) in the Telegram group but I’m open to other ideas.
If you are interested, sign up here!
Design and architecture
Powering Messaging Enabledness with Yelp's Data Infrastructure
Confluent and Elastic Partner to Deliver Optimized Search and Real-Time Analytics: it’s interesting to see OSS companies working together, in this case, Elastic and Confluent.
Event Streams Are Nothing Without Action: I didn’t know Camunda.
Internal consistency in streaming systems: this is a good deep dive. I would be cautious with the conclusions, since the author isn’t an expert in every framework analysed, but he did great research 🌶️🌶️🌶️
Hybrid Multi-Cloud Event Mesh Architectural Design: we are going to see more and more articles about “Event Mesh” (including my talk at the Kafka Summit 😇).
How Airbnb Achieved Metric Consistency at Scale: I’m biased here (observability!) but this is a great article to understand the challenges of processing metrics in big corps 🌶️🌶️🌶️
Adopting RocksDB within Manhattan: it’s always interesting to see RocksDB optimization, even if it isn’t related to streaming, and this article is a great source by Twitter 🌶️🌶️🌶️
Data Platform as a Service: building platforms is more than a technical challenge, and Miguel provides a lot of insights based on his experience 🌶️🌶️
Achieving Insights and Savings with Cost Data: saving costs in the cloud is super important when designing or operating pipelines. Good insights by Airbnb 🌶️🌶️🌶️
Automating Merchant Live Monitoring with Real-Time Analytics: Charon: Pinot & Kafka, a powerful combination 🌶️🌶️
Apache Kafka
Event-Driven Architecture with Apache Kafka for .NET Developers Part 2 - Event Consumer
The Best Books to Learn Apache Kafka: I 100% agree with the list and comments. I would add Designing Event-Driven Systems. It’s a good complement to understand some of the business trends around Kafka 🌶️🌶️
Kafka Migration and Lessons Learned: very good and honest article by Honeycomb. If you are managing Kafka clusters, don’t miss this one 🌶️🌶️🌶️
Apache Kafka™ and Kafka Streams Workshop: I’m a big fan of Jacek 🌶️🌶️🌶️
Debuting a Modern C++ API for Apache Kafka: not sure if this is something good or not.
Getting started with Kafka and Rust: Part 1 and Part 2: I would love to learn Rust.
What’s New in Apache Kafka 2.8.0: Apache blog. It’s great to see the preview of Kafka without ZooKeeper 🌶️🌶️🌶️
What’s New in Apache Kafka 2.8 (Confluent)
Leader election and Sharding Practices at Wix microservices: this is a great example of different Kafka patterns 🌶️🌶️🌶️
Introducing Red Hat OpenShift Streams for Apache Kafka: Red Hat launches its own managed Kafka service. A lot of competition!
How to Survive an Apache Kafka Outage: I always enjoy Jakub’s articles and this one is no different 🌶️🌶️🌶️
Kafka weather station: cool idea by Igor!
Monitoring Your Event Streams: Tutorial for Observability Into Apache Kafka Clients: I always wish there were more content about observability and streaming, and this article by Allison is a hidden gem 🌶️🌶️🌶️
Three simple ideas to make your life easier with Kafka: self-hype 😇. I really appreciate the feedback I got on Twitter. Thanks!!
Kafka on Kubernetes
Beyond the Quickstart: Running Apache Kafka as a Service on Kubernetes: this is an unusual article given the sponsors of the article and the service but I liked it 🌶️
Connect AMQ Streams to your Red Hat OpenShift 4 monitoring stack.
Path to CRD v1: some big changes coming to Strimzi.
Apache Pulsar
Indestructible Storage in the Cloud with Apache Bookkeeper: this is a superb article. It isn’t exactly about Pulsar but about BookKeeper, one of the main pieces of Pulsar. I never thought of using BookKeeper independently, and it opens an endless world of possibilities 🌶️🌶️🌶️
Updates on Apache Pulsar April Video.
Kafka Connect / KSQL / Kafka Streams
Apache Flink
Running Apache Flink on Kubernetes: good and concise guide 🌶️🌶️
Custom Traits in Apache Calcite: I’m not as familiar with Calcite as I would like but it’s a good deep-dive 🌶️🌶️🌶️
Apache Spark
What's new in Apache Spark 3.1 - Kubernetes Generally Available! 🌶️
What’s New in Apache Spark™ 3.1 Release for Structured Streaming 🌶️🌶️🌶️
AsyncAPI and Schema Management
Change Data Capture
Capture Oracle database events in Apache Kafka with Debezium
Snowflake - Near Real-Time Ingestion from RDBMS using Debezium and Kafka 🌶️🌶️
Google Cloud
Cloud SQL for SQL Server enables you to perform change data capture (CDC)
Real-time Crypto Price Anomaly Detection with Deep Learning and Band Protocol: I’m not a fan of the crypto thing but the article is a good example of how to integrate streaming and machine learning 🌶️
GCP Dataflow by an Apache Spark guy: Bartosz is always contributing great content to the newsletter 🌶️🌶️🌶️
Using TFX inference with Dataflow for large scale ML inference patterns: ML and streaming is cool but hard, Reza did a great job explaining it in this article 🌶️🌶️
Track changes in SQL Server on Google Cloud using Change Data Capture
Azure
Event-Driven Architecture with Apache Kafka for .NET Developers Part 3 - Azure Event Hubs
Setting Up Secure Networking in Confluent with Azure Private Link: Private Links are difficult in the beginning, this is a good way to start with them 🌶️
Easily build real-time apps with WebSockets and Azure Web PubSub—now in preview: this is a great idea, or at least, it seems it is. Typical use case solved in a cloud way.
Amazon AWS
Amazon Kinesis Data Analytics for Apache Flink introduces custom maintenance windows in preview
Amazon MSK now supports Apache Kafka version 2.8.0 and 2.6.2
Custom DNS With AWS Privatelink for Databricks Workspaces: useful 🌶️🌶️
Private Databricks Workspaces With AWS PrivateLink Is in Public Preview
Tools
kafka-s3-backed-serde: this is pretty cool for working with big files in Kafka.
kc-etcd: an example Kafka Connect source connector, ingesting changes from etcd
Schema Registry Maven Plugin: register your Avro Schemas in Production.
SoftwareMill Kafka Visualization: really cool way to see how Kafka works! Highly recommended.
DashBuilder: an Apache Licensed business reporting and monitoring tool.
Kafka Streams dashboards: very good work by Neil
Learn Kafka and Event Streams with fun: use an LED strip to learn stream processing. Cool idea!
That’s all! Comments?
Drop me a message on Twitter
Join the Streaming Annotated Telegram group
Join the Streaming Annotated Goodreads group