Streaming Annotated Newsletter


Streaming Annotated Monthly – May 2021

Open community to discuss the latest news and articles about streaming platforms and processing frameworks

Antón (antonmry)
May 20, 2021

Here we go with another newsletter! 70 new articles, 7 tools and a new surprise. From April to June we have a lot of conferences, starting with the Kafka Summit (don’t miss the Kafka Summit Europe 2021 Recap) in April, Data + AI Summit (Spark) in May and the Pulsar Summit in June. There are also a lot of conferences for cloud & software. Do we need another event?

Yes!

One of the comments on my last article was about getting together to share horror stories about Kafka deployments/outages, so let’s do that! My idea is to use a Telegram Audio Chat (similar to Clubhouse) in the Telegram group, but I’m open to other ideas.

If you are interested, sign up here!


Design and architecture

  • What is a streaming database?

  • Powering Messaging Enabledness with Yelp's Data Infrastructure

  • Confluent and Elastic Partner to Deliver Optimized Search and Real-Time Analytics: it’s interesting to see OSS companies working together, in this case, Elastic and Confluent.

  • Actor Model and Event Sourcing

  • Event Streams Are Nothing Without Action: I didn’t know about Camunda.

  • Events should be as small as possible, right? Right.

  • Internal consistency in streaming systems: this is a good deep dive. I would be cautious with the conclusions: the author isn’t an expert in every framework analysed, but he did great research 🌶️🌶️🌶️

  • Hybrid Multi-Cloud Event Mesh Architectural Design: we are going to see more and more articles about “Event Mesh” (including my talk at the Kafka Summit 😇).

  • How Airbnb Achieved Metric Consistency at Scale: I’m biased here (observability!) but this is a great article to understand the challenges of processing metrics in big corporations 🌶️🌶️🌶️

  • Adopting RocksDB within Manhattan: it’s always interesting to see RocksDB optimization, even if it isn’t related to streaming, and this article is a great source by Twitter 🌶️🌶️🌶️

  • 4 Business Benefits of an Event-Driven Architecture (EDA)

  • Data Platform as a Service: building platforms is more than a technical challenge, and Miguel provides a lot of insights based on his experience 🌶️🌶️

  • Achieving Insights and Savings with Cost Data: saving costs in the cloud is super important when designing or operating pipelines. Good insights by Airbnb 🌶️🌶️🌶️

  • Automating Merchant Live Monitoring with Real-Time Analytics: Charon: Pinot & Kafka, a powerful combination 🌶️🌶️

Apache Kafka

  • Event-Driven Architecture with Apache Kafka for .NET Developers Part 2 - Event Consumer

  • The Best Books to Learn Apache Kafka: I 100% agree with the list and comments. I would add Designing Event-Driven Systems. It’s a good complement to understand some of the business trends around Kafka 🌶️🌶️

  • Kafka Migration and Lessons Learned: very good and honest article by Honeycomb. If you are managing Kafka clusters, don’t miss this one 🌶️🌶️🌶️

  • How We Process One Billion Events Per Day With Kafka

  • Apache Kafka™ and Kafka Streams Workshop: I’m a big fan of Jacek 🌶️🌶️🌶️

  • Debuting a Modern C++ API for Apache Kafka: not sure if this is something good or not.

  • Spring for Apache Kafka 2.7.0 Available

  • Getting started with Kafka and Rust: Part 1 and Part 2: I would love to learn Rust.

  • What’s New in Apache Kafka 2.8.0: Apache blog. It’s great to see the preview of Kafka without ZooKeeper 🌶️🌶️🌶️

  • What’s New in Apache Kafka 2.8 (Confluent)

  • Release Notes - Kafka - Version 2.6.2

  • Leader election and Sharding Practices at Wix microservices: this is a great example of different Kafka patterns 🌶️🌶️🌶️

  • Introducing Red Hat OpenShift Streams for Apache Kafka: Red Hat launches its own managed Kafka service. A lot of competition!

  • How to Survive an Apache Kafka Outage: I always enjoy Jakub’s articles and this one is no different 🌶️🌶️🌶️

  • Kafka weather station: cool idea by Igor!

  • Monitoring Your Event Streams: Tutorial for Observability Into Apache Kafka Clients: I always wish there were more content about observability and streaming, and this article by Allison is a hidden gem 🌶️🌶️🌶️

  • Kafka Monthly Digest – April 2021 🌶️🌶️🌶️

  • Three simple ideas to make your life easier with Kafka: self-hype 😇. I really appreciate the feedback I got on Twitter. Thanks!!

Kafka on Kubernetes

  • Beyond the Quickstart: Running Apache Kafka as a Service on Kubernetes: this is an unusual article given the sponsors of the article and the service but I liked it 🌶️

  • Connect AMQ Streams to your Red Hat OpenShift 4 monitoring stack.

  • Path to CRD v1: some big changes coming to Strimzi.

Apache Pulsar

  • Indestructible Storage in the Cloud with Apache Bookkeeper: this is a superb article. It isn’t exactly about Pulsar but about BookKeeper, one of the main pieces of Pulsar. I never thought of using BookKeeper on its own, and it opens an endless world of possibilities 🌶️🌶️🌶️

  • Announcing AMQP 1.0 Connector for Apache Pulsar

  • Updates on Apache Pulsar April Video.

  • Flink SQL on StreamNative Cloud

Kafka Connect / KSQL / Kafka Streams

  • MongoDB Connector for Apache Kafka 1.5 Available Now

  • Troubleshooting The Performance of Streaming Data Pipelines

  • Announcing ksqlDB 0.17.0

Apache Flink

  • Running Apache Flink on Kubernetes: good and concise guide 🌶️🌶️

  • Stateful Functions 3.0.0: Remote Functions Front and Center

  • Apache Flink 1.12.3 Released

  • Custom Traits in Apache Calcite: I’m not as familiar with Calcite as I would like but it’s a good deep-dive 🌶️🌶️🌶️

Apache Spark

  • What's new in Apache Spark 3.1 - nodes decommissioning 🌶️

  • What's new in Apache Spark 3.1 - Kubernetes Generally Available! 🌶️

  • What’s New in Apache Spark™ 3.1 Release for Structured Streaming 🌶️🌶️🌶️

AsyncAPI and Schema Management

  • Simulating CloudEvents with AsyncAPI and Microcks

  • Event-driven APIs — Understanding the Principles

  • Release Announcement: Apicurio Registry 2.0.0.Final

  • AsyncAPI 2.0: Enabling the Event-Driven World

Change Data Capture

  • Debezium 1.5.0.Final Released

  • Capture Oracle database events in Apache Kafka with Debezium

  • Snowflake - Near Real-Time Ingestion from RDBMS using Debezium and Kafka 🌶️🌶️

  • Register your Avro Schemas in Production

Google Cloud

  • Cloud SQL for SQL Server enables you to perform change data capture (CDC)

  • Real-time Crypto Price Anomaly Detection with Deep Learning and Band Protocol: I’m not a fan of the crypto thing but the article is a good example of how to integrate streaming and machine learning 🌶️

  • GCP Dataflow by an Apache Spark guy: Bartosz is always contributing great content to the newsletter 🌶️🌶️🌶️

  • Using TFX inference with Dataflow for large scale ML inference patterns: ML and streaming is cool but hard; Reza did a great job explaining it in this article 🌶️🌶️

  • Apache Beam 2.29.0

  • An Offline to Online Data Pipeline at WePay 🌶️

  • Track changes in SQL Server on Google Cloud using Change Data Capture

Azure

  • Event-Driven Architecture with Apache Kafka for .NET Developers Part 3 - Azure Event Hubs

  • Setting Up Secure Networking in Confluent with Azure Private Link: Private Links are difficult at the beginning; this is a good way to get started with them 🌶️

  • Easily build real-time apps with WebSockets and Azure Web PubSub—now in preview: this is a great idea, or at least, it seems it is. Typical use case solved in a cloud way.

Amazon AWS

  • Amazon Kinesis Data Analytics for Apache Flink introduces custom maintenance windows in preview

  • Amazon MSK now supports Apache Kafka version 2.8.0 and 2.6.2

  • Custom DNS With AWS Privatelink for Databricks Workspaces: useful 🌶️🌶️

  • Private Databricks Workspaces With AWS PrivateLink Is in Public Preview

Tools

  • kafka-s3-backed-serde: this is pretty cool for working with big files in Kafka.

  • kc-etcd: an example Kafka Connect source connector, ingesting changes from etcd

  • Schema Registry Maven Plugin: register your Avro Schemas in Production.

  • SoftwareMill Kafka Visualization: really cool way to see how Kafka works! Highly recommended.

  • DashBuilder: an Apache Licensed business reporting and monitoring tool.

  • Kafka Streams dashboards: very good work by Neil

  • Learn Kafka and Event Streams with fun: use an LED strip to learn stream processing. Cool idea!


That’s all! Comments?

  • Drop me a message on Twitter

  • Join the Streaming Annotated Telegram group

  • Join the Streaming Annotated Goodreads group

© 2022 Anton Rodriguez - antonmry