Streaming Annotated Monthly – April 2021

An open community to discuss the latest news and articles about streaming platforms and processing frameworks

Apr 14, 2021

This was quite a complex month for me, but here we go with a new newsletter full of content: 76 articles and 5 new tools. Amazing! One of the interesting things I’ve discovered this month is another streaming data compilation by Guido Schmutz. If I had known about it earlier, I would never have started this one!

I’m quite happy with the community we are building here and I plan to focus more on that. The Telegram group is working quite well and this is only the beginning. I have in mind to organize some kind of activity. Would a Telegram Voice Chat about the articles in the newsletter be of interest to you? Let’s chat about it in the group.

This month, I’ve also created a Goodreads group that already has more than 40 members. If, like me, books are your thing, join and let’s talk about them on this great platform.

That’s all! Learn a lot and stay safe!

Apache Flink

Apache Flink 1.12.2 Released
Maintaining Materialized Views with Change Data Capture (CDC) and Debezium (cookbook)
Learn Flink SQL — The Easy Way
Apache Flink: Towards a 20x throughput improvement using in-memory buffers: this is a great deep dive, mandatory reading if you are into optimizing Flink pipelines 🌶️🌶️🌶️
A Rundown of Batch Execution Mode in the DataStream API: using the same API for streaming and batch seems like a great idea 🌶️
Building Riviera: A Declarative Real-Time Feature Engineering Framework: it’s amazing what some people are doing out there 🌶️🌶️
Knative Eventing with Kafka and Spring Cloud
Pinterest Flink Deployment Framework: It’s fine but I was expecting a bit more from Pinterest on this.
Window Of Vulnerability: this is a surprising deep-dive on Fault Tolerance in Flink.
Data Driven Development for Stream Processing: this is more about observability. I liked it because I learned a couple of things that aren’t usually explained about how to organize your dashboards 🌶️🌶️
Apache Flink Roadmap: the feature radar is superb! Great idea 🌶️🌶️🌶️
Apache Flink Continuous Deployment

Spark Structured Streaming

Apache Beam

Beam College: a free educational program to provide hands-on training. Highly recommended if you want to learn Beam 🌶️🌶️🌶️
Getting Started with Snowflake and Apache Beam: there are many conversations about streaming ingestion in Snowflake, a space which should evolve very soon.
4 Ways to Effectively Debug Data Pipelines in Apache Beam

Schema Management

A Kafka Developer's Guide to AsyncAPI (video)
Understanding JSON Schema compatibility: Robert’s blog is a hidden treasure 🌶️🌶️🌶️
AsyncAPI joins Linux Foundation

Change Data Capture

Debezium does not impact source database performance 🌶️
Capturing Every Change From Shopify’s Sharded Monolith: if you are going to read only one article from this newsletter, choose this one! Many of us are working with CDC and struggling to scale it to an organization level. This article is full of insights and experiences from doing it 🌶️🌶️🌶️
Debezium 1.5.0.Beta2 Released
Debezium 1.5.0.CR1 Released
The Journey from Batch to Real-time with Change Data Capture: it includes a comparison of Debezium and Amazon (AWS) Data Migration Service (DMS).

Google Cloud

Creating and managing schemas: it’s in preview and with limited functionality, but it’s great to see a cloud provider supporting schema management 🌶️
Migrate your MySQL and PostgreSQL databases using Database Migration Service, now GA
Building real-time market data front-ends with websockets and Google Cloud 🌶️
Introducing Databricks on Google Cloud – Now in Public Preview
Introducing Apache Spark Structured Streaming connector for Pub/Sub Lite: Pub/Sub Lite may decrease your bill significantly and the ecosystem around it is becoming more stable.
Pub/Sub push subscriptions can now be created with Cloud Run service endpoints protected by VPC Service Controls (preview)
Pub/Sub is now available in the europe-central2 region (Warsaw).
Dataflow Execution details are now available in Preview.
Dataflow SQL now supports user-defined functions (UDFs) written using SQL. For more information, see Dataflow SQL user-defined functions (preview).
Dataflow is now able to use workers, Dataflow Shuffle, Streaming Engine, FlexRS, and regional endpoints in zones in europe-central2 (Warsaw)

Azure

Integrating Azure and Confluent: Real-Time Search Powered by Azure Cache for Redis and Spring Cloud: it’s amazing how well Microsoft integrates different open-source cloud services into its own cloud 🌶️
Event Hubs on Azure Stack Hub for disconnected scenarios is now generally available
Azure Stream Analytics Dedicated now generally available
General availability: Stream Analytics runs on Azure Stack Hub
Public preview: Azure Event Grid now provides support for delivery headers and additional advanced filters among other updates

Amazon

Tools

Awesome Open-Source Contribs for Apache Kafka: a curated list of awesome open-source frameworks, libraries, tools and examples for the Apache Kafka project.
Quick profiling of data in Apache Kafka using kafkacat and visidata: very useful.
Kafcat: a Rust fully async rewrite of kafkacat.
A Great Day Out With... Apache Kafka: superb idea by @gunnarmorling and @hpgrahsl 🙌
ksqlDB GraphQL poc: setup to serve as proof of concept in using Kafka with ksqlDB in combination with the query language GraphQL.

That’s all! Comments? Drop me a message on Twitter or join the Streaming Annotated Telegram group!
