Streaming Annotated Monthly – January 2021
Receive the latest news and interesting articles about streaming platforms and processing frameworks in your mailbox
I spend too much time reading news and articles on the Internet, specifically in my areas of interest: event streaming and processing frameworks. As a goal for 2021, I started to think about how to use that time in a more productive way. It’s great to know what’s happening in the always vibrant streaming arena, but it’s also a huge investment of time. That is how this newsletter was born.
I tend to focus on technical articles and deep dives, so don’t expect commercial posts here. From time to time there is an interesting article about how a specific vendor technology has been used. I like those too.
I would love it if you shared the articles I missed, or left a comment telling me whether this newsletter is useful to you. It will help me find the motivation to write it each month!
Governance and architecture
No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale 🌶️🌶️🌶️
Data Mesh Applied: How to Move Beyond the Data Lake with lakeFS
Battle-tested event-driven patterns for your microservices architecture (video)
From Lambda to Lambda-less: Lessons learned (LinkedIn Engineering)
CNCF End User Technology Radar: Database Storage, November 2020
Evolution of the Real-time Data Warehouses of the Alibaba Search and Recommendation Data Platform 🌶️🌶️🌶️
8 Lessons Learned from using Kafka with 1000 microservices (video)
Evaluating persistent, replicated message queues (2020 edition) 🌶️🌶️🌶️
Apache Kafka
Polyglot, Fault Tolerant Event-Driven Programming with Apache Kafka, Kubernetes and gRPC (video)
Kafka is not a Database (interesting and controversial article) 🌶️🌶️🌶️
Intro to Apache Kafka: How Kafka Works (recommended if you are starting with Kafka)
Kafka on Kubernetes
Kafka Streams, KsqlDB, Kafka Connect
Kafka client
Kafka Frameworks
Apache Pulsar
Apache Flink
Improvements in task scheduling for batch workloads in Apache Flink 1.12
Flink + TiDB: A Scale-Out Real-Time Data Warehouse for Second-Level Analytics
Apache Spark
Data+AI Summit follow-up: aggregations and state management 🌶️🌶️🌶️
Handling Late Arriving Dimensions Using a Reconciliation Pattern 🌶️
Apache Beam
Google Cloud (PubSub, Dataflow, etc.)
Schema management
Ensure Data Quality and Data Evolvability with a Secured Schema Registry 🌶️
Getting started with Apache Kafka and Red Hat service registry
Gentle (and practical) introduction to Apache Avro - Part 1 (self-hype)
Change Data Capture
Debezium serialization with Apache Avro and Apicurio Registry
How To Keep Elasticsearch in sync with relational databases? (not exactly CDC but related) 🌶️
A Change-Data-Capture use-case: designing an evergreen cache (slides)
That’s all! I hope it’s enough to keep you busy until the next one!
If you find it useful, please share it with your network.