Streaming Annotated Newsletter

Streaming Annotated Monthly – January 2021

streamingannotated.substack.com

Receive the latest news and interesting articles about streaming platforms and processing frameworks in your mailbox

Antón (antonmry)
Jan 18, 2021

I spend too much time reading news and articles on the Internet, specifically in my areas of interest: event streaming and processing frameworks. As a goal for 2021, I started to think about how to use that time more productively. It’s great to know what’s happening in the always vibrant streaming arena, but it’s also a huge investment of time. That is how this newsletter was born.

I tend to focus on technical articles and deep dives, so don’t expect commercial posts here. From time to time, an interesting article shows how a specific vendor technology has been used. I like those too.

I would love it if you shared the articles I missed, or left a comment to tell me whether this newsletter is useful to you. It will help me find the motivation to write it each month!


Governance and architecture

  • No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale 🌶️🌶️🌶️

  • Data Mesh Principles and Logical Architecture 🌶️🌶️🌶️

  • Data Mesh Applied: How to Move Beyond the Data Lake with lakeFS

  • Battle-tested event-driven patterns for your microservices architecture (video)

  • A Guide to Enterprise Event-Driven Architecture

  • From Lambda to Lambda-less: Lessons learned (LinkedIn Engineering)

  • CNCF End User Technology Radar: Database Storage, November 2020

  • Evolution of the Real-time Data Warehouses of the Alibaba Search and Recommendation Data Platform 🌶️🌶️🌶️

  • 8 Lessons Learned from using Kafka with 1000 microservices (video)

  • The Big Little Guide to Message Queues

  • Real-time Data Pipelines — Complexities & Considerations

  • Evaluating persistent, replicated message queues (2020 edition) 🌶️🌶️🌶️

Apache Kafka

  • What’s New in Apache Kafka 2.7.0 🌶️🌶️🌶️

  • Apache Kafka Lag Monitoring at AppsFlyer 🌶️🌶️

  • Polyglot, Fault Tolerant Event-Driven Programming with Apache Kafka, Kubernetes and gRPC (video)

  • Kafka is not a Database (interesting and controversial article) 🌶️🌶️🌶️

  • Kafka as a storage system

  • How to Run Apache Kafka on Windows

  • Disaster Recovery for Multi-Region Kafka at Uber 🌶️🌶️🌶️

  • Intro to Apache Kafka: How Kafka Works (recommended if you are starting with Kafka)

Kafka on Kubernetes

  • Make your Kafka cluster production-ready

  • Bootstrap Kafka on Kubernetes (Strimzi) with Just 5 Commands

Kafka Streams, ksqlDB, Kafka Connect

  • Building a garbage-free network stack for Kafka streams

  • Event Streaming with Kafka Streams and ksqlDB (book) 🌶️🌶️🌶️

  • Building and Deploying a Real-Time Stream Processing ETL Engine with Kafka and ksqlDB

  • Announcing ksqlDB 0.14.0

  • Twelve Days Of SMT

  • Kafka Connect: The Magic Behind Mux Data Realtime Exports

Kafka client

  • Introducing the Confluent Parallel Consumer 🌶️🌶️

  • New release librdkafka v1.5.3

  • Kafka consumer in Java

Kafka Frameworks

  • Kafka Cron using wix/greyhound

  • Event Streams in Action (book)

  • Announcing Spring Cloud Stream Applications 2020.0.0 GA Release

Apache Pulsar

  • How Apache Pulsar is Helping Iterable Scale its Customer Engagement Platform

  • What's New in Apache Pulsar 2.7.0 🌶️🌶️🌶️

Apache Flink

  • Apache Flink 1.12.0 Release Announcement

  • Ask-Me-Anything: Apache Flink 1.12 Release (video)

  • Improvements in task scheduling for batch workloads in Apache Flink 1.12

  • Streaming Systems and Global State

  • Flink + TiDB: A Scale-Out Real-Time Data Warehouse for Second-Level Analytics

  • Flink SQL Advent

Apache Spark

  • Data+AI Summit follow-up: aggregations and state management 🌶️🌶️🌶️

  • Amazon EMR on Amazon Elastic Kubernetes Service (EKS)

  • Watermark and window-based processing

  • High Throughput Ingestion with Iceberg

  • Handling Late Arriving Dimensions Using a Reconciliation Pattern 🌶️

Apache Beam

  • Simplify creating data pipelines for media with Spotify’s Klio

  • DataFrame API Preview now Available!

  • Apache Beam 2.26.0

Google Cloud (Pub/Sub, Dataflow, etc.)

  • Kalman Filters, Pub/Sub, and Birdwatching 🌶️🌶️🌶️

  • Best practices to use Apache Ranger on Dataproc

  • Machine learning patterns with Apache Beam and the Dataflow Runner, part I

  • Dataflow workers now use the Java 11 runtime

Schema management

  • Ensure Data Quality and Data Evolvability with a Secured Schema Registry 🌶️

  • Using Avro in a native executable

  • Getting started with Apache Kafka and Red Hat service registry

  • Integrating Quarkus with Apicurio Service Registry

  • Gentle (and practical) introduction to Apache Avro - Part 1 (self-promotion)

Change Data Capture

  • Debezium serialization with Apache Avro and Apicurio Registry

  • How To Keep Elasticsearch in sync with relational databases? (not exactly CDC but related) 🌶️

  • A Change-Data-Capture use-case: designing an evergreen cache (slides)

  • Debezium with Single Message Transformation (SMT)

  • Distributed Tracing with Debezium

  • Debezium 1.4.0.CR1 Released

  • Outbox, Inbox patterns and delivery guarantees explained

  • New release CloudEvents 1.0.1

That’s all! I hope it’s enough to keep you busy until the next one!

If you find it useful, please share it with your network.
