Kafka

Structure

  • Broker
  • Producer
  • Consumer
  • Consumer Group
  • Zookeeper
  • Topic
  • Partition
  • Replica
  • Leader and Follower
  • Offset
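
To make these pieces concrete, here is a minimal consumer sketch with the Java client; the broker address, group id, and topic name are placeholders. A consumer joins a consumer group via group.id, subscribes to a topic, and reads records that each carry a partition and an offset.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class StructureDemoConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker to bootstrap from; localhost:9092 is a placeholder.
            props.put("bootstrap.servers", "localhost:9092");
            // Consumers sharing the same group.id form a consumer group;
            // each partition of a subscribed topic is read by exactly one member.
            props.put("group.id", "demo-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // topic name is a placeholder
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Every record carries the partition it came from and its offset within that partition.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }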


Producer

Steps:

  • KafkaProducer creates the message;
  • a producer interceptor does pre-send work, such as filtering out messages that do not meet requirements or modifying message content;
  • the serializer converts the message into a byte array;
  • the partitioner computes the target partition for the message, and the data is stored in the RecordAccumulator;
  • the sender thread fetches the accumulated data for sending;
  • concrete requests are built;
  • if there are too many in-flight requests, some requests are held back in a cache;
  • the prepared requests are sent;
  • the requests reach the Kafka cluster;
  • responses are received;
  • the buffered data is cleaned up.
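
A minimal sketch of this send path with the Java client, assuming a placeholder broker address and topic name; interceptors and the partitioner are left at their defaults, so the serializer, RecordAccumulator, and sender thread do the work described above.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class ProducerSendDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            // Serializers turn keys and values into byte arrays before partitioning and batching.
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("demo-topic", "key", "hello"); // topic name is a placeholder
                // send() is asynchronous: the record is buffered in the RecordAccumulator,
                // and the sender thread ships it to the leader of the target partition.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("sent to partition %d at offset %d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
            } // close() flushes any records still buffered
        }
    }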

Configuration:

  • buffer.memory - the total memory available to the RecordAccumulator for buffering unsent records
  • batch.size - the maximum size of a single record batch
  • linger.ms - how long the sender waits for more records while a batch is not yet full
  • max.block.ms - how long send() blocks when buffer.memory is exhausted
  • max.in.flight.requests.per.connection - the maximum number of unacknowledged requests per broker connection
  • max.request.size - the maximum size of a single request
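
As an illustration, the same settings could be wired into producer properties like this; the numeric values are example values only, not recommendations.

    import java.util.Properties;

    public class ProducerTuningConfig {
        public static Properties build() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");            // placeholder broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("buffer.memory", "33554432");                      // 32 MB RecordAccumulator
            props.put("batch.size", "16384");                            // 16 KB per batch
            props.put("linger.ms", "5");                                 // wait up to 5 ms to fill a batch
            props.put("max.block.ms", "60000");                          // block send() up to 60 s when the buffer is full
            props.put("max.in.flight.requests.per.connection", "5");     // unacknowledged requests per connection
            props.put("max.request.size", "1048576");                    // 1 MB per request
            return props;
        }
    }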


Storage

Append-only log segments (SSTable-like), sparse index (offset and timestamp), sendfile (zero-copy)
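
Not Kafka's actual code, but a small sketch of how a sparse index lookup works: find the greatest indexed offset at or below the target, then scan the segment forward from that file position.

    import java.util.Map;
    import java.util.TreeMap;

    public class SparseIndexLookup {
        // Sparse index: one entry every N records, mapping a message offset
        // to its physical position in the log segment file.
        private final TreeMap<Long, Long> index = new TreeMap<>();

        void addEntry(long offset, long filePosition) {
            index.put(offset, filePosition);
        }

        // To locate targetOffset: jump to the greatest indexed offset <= target,
        // then scan the segment forward from that file position.
        long scanStartPosition(long targetOffset) {
            Map.Entry<Long, Long> floor = index.floorEntry(targetOffset);
            return floor == null ? 0L : floor.getValue();
        }
    }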

Replication

  • AR - Assigned Replicas
  • ISR - In-Sync Replicas
  • OSR - Out-of-Sync Replicas
  • LEO - Log End Offset
  • HW - High Watermark
    • the smallest LEO among all replicas in the ISR
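
A tiny illustration (not broker code) of how the high watermark relates to the LEOs: it is simply the minimum LEO over the replicas currently in the ISR, and consumers can only read messages below it.

    import java.util.Collection;
    import java.util.List;

    public class HighWatermarkDemo {
        // The high watermark is the smallest LEO among the replicas in the ISR.
        static long highWatermark(Collection<Long> isrLogEndOffsets) {
            return isrLogEndOffsets.stream().mapToLong(Long::longValue).min().orElse(0L);
        }

        public static void main(String[] args) {
            // leader LEO = 12, followers at 10 and 11  ->  HW = 10
            System.out.println(highWatermark(List.of(12L, 10L, 11L)));
        }
    }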

Interview problems

  • high throughput and low latency
    • Kafka can handle large volumes of messages quickly, making it suitable for real-time data processing
  • scalability
    • Kafka's distributed architecture allows it to scale horizontally, accommodating growing data needs by adding more nodes
  • fault tolerance and reliability
    • Kafka ensures data durability and reliability through replication and partitioning, maintaining operations even if some nodes fail
  • stream processing
    • Kafka Streams enables building complex real-time data processing applications
  • data integration
    • Kafka acts as a central hub, integrating data from various sources and distributing it to multiple systems
  • open-source and community support
    • Kafka benefits from a large, active community that continuously improves and expands its capabilities
  • versatile use cases
    • Kafka is used for log aggregation, real-time analytics, event sourcing, messaging, and metrics collection
  • compatibility with big data ecosystems
    • Kafka integrates well with technologies like Hadoop, Spark, and Elasticsearch, facilitating comprehensive data pipelines
  • streamlined data processing
    • Kafka enables asynchronous processing, enhancing system performance and reliability

How does Kafka ensure durability and fault tolerance?

  • topics and partitions
  • replication factor
  • leader and follower
  • data persistence
  • leader failure and recovery
  • producer acknowledgement
    • acks=all
      • waits for acknowledgment from all in-sync replicas before considering the write successful
    • acks=1
      • waits for acknowledgment from the leader only
    • acks=0
      • does not wait for any acknowledgment (fire-and-forget)
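
A small sketch of switching the acknowledgment level on the producer side; the broker address is a placeholder, and acks accepts "all", "1", or "0".

    import java.util.Properties;

    public class AcksConfigDemo {
        // Build producer properties for a given acknowledgment level.
        // "all" = wait for all in-sync replicas, "1" = leader only, "0" = no acknowledgment.
        public static Properties withAcks(String acks) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", acks);
            return props;
        }
    }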
