Kafka Producers: Writing Messages to Kafka
There are many reasons an application might need to write messages to Kafka: recording user activities for auditing or analysis, recording metrics, storing log messages, and much more.
Kafka producer components
1. We start producing messages to Kafka by creating a ProducerRecord, which must include the topic we want to send the record to and a value. Optionally, we can also specify a key and/or a partition.
2. The first thing the producer will do is serialize the key and value objects to byte arrays so they can be sent over the network.
3. Next, the data is sent to a partitioner. If we specified a partition in the ProducerRecord, the partitioner doesn't do anything and simply returns the partition we specified. If we didn't, the partitioner will choose a partition for us, usually based on the ProducerRecord key. Once a partition is selected, the producer knows which topic and partition the record will go to.
4. It then adds the record to a batch of records that will also be sent to the same topic and partition. A separate thread is responsible for sending those batches to the appropriate brokers.
5. When the broker receives the messages, it sends back a response.
6. If the messages were successfully written to Kafka, it will return a RecordMetadata object with the topic, partition, and the offset of the record within the partition.
7. If the broker failed to write the messages, it will return an error. When the producer receives an error, it may retry sending the message a few more times before giving up and returning an error.
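The steps above can be sketched in a few lines. This is a simplified illustration, not the real Kafka client: the topic name, the string serializer, the partition count, and the use of Python's built-in hash() in place of the client's actual hashing are all assumptions for demonstration.

```python
# Illustrative sketch of the produce path described above -- not the real client.

def serialize(obj):
    # Step 2: keys and values are serialized to byte arrays.
    return str(obj).encode("utf-8")

def choose_partition(key_bytes, num_partitions, explicit_partition=None):
    # Step 3: an explicitly chosen partition wins; otherwise partition by key
    # hash (the real client uses murmur2; hash() merely stands in for it).
    if explicit_partition is not None:
        return explicit_partition
    return hash(key_bytes) % num_partitions

record = {"topic": "customer-events", "key": "user-42", "value": "login"}
key_bytes, value_bytes = serialize(record["key"]), serialize(record["value"])
partition = choose_partition(key_bytes, num_partitions=4)

# Step 4: records headed for the same topic-partition are grouped into one batch.
batches = {}
batches.setdefault((record["topic"], partition), []).append((key_bytes, value_bytes))
```

Records with the same key land in the same partition (as long as the partition count does not change), which is what makes key-based ordering guarantees possible.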
bootstrap.servers: a list of host:port pairs of brokers that the producer will use to establish its initial connection to the Kafka cluster. These hosts act as the starting point for the client to discover the full set of live servers in the cluster, so the list doesn't need to include all brokers; the producer will get more information after the initial connection. Syntax: host1:port1,host2:port2.
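As a small sketch of that syntax, a client configuration might look like the following; the broker hostnames and the dict-style configuration are hypothetical, purely for illustration.

```python
# Hypothetical producer configuration illustrating the bootstrap.servers syntax.
producer_config = {
    # Two brokers are enough as a starting point; the client discovers the
    # rest of the cluster after the first connection. Listing more than one
    # lets the client start even if one bootstrap broker is down.
    "bootstrap.servers": "broker1:9092,broker2:9092",
}
```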
acks=0: the producer will not wait for a reply from the broker before assuming the message was sent successfully. Because the producer is not waiting for any response from the server, it can send messages as fast as the network will support, but messages can be lost without the producer knowing.
acks=1: the producer will receive a success response from the broker the moment the leader replica receives the message. If the message can't be written to the leader, the producer will receive an error response and can retry sending the message, avoiding potential loss of data. The message can still get lost if the leader crashes before the followers replicate it.
acks=all: the producer will receive a success response from the broker once all in-sync replicas have received the message. This is the safest mode, since you can make sure more than one broker has the message and that the message will survive even in the case of a crash.
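A toy model makes the trade-off between the three settings concrete. This is only a sketch of the decision logic described above, assuming a replication factor of 3 with all replicas in sync; it is not how the broker is actually implemented.

```python
# Toy model: when does the producer consider a send successful under each
# acks setting? `leader_has_it` means the leader replica wrote the message;
# `isr_acks` counts in-sync replicas (including the leader) that have it.

def send_considered_successful(acks, leader_has_it, isr_acks, isr_size=3):
    if acks == "0":
        return True                   # fire and forget: no broker reply awaited
    if acks == "1":
        return leader_has_it          # leader wrote it; followers may still lag
    if acks == "all":
        return isr_acks == isr_size   # every in-sync replica has the message
    raise ValueError(f"unknown acks setting: {acks}")
```

Note that with acks=0 the send "succeeds" even when no broker has the message, which is exactly why it is the fastest and least safe mode.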
buffer.memory sets the amount of memory the producer will use to buffer messages waiting to be sent to brokers.
compression.type: by default, messages are sent uncompressed. This parameter can be set to snappy, gzip, or lz4, in which case the corresponding compression algorithm will be used to compress the data before sending it to the brokers.
batch.size: when multiple records are sent to the same partition, the producer will batch them together. This parameter controls the amount of memory in bytes (not messages!) that will be used for each batch. When the batch is full, all the messages in it will be sent.
linger.ms controls the amount of time to wait for additional messages before sending the current batch.
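The interaction between batch.size and linger.ms can be sketched as follows: a batch is sent when it fills up or when the linger time elapses, whichever comes first. This is a simplified simulation, not the real accumulator; the sizes and times are made-up illustration values.

```python
# Simplified sketch of the batch.size / linger.ms interaction.

class BatchBuffer:
    def __init__(self, batch_size, linger_ms):
        self.batch_size = batch_size   # max bytes per batch
        self.linger_ms = linger_ms     # max wait before sending a partial batch
        self.records, self.bytes_used = [], 0
        self.opened_at = None          # time the current batch was started
        self.sent_batches = []

    def append(self, payload, now_ms):
        if self.opened_at is None:
            self.opened_at = now_ms
        self.records.append(payload)
        self.bytes_used += len(payload)
        self._maybe_send(now_ms)

    def _maybe_send(self, now_ms):
        full = self.bytes_used >= self.batch_size
        lingered = now_ms - self.opened_at >= self.linger_ms
        if self.records and (full or lingered):
            self.sent_batches.append(self.records)
            self.records, self.bytes_used, self.opened_at = [], 0, None

buf = BatchBuffer(batch_size=10, linger_ms=5)
buf.append(b"xxxx", now_ms=0)    # 4 bytes buffered, batch waits
buf.append(b"xxxxxx", now_ms=1)  # batch reaches 10 bytes and is sent
```

Raising linger.ms trades a little latency for fewer, larger requests and better compression ratios; linger.ms=0 (the default) sends as soon as a sender thread is available.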
client.id: this can be any string, and will be used by the brokers to identify messages sent from the client.
max.in.flight.requests.per.connection: this controls how many messages the producer will send to the server without receiving responses. Setting this high can increase memory usage while improving throughput, but setting it too high can reduce throughput as batching becomes less efficient.
max.request.size: this setting controls the size of a produce request sent by the producer.
Kafka Consumers: Reading Data from Kafka
Consumers and Consumer Groups
Suppose you have an application that needs to read messages from a Kafka topic, run some validations against them, and write the results to another data store.
In this case, the application will create a consumer object, subscribe to the appropriate topic, and start receiving messages, validating them, and writing the results.
This may work well for a while, but what if the rate at which producers write messages to the topic exceeds the rate at which your application can validate them? If you are limited to a single consumer reading and processing the data, your application may fall farther and farther behind, unable to keep up with the rate of incoming messages. Obviously there is a need to scale consumption from topics. Just like multiple producers can write to the same topic, we need to allow multiple consumers to read from the same topic, splitting the data between them.
Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
Let's take topic T1 with four partitions. Now suppose we created a new consumer, C1, which is the only consumer in group G1, and use it to subscribe to topic T1. Consumer C1 will get all messages from all four T1 partitions.
If we add another consumer, C2, to group G1, each consumer will only get messages from two partitions. Perhaps messages from partition 0 and 2 go to C1 and messages from partitions 1 and 3 go to consumer C2.
If G1 has four consumers, then each will read messages from a single partition.
If we add more consumers to a single group with a single topic than we have partitions, some of the consumers will be idle and get no messages at all.
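The scenarios above can be sketched with a simple assignment function. This round-robin split is only illustrative (the real client supports several assignment strategies, chosen by configuration), but it reproduces the examples: two consumers each get two partitions, and a fifth consumer gets nothing.

```python
# Sketch: split a topic's partitions among the consumers in one group.
# Consumers beyond the partition count end up with no partitions (idle).

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Topic T1 with four partitions, group G1 with two consumers:
two = assign([0, 1, 2, 3], ["C1", "C2"])   # C1 -> [0, 2], C2 -> [1, 3]

# Five consumers, four partitions: someone is left idle.
five = assign([0, 1, 2, 3], ["C1", "C2", "C3", "C4", "C5"])
idle = [c for c, ps in five.items() if not ps]
```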
It is common to create topics with a large number of partitions, since this allows adding more consumers when the load increases. Keep in mind that there is no point in adding more consumers than you have partitions in a topic; some of the consumers will just be idle. Adding a new consumer group, on the other hand, does not split the data further: each group independently receives all the messages in the topic, so a new application with its own group misses nothing.
Consumer Groups and Partition Rebalance
When a consumer shuts down or crashes, it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers.
Reassignment of partitions to consumers also happens when the topics the consumer group is consuming are modified (e.g., if an administrator adds new partitions).
Moving partition ownership from one consumer to another is called a rebalance. During a rebalance, consumers can't consume messages. In addition, when partitions are moved from one consumer to another, the consumer loses its current state; if it was caching any data, it will need to refresh its caches, slowing down the application until the consumer sets up its state again.
Commits and Offsets
Kafka gives us a way of tracking which records were read by a consumer of the group: it allows consumers to use Kafka to track their position (offset) in each partition. We call the action of updating the current position in the partition a commit.
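A minimal sketch of this bookkeeping, under the assumption of a single consumer tracking one partition: the consumer remembers, per partition, the offset of the next record it expects, and "commits" that position so processing can resume there after a restart or rebalance. The class and method names here are invented for illustration; they are not part of the Kafka API.

```python
# Toy offset tracker: positions advance as records are processed,
# and commit() durably records where to resume.

class OffsetTracker:
    def __init__(self):
        self.positions = {}   # partition -> next offset to read
        self.committed = {}   # partition -> last committed position

    def record_processed(self, partition, offset):
        # After processing the record at `offset`, the next expected
        # offset in that partition is offset + 1.
        self.positions[partition] = offset + 1

    def commit(self):
        self.committed = dict(self.positions)

tracker = OffsetTracker()
for offset in range(3):          # process offsets 0, 1, 2 of partition 0
    tracker.record_processed(0, offset)
tracker.commit()
# After a restart, consumption of partition 0 resumes at offset 3.
```

Note the gap this exposes: if the consumer crashes after processing records but before committing, those records are processed again on restart, which is why committed offsets and processing guarantees have to be reasoned about together.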