Apache Flink is one of the most popular frameworks for data stream processing. As a stateful processing engine, Flink is able to handle processing logic with aggregations, joins, and windowing. To ensure that Flink jobs are recoverable with exactly-once semantics, Flink has a state-of-the-art state snapshotting mechanism, so in the event of a failure, the job can be resumed from the latest snapshot.

In some advanced use cases, such as job migrations or job auditing, users may be required to inspect or modify their Flink job’s state snapshots (called Savepoints and Checkpoints in Flink). For this purpose, Flink provides the State Processor API. However, this API is not always straightforward to use and requires deep understanding of Flink operator states.

In this post, we’ll cover an example of using the State Processor API, broken up into 3 parts:

  1. Introduce our Flink job which reads data from an Apache Kafka topic
  2. Deep dive into how Flink’s KafkaSource maintains its state
  3. Use the State Processor API to extract the Kafka partition-offset state from the Flink job’s savepoint/checkpoint

If you want to see an example of the State Processor API in use, feel free to skip ahead to the last section.

Note that this post is a technical tutorial for those who want to get started with the State Processor API, and is intended for readers who already have some familiarity with Apache Flink and stream processing concepts.

Creating a Flink Job

Below is the Java code for our Flink job. This job simply reads from the “source” topic in Kafka, deserializes the records as simple Strings, then writes the results to the “sink” topic.

```java
 1. public class FlinkTest {
 2.
 3.     public static void main(String[] args) throws Exception {
 4.         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
 5.
 6.         KafkaSource<String> source = KafkaSource.<String>builder()
 7.                 .setBootstrapServers("localhost:9092")
 8.                 .setTopics("source")
 9.                 .setGroupId("my-group")
10.                 .setStartingOffsets(OffsetsInitializer.latest())
11.                 .setValueOnlyDeserializer(new SimpleStringSchema())
12.                 .build();
13.
14.         DataStream<String> sourceStream = env.fromSource(
15.                         source, WatermarkStrategy.forMonotonousTimestamps(), "Kafka Source")
16.                 .uid("kafkasourceuid");
17.
18.         KafkaRecordSerializationSchema<String> serializer = KafkaRecordSerializationSchema.builder()
19.                 .setValueSerializationSchema(new SimpleStringSchema())
20.                 .setTopic("sink")
21.                 .build();
22.         Properties kprops = new Properties();
23.         kprops.setProperty("transaction.timeout.ms", "300000"); // e.g., 5 mins
24.         KafkaSink<String> sink = KafkaSink.<String>builder()
25.                 .setBootstrapServers("localhost:9092")
26.                 .setRecordSerializer(serializer)
27.                 .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
28.                 .setKafkaProducerConfig(kprops)
29.                 .setTransactionalIdPrefix("txn-prefix")
30.                 .build();
31.
32.         sourceStream.sinkTo(sink);
33.         env.enableCheckpointing(10000L);
34.         env.getCheckpointConfig().setCheckpointTimeout(60000);
35.         env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
36.         env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500L);
37.         env.getCheckpointConfig().setTolerableCheckpointFailureNumber(1);
38.         env.getCheckpointConfig().setCheckpointStorage("file:///tmp/checkpoints");
39.         env.execute("tester");
40.     }
41. }
```

There are a few important things to note from this Flink job:

  1. We create a KafkaSource object on lines 6-12. On line 14, the KafkaSource is passed to the StreamExecutionEnvironment’s fromSource method, which returns a DataStreamSource object representing the actual Flink source operator.
  2. We set the operator ID for our KafkaSource operator using the uid method on line 16. It’s best practice to set IDs for all Flink operators when possible, but we’re emphasizing it here because we’ll need to refer to this ID when we use the State Processor API to inspect the state snapshots.
  3. Flink checkpointing is turned on. On lines 33-38, we configure the Flink environment so that the job takes periodic checkpoints, written to “file:///tmp/checkpoints”. We’ll be analyzing these checkpoints later on.

Understanding the KafkaSource State

Before we inspect the checkpoints generated from our test Flink job, we first need to understand how the KafkaSource Flink operator saves its state.

As we’ve already mentioned, we’re using Flink’s KafkaSource to connect to our source Kafka data. Flink sources have three main components: the Split, the SourceReader, and the SplitEnumerator (Flink docs). A Split represents a portion of data that a source consumes, and it is the granularity at which the source can parallelize reading data. For the KafkaSource, each Kafka partition corresponds to a separate Split, represented by the KafkaPartitionSplit class. The KafkaPartitionSplit is serialized by the KafkaPartitionSplitSerializer class. The logic for this serializer is simple: it writes out a byte array containing the Split’s topic, partition, and offsets.

KafkaPartitionSplitSerializer’s serialize method:

```java
@Override
public byte[] serialize(KafkaPartitionSplit split) throws IOException {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos)) {
        out.writeUTF(split.getTopic());
        out.writeInt(split.getPartition());
        out.writeLong(split.getStartingOffset());
        out.writeLong(split.getStoppingOffset().orElse(KafkaPartitionSplit.NO_STOPPING_OFFSET));
        out.flush();
        return baos.toByteArray();
    }
}
```
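This byte layout can be reproduced with nothing but the JDK’s java.io classes. The sketch below is a minimal stand-in for the serializer above: it writes a hypothetical split (topic, partition, starting/stopping offset) in the same order and reads it back. The use of Long.MIN_VALUE as the “no stopping offset” sentinel is an assumption for illustration, not necessarily the value of KafkaPartitionSplit.NO_STOPPING_OFFSET.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SplitBytesDemo {
    // Mirrors the serializer above: topic (UTF), partition (int),
    // starting offset (long), stopping offset (long).
    static byte[] serialize(String topic, int partition, long start, long stop) throws IOException {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(topic);
            out.writeInt(partition);
            out.writeLong(start);
            out.writeLong(stop);
            out.flush();
            return baos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Long.MIN_VALUE stands in for the "no stopping offset" sentinel (assumed).
        byte[] bytes = serialize("source", 0, 3L, Long.MIN_VALUE);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            System.out.println("topic=" + in.readUTF()
                    + ", partition=" + in.readInt()
                    + ", startingOffset=" + in.readLong()
                    + ", hasStop=" + (in.readLong() != Long.MIN_VALUE));
        }
    }
}
```

Reading the fields back in the same order they were written is exactly what KafkaPartitionSplitSerializer’s deserialize method does with the real state bytes.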

At runtime, Flink instantiates all of the operators, including the SourceOperator objects. Each stateful object within a Flink operator has a name associated with it. In the case of a source operator, the name associated with the split state is defined by SPLITS_STATE_DESC.

```java
static final ListStateDescriptor<byte[]> SPLITS_STATE_DESC =
        new ListStateDescriptor<>("SourceReaderState", BytePrimitiveArraySerializer.INSTANCE);
```

We can inspect the SourceOperator class further to see where these split states are initialized, in the initializeState method.

SourceOperator’s initializeState method:

```java
@Override
public void initializeState(StateInitializationContext context) throws Exception {
    super.initializeState(context);
    final ListState<byte[]> rawState =
            context.getOperatorStateStore().getListState(SPLITS_STATE_DESC);
    readerState = new SimpleVersionedListState<>(rawState, splitSerializer);
}
```

The Flink state that source operators use is the SimpleVersionedListState, which uses the SimpleVersionedSerialization class. In the SimpleVersionedListState class, the serialize method calls the writeVersionAndSerialize method to ultimately serialize the state.

Finally, if we inspect the writeVersionAndSerialize method in SimpleVersionedSerialization, we can see that before the actual data associated with our source operator is written, a few header bytes are written for the serializer version and the data’s length.

SimpleVersionedSerialization’s writeVersionAndSerialize method:

```java
public static <T> void writeVersionAndSerialize(
        SimpleVersionedSerializer<T> serializer, T datum, DataOutputView out)
        throws IOException {
    checkNotNull(serializer, "serializer");
    checkNotNull(datum, "datum");
    checkNotNull(out, "out");

    final byte[] data = serializer.serialize(datum);

    out.writeInt(serializer.getVersion());
    out.writeInt(data.length);
    out.write(data);
}
```
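The framing itself is easy to reproduce with plain Java. The sketch below wraps an arbitrary payload the way writeVersionAndSerialize does (version as an int, length as an int, then the data) and confirms that the header occupies exactly 8 bytes; the version number 0 and the payload bytes are placeholders, not values taken from Flink.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class FramingDemo {
    // Mirrors the writeVersionAndSerialize layout: version (int), length (int), payload.
    static byte[] frame(int version, byte[] payload) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeInt(version);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {10, 20, 30};     // arbitrary stand-in for serialized split bytes
        byte[] framed = frame(0, payload); // version 0 is a placeholder
        ByteBuffer buf = ByteBuffer.wrap(framed);
        System.out.println("version=" + buf.getInt()
                + ", length=" + buf.getInt()
                + ", headerBytes=" + (framed.length - payload.length));
    }
}
```

Two 4-byte ints before the payload is why, later on, we can recover the raw split bytes simply by skipping the first 8 bytes of each state entry.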

Let’s quickly recap the important parts from above:

  1. The KafkaSource operator stores its state in KafkaPartitionSplit objects.
  2. The KafkaPartitionSplit keeps track of the current topic, partition, and offset that the KafkaSource has last processed.
  3. When Flink savepointing/checkpointing occurs, a byte array representing the KafkaSource state is written to the state snapshot. The byte array starts with a header containing the serializer version and the length of the data (two 4-byte ints, 8 bytes in total). The rest of the byte array is the actual state data: a serialized KafkaPartitionSplit.

Now that we have some idea of how data is being serialized into Flink savepoints and checkpoints, let’s see how we can use the State Processor API to extract the Kafka source operator information from these state snapshots.

State Processor API to Inspect Kafka Source State

For Maven projects, add the following dependency to your pom.xml file to start using the Flink State Processor API.

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-state-processor-api</artifactId>
    <version>1.18.0</version>
</dependency>
```

The following class showcases the full example of how we can use the State Processor API to read KafkaSource offsets from a Flink savepoint or checkpoint.

```java
 1. public class StateProcessorTest {
 2.
 3.     public static void main(String[] args) throws Exception {
 4.         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
 5.
 6.         String savepointPath = Path.of("/tmp/checkpoints/609bc335486ca6cfcc8692e4c1ff8782/chk-8").toString();
 7.         SavepointReader savepoint = SavepointReader.read(env, savepointPath, new HashMapStateBackend());
 8.         DataStream<byte[]> listState = savepoint.readListState(
 9.                 OperatorIdentifier.forUid("kafkasourceuid"),
10.                 "SourceReaderState",
11.                 PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);
12.         CloseableIterator<byte[]> states = listState.executeAndCollect();
13.         while (states.hasNext()) {
14.             byte[] s = states.next();
15.             KafkaPartitionSplitSerializer serializer = new KafkaPartitionSplitSerializer();
16.             KafkaPartitionSplit split = serializer.deserialize(serializer.getVersion(), Arrays.copyOfRange(s, 8, s.length));
17.             System.out.println(
18.                     String.format("topic=%s, partition=%s, startingOffset=%s, stoppingOffset=%s, topicPartition=%s",
19.                             split.getTopic(), split.getPartition(),
20.                             split.getStartingOffset(), split.getStoppingOffset(), split.getTopicPartition()));
21.         }
22.
23.         System.out.println("DONE");
24.     }
25. }
```

First, we load the savepoint. The SavepointReader class from the State Processor API allows us to load a full savepoint or checkpoint. On line 7, we load a checkpoint that was created in “/tmp/checkpoints” by running the test Flink job. As we mentioned in the previous section, source operators use a SimpleVersionedListState, which the SavepointReader can read using the readListState method. To read the list state, we need to know three things:

  1. Operator ID: “kafkasourceuid” set in our test Flink job
  2. State Name: “SourceReaderState” set in Flink’s SourceOperator class
  3. State TypeInformation: PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO set in Flink’s SourceOperator class

After we get our list state, we can simply iterate through each entry, which is given to us as a byte array. Since SimpleVersionedSerialization first writes the version and the data length, which we don’t care about, we need to skip that header. You’ll see on line 16 that we deserialize the byte array as a KafkaPartitionSplit after skipping the first 8 bytes of the state byte array.
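In isolation, the header-skipping step looks like the sketch below: it frames a dummy payload behind an 8-byte version/length header (the writeVersionAndSerialize layout) and then recovers the payload with Arrays.copyOfRange, the same call used on line 16 above. The payload string and version number here are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HeaderSkipDemo {
    public static void main(String[] args) {
        byte[] payload = "split-bytes".getBytes(StandardCharsets.UTF_8); // dummy payload
        // Build [version int][length int][payload], as writeVersionAndSerialize does.
        ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
        buf.putInt(1).putInt(payload.length).put(payload);
        byte[] state = buf.array();
        // Skip the 8-byte header to get back the raw serializer payload.
        byte[] recovered = Arrays.copyOfRange(state, 8, state.length);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```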

Running the above code example gives the following result:

```
topic=source, partition=0, startingOffset=3, stoppingOffset=Optional.empty, topicPartition=source-0
DONE
```


In this post, we explained how Flink’s KafkaSource state is serialized into savepoints and covered an example of reading this state with the State Processor API. Flink’s State Processor API can be a powerful tool to analyze and modify Flink savepoints and checkpoints. However, it can be confusing for beginners to use and requires some in-depth knowledge about how the Flink operators manage their individual states. Hopefully this guide will help you understand the KafkaSource and serve as a good tutorial for getting started with the State Processor API.

For more content about Flink and stream processing, check out more content from DeltaStream’s blog. DeltaStream is a platform that simplifies the unification, processing, and governance of streaming data.