Version: sdf-beta5

Types

Types define the schema of the objects used in the dataflow. Once defined, types can be used in states, operators, sources, and sinks. With the built-in serializer, values of these types can be serialized to and deserialized from topics.

Types are defined in the types section of the dataflow. They can also be defined in a package, which can be shared across multiple dataflows.

Primitive Types

Primitive types represent basic values. The following primitive types are available:

bool boolean value
u8,u16,u32,u64 unsigned integers of 8, 16, 32, 64 bits
i8,i16,i32,i64 signed integers of 8, 16, 32, 64 bits
f32,f64 floating point numbers of 32, 64 bits

Primitive types can be aliased or used as part of complex types. The following defines type aliases for u16, f64, and u8, so you can use range, latitude, and weight in place of the underlying primitive types.

types:
  range:
    type: u16
  latitude:
    type: f64
  weight:
    type: u8
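
As a rough illustration of what an alias means on the Rust side, the sketch below models range and latitude as plain Rust type aliases. The alias names Range and Latitude and both helper functions are assumptions made for this example; the code SDF actually generates may look different.

```rust
// Assumed Rust-side view of the `range` and `latitude` aliases.
// These names are illustrative, not taken from generated SDF code.
type Range = u16;
type Latitude = f64;

/// Returns true when the latitude lies within -90 to 90 degrees.
fn is_valid_latitude(lat: Latitude) -> bool {
    (-90.0..=90.0).contains(&lat)
}

/// Adds two ranges, clamping at the maximum value a u16 can hold.
fn add_range(a: Range, b: Range) -> Range {
    a.saturating_add(b)
}
```

Because an alias is only another name for the primitive, a Range value can be used anywhere a u16 is expected.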

String

A string is a sequence of characters. It is declared with the string type.

types:
  my-type:
    type: string

Bytes

For raw binary data, use the bytes type. It is represented as a sequence of u8 values.

Example:

types:
  raw-data:
    type: bytes

Object

Object type represents a complex type that has multiple properties. It is defined as object type. The properties are defined as key-value pairs. For example, the following is a simple object type representing a person.

types:
  person:
    type: object
    properties:
      name:
        type: string
      weight:
        type: u8

A property type can be any primitive type or complex type, including an alias. Using the weight alias defined above, you can define person as follows:

types:
  person:
    type: object
    properties:
      name:
        type: string
      weight:
        type: weight
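
For orientation, here is a minimal sketch of how the person object might appear as a Rust struct inside an operator. The struct layout and the describe helper are assumptions for illustration, not the exact output of SDF code generation.

```rust
/// Assumed Rust shape of the `person` object type.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String,
    weight: u8,
}

/// Formats a short description of a person (hypothetical helper).
fn describe(person: &Person) -> String {
    format!("{} weighs {}", person.name, person.weight)
}
```

Each property of the object maps to one struct field, and the property type decides the field type.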

Enum

An enum represents the different variants of a type. It can model a simple enumeration or a sum type. To define an enum, set the type to enum and list the variants under the oneOf property.

For example, the following is a simple enum type representing fruits.

types:
  fruit:
    type: enum
    oneOf:
      apple:
        type: null
      banana:
        type: null
      grape:
        type: null

This represents enum values such as apple, banana, and grape; the exact representation depends on the serialization scheme. An enum variant does not need a value type, which is why the variants above use null. A value type is useful when a variant carries an associated value.
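
The simple fruit enum corresponds closely to a Rust enum without payloads. The sketch below assumes that mapping and adds a hypothetical parse helper that converts the lowercase variant name into a variant.

```rust
/// Assumed Rust shape of the `fruit` enum: variants without values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fruit {
    Apple,
    Banana,
    Grape,
}

/// Parses a lowercase variant name; returns None for unknown names.
/// This helper is illustrative and not part of SDF.
fn parse_fruit(name: &str) -> Option<Fruit> {
    match name {
        "apple" => Some(Fruit::Apple),
        "banana" => Some(Fruit::Banana),
        "grape" => Some(Fruit::Grape),
        _ => None,
    }
}
```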

For example, the following is an enum type representing a vehicle, where each variant has an associated value.

types:
  vehicle:
    type: enum
    oneOf:
      car:
        type: car
      airplane:
        type: airplane
  car:
    type: object
    properties:
      model:
        type: string
      range:
        type: u16
  airplane:
    type: object
    properties:
      model:
        type: string
      engines:
        type: u8
      ceiling:
        type: u32

Serialized as JSON, a car value and an airplane value look like this, one object per record:

{
  "vehicle": {
    "car": {
      "model": "tesla",
      "range": 300
    }
  }
}

{
  "vehicle": {
    "airplane": {
      "model": "737",
      "engines": 2,
      "ceiling": 35000
    }
  }
}

List

A list represents an ordered sequence of items. It is declared with the list type. All items must be of the same type. For example, the following is a list type representing a list of fruits.

types:
  fruits:
    type: list
    items:
      type: fruit

If this is serialized as JSON, it will look like this:

{
  "fruits": ["apple","banana","grape"]
}
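
To make the mapping from a list value to the JSON above concrete, the following sketch renders a slice of fruits by hand. It is not the SDF serializer: the Fruit enum, the variant_name helper, and the compact output format without spaces are all assumptions chosen for this example.

```rust
/// Assumed Rust shape of the `fruit` enum used as the list item type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fruit {
    Apple,
    Banana,
    Grape,
}

/// Lowercase name used for each variant in the JSON output.
fn variant_name(fruit: Fruit) -> &'static str {
    match fruit {
        Fruit::Apple => "apple",
        Fruit::Banana => "banana",
        Fruit::Grape => "grape",
    }
}

/// Hand-rolled JSON rendering of a `fruits` list, preserving item order.
fn fruits_to_json(fruits: &[Fruit]) -> String {
    let items: Vec<String> = fruits
        .iter()
        .map(|f| format!("\"{}\"", variant_name(*f)))
        .collect();
    format!("{{\"fruits\":[{}]}}", items.join(","))
}
```

The list keeps insertion order, so the JSON array lists the fruits in the same order as the input slice.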

Key-Value

The key-value type is used by partitioned state. It can be defined in the types section or as part of the state definition. The following is a key-value type representing a word count in the state.

states:
  count-per-word:
    type: keyed-state
    properties:
      key:
        type: string
      value:
        type: u16

Alternatively, it can be defined in the types section as follows:

types:
  word-count:
    type: keyed-state
    properties:
      key:
        type: string
      value:
        type: u16
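
Conceptually, a key-value type with string keys and u16 values behaves like a map from words to counts. The sketch below models that with a standard HashMap. It is only a mental model: the real SDF state API is not shown in this document, and the WordCount alias and count_words function are hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for a keyed state with `string` keys and `u16` values.
type WordCount = HashMap<String, u16>;

/// Increments the count of every whitespace-separated word in the sentence.
/// Words are lowercased so that "The" and "the" share one key.
fn count_words(state: &mut WordCount, sentence: &str) {
    for word in sentence.split_whitespace() {
        let entry = state.entry(word.to_lowercase()).or_insert(0);
        // saturating_add keeps the counter at u16::MAX instead of overflowing
        *entry = entry.saturating_add(1);
    }
}
```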

Nested types

To improve organization and clarity, complex data structures can be defined inline within other types.

Nested types can be defined inside object, enum, and list types. To do so, add the type-name property to the nested type to give it a name.

Type names must be unique within the dataflow. If a dataflow contains duplicate type names with different definitions, it will fail to validate.

For example, the following is a valid syntax to define nested types:

types:
  # Nested object within an object
  person:
    type: object
    properties:
      name:
        type: string
      address:
        type: object
        type-name: address
        properties:
          street:
            type: string
          city:
            type: string
          zip_code:
            type: string

  # Nested list within an object
  product:
    type: object
    properties:
      name:
        type: string
      categories:
        type: list
        type-name: categories
        items:
          type: object
          type-name: category
          properties:
            name:
              type: string
            description:
              type: string

  # Nested object within a list
  brands:
    type: list
    items:
      type: object
      type-name: brand
      properties:
        name:
          type: string
        country:
          type: string
  
  # Nested object within an enum
  car-type:
    type: enum
    oneOf:
      sedan:
        type: null
      truck:
        type: object
        type-name: truck-details
        properties:
          num_wheels:
            type: u8
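
One way to picture the nested definitions is that each type-name becomes its own named Rust type, referenced from the enclosing type. The sketch below assumes that mapping for the person and car-type examples; the struct and enum names follow the CamelCase convention, and the two helper functions are hypothetical.

```rust
/// Nested object named by `type-name: address`.
#[derive(Debug, Clone, PartialEq)]
struct Address {
    street: String,
    city: String,
    zip_code: String,
}

/// Outer `person` object holding the nested address.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String,
    address: Address,
}

/// Nested object named by `type-name: truck-details`.
#[derive(Debug, Clone, PartialEq)]
struct TruckDetails {
    num_wheels: u8,
}

/// `car-type` enum: one variant without a value, one with a nested object.
#[derive(Debug, Clone, PartialEq)]
enum CarType {
    Sedan,
    Truck(TruckDetails),
}

/// Returns the wheel count when the variant carries truck details.
fn wheel_count(car: &CarType) -> Option<u8> {
    match car {
        CarType::Sedan => None,
        CarType::Truck(details) => Some(details.num_wheels),
    }
}

/// Reads a field of the nested address through the outer object.
fn city_of(person: &Person) -> &str {
    &person.address.city
}
```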

Types in the Operator

Once types are defined, they can be used in operators. For example, the following is a map operator that takes a Car value and returns a CarLocation value.

transforms:
  - operator: map
    run: |
      fn get_car_location(car: Car) -> Result<CarLocation> {
        Ok(CarLocation {
          car: format!("{} {}", car.maker, car.model),
          color: car.color,
          location: car.location,
        })
      }
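
The operator above relies on Car and CarLocation types that are not shown. The standalone sketch below fills them in so the function can be compiled outside a dataflow. The field types are assumptions (all String), and Result<CarLocation, String> stands in for the Result type that the SDF runtime provides.

```rust
/// Assumed shape of the input type; field types are guesses.
#[derive(Debug, Clone, PartialEq)]
struct Car {
    maker: String,
    model: String,
    color: String,
    location: String,
}

/// Assumed shape of the output type.
#[derive(Debug, Clone, PartialEq)]
struct CarLocation {
    car: String,
    color: String,
    location: String,
}

/// Same logic as the map operator, with a plain String error type.
fn get_car_location(car: Car) -> Result<CarLocation, String> {
    Ok(CarLocation {
        car: format!("{} {}", car.maker, car.model),
        color: car.color,
        location: car.location,
    })
}
```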

Note that when a type is used in an operator, its name is translated according to the conventions of the operator's language. For example, in Rust the type name will be CarLocation because Rust uses CamelCase for type names. In Python, it will be car_location because Python uses snake_case.
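
The renaming can be approximated with two small string conversions. The sketch below assumes type names are written in kebab-case, as in the examples on this page, and that the translation simply splits on separators. The exact rules SDF applies are not specified here, so treat both functions as illustrative.

```rust
/// Converts a kebab-case or snake_case name to CamelCase, e.g. for Rust types.
fn to_camel_case(name: &str) -> String {
    name.split(|c: char| c == '-' || c == '_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character and keep the rest unchanged
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Converts a kebab-case name to snake_case.
fn to_snake_case(name: &str) -> String {
    name.replace('-', "_")
}
```

With these helpers, a type declared as car-location would be referred to as CarLocation or car_location depending on the target convention.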

Types in the Source and Sink for Serialization and Deserialization

Types are used in sources and sinks for serialization and deserialization. The serialization format is configured globally in the dataflow as the default for all topics. It can also be configured per source or sink.

For example, the following sets JSON as the default serialization format for all topics.

config:
  converter: json
  consumer:
    default_starting_offset:
      value: 0
      position: End
