SDF Quickstart
Provisioning and operating a Stateful Dataflow is simple, requiring only two prerequisites:
- A Fluvio Cluster to enable dataflows to consume and produce streaming data.
- A Dataflow File to define how a dataflow sources, transforms, and emits data.
You can build, test, and run Stateful Dataflows locally, then deploy them to your InfinyOn Cloud cluster.
In-line Dataflows
Dataflows can be defined in dataflow.yaml files. When prototyping, a dataflow.yaml can be composed of in-line code, making it the single asset required to define a dataflow.
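For orientation, a dataflow.yaml has four top-level sections: meta (name and version), config (defaults such as the record converter), topics (the Fluvio topics the dataflow reads and writes), and services (the transformation logic, optionally written in-line). The skeleton below is illustrative only, with placeholder names; the working example later in this guide fills in each section:

apiVersion: 0.5.0
meta:
  name: my-dataflow        # placeholder name
  version: 0.1.0
  namespace: example
config:
  converter: raw           # default record converter
topics:
  # topics the dataflow consumes and produces
services:
  # services with in-line transform code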
Deploying an in-line dataflow is simple, as the steps below demonstrate. That said, while in-line dataflows are a breeze to get started with, maintaining code in YAML is not always ideal. For complex projects, we recommend using Composable Dataflows.
Create and Run a Dataflow
Let's create a simple in-line dataflow which receives a sentence, splits it into words, and outputs the length of each word.
1. Install the SDF CLI
Dataflows are managed via the SDF CLI, which we install using fvm:
$ fvm install sdf-beta7
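To confirm that the CLI is installed and on your PATH, you can ask it for its help text (subcommand names may change between beta releases):
$ sdf --help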
2. Create the Dataflow File
Create the dataflow file in a directory named word-length:
$ mkdir -p word-length
$ cd word-length
$ touch dataflow.yaml
Add the following content to the dataflow.yaml:
apiVersion: 0.5.0
meta:
  name: word-length
  version: 0.1.0
  namespace: example

config:
  converter: raw

topics:
  sentences:
    schema:
      value:
        type: string
        converter: raw
  words:
    schema:
      value:
        type: string
        converter: raw

services:
  calc-word-length:
    sources:
      - type: topic
        id: sentences

    transforms:
      - operator: flat-map
        run: |
          fn sentence_to_words(sentence: String) -> Result<Vec<String>> {
            Ok(sentence.split_whitespace().map(String::from).collect())
          }

      - operator: map
        run: |
          pub fn word_length(word: String) -> Result<String> {
            Ok(format!("{}({})", word, word.chars().count()))
          }

    sinks:
      - type: topic
        id: words
This dataflow.yaml first declares a version for the dataflow configuration structure. It then sets the default record converter, "raw" rather than "json" in this case. Next, it lists two Fluvio Topics which the dataflow expects to be present, each with an expected record schema; SDF will create these Topics if they do not already exist. Finally, the config defines a Service which reads from a source topic and writes to a sink topic. The service uses two Operators, defined in-line in Rust in this case, to transform the data.
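If you want to reason about the operator logic outside SDF, the same two functions compose like this in plain Rust. This is a standalone sketch with the Result wrappers dropped; inside a dataflow, SDF compiles the in-line code for you:

// Standalone sketch of the flat-map + map pipeline defined above.
fn sentence_to_words(sentence: String) -> Vec<String> {
    sentence.split_whitespace().map(String::from).collect()
}

fn word_length(word: String) -> String {
    // chars().count() counts Unicode scalar values rather than bytes,
    // so a multi-byte character still counts as one.
    format!("{}({})", word, word.chars().count())
}

fn main() {
    for sentence in ["Hello world", "Hi there"] {
        for word in sentence_to_words(sentence.to_string()) {
            println!("{}", word_length(word)); // e.g. "Hello(5)"
        }
    }
}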
3. Run the Dataflow
Use the sdf CLI to run the dataflow. This will start a REPL which we can use to communicate with the dataflow.
$ sdf run --ui
Note: When passed to sdf run, the --ui flag will start a local webserver allowing you to view the graphical representation of the dataflow in SDF Studio.
4. Test the Dataflow
First, let's use Fluvio to consume from the words topic so we can see the output of the dataflow in real time:
$ fluvio consume words
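If you start the consumer after records have already been produced, nothing is shown by default, since consumption begins at the end of the topic. Recent fluvio versions can replay earlier records with the -B (from beginning) flag:

$ fluvio consume words -B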
Then use Fluvio to produce sentences to the sentences topic:
$ fluvio produce sentences
Enter the following strings into the producer REPL:
Hello world
Hi there
You should see the following output in the consumer stream:
Hello(5)
world(5)
Hi(2)
there(5)
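As an aside, fluvio produce also reads records from stdin, one per line, which is handy for scripted tests:

$ echo "Hello world" | fluvio produce sentences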
5. Show State
Stateful Dataflows are capable of maintaining states (data values) in durable storage which, like a database, persist after an SDF session ends. You can define arbitrary state values which can be accessed and updated in your dataflow. SDF also maintains some built-in state values which keep track of the dataflow's status.
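As an illustrative sketch only (the exact schema is version-dependent, so check the SDF documentation for your release), a user-defined state might be declared under a service like this:

states:
  count-per-word:          # hypothetical state name
    type: keyed-state      # assumed type identifier; verify against your SDF version
    properties:
      key:
        type: string
      value:
        type: u32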
To view the default metrics for the sentence-to-words operator, use the show state command:
>> show state calc-word-length/sentence-to-words/metrics
Key    Window  succeeded  failed
stats  *       2          0
View the metrics for the word-length operator:
>> show state calc-word-length/word-length/metrics
Key    Window  succeeded  failed
stats  *       4          0
Congratulations! You've successfully built and run a dataflow! Many more examples are available on GitHub.
6. Clean-up
Exit the sdf terminal and then remove the topics we created:
$ sdf clean --force
Note: The --force option should only be used if you want to remove everything, including the topics created by this dataflow.