Pipeline YAML descriptor (quix.yaml)
The quix.yaml
file serves as the Infrastructure as Code (IaC) descriptor for a Quix project. It defines the data pipeline, including the applications (deployments) and topics that compose the pipeline. This file allows you to configure and manage your data pipeline in a declarative manner, ensuring consistency across environments.
Structure of quix.yaml
1. Metadata
The metadata
section contains basic information about the file itself, such as the version of the schema being used. This versioning ensures compatibility and helps manage changes to the structure of the quix.yaml
file over time.
2. Deployments
The deployments
section is where you define each component of your data pipeline. Each deployment represents an application or service that performs a specific role in the pipeline, such as a source, transformation, or sink.
Example Deployment
deployments:
- name: CSV data source
application: csv data source
deploymentType: Service
version: latest
resources:
cpu: 200
memory: 500
replicas: 1
desiredStatus: Running
variables:
- name: output
inputType: OutputTopic
description: Name of the output topic to write into
required: true
value: csv-data
- name: data-visualizer
application: data-visualizer
version: latest
deploymentType: Service
resources:
cpu: 500
memory: 1000
replicas: 1
publicAccess:
enabled: true
urlPrefix: visualization
network:
ports:
- port: 80
targetPort: 8050
variables:
- name: input
inputType: InputTopic
description: This is the input topic to read data from
required: true
value: csv-data
Explanation:
-
name: The identifier for the deployment within your pipeline. It's typically a descriptive name of what the deployment does.
-
application: Specifies the name of the application or microservice to deploy. This should match the application folder where the code resides.
-
deploymentType: Defines the type of deployment. Common types include
Service
for long-running services,Job
for tasks that run to completion, and others as required by your architecture. -
version: Indicates the version of the application to deploy. Using
latest
ensures that the most recent version is deployed, but this can be set to a specific version to ensure consistency. You can also use a commit reference in order to have a specific version. -
resources: Defines the compute resources allocated to the deployment. This includes:
- cpu: The amount of CPU resources allocated (in millicores).
- memory: The amount of memory allocated (in megabytes).
- replicas: The number of instances of the application to run for scalability and fault tolerance.
-
desiredStatus: The state in which you want the application to be. Common statuses include
Running
andStopped
. -
publicAccess: Configures the public accessibility of the deployment. If enabled, the service will be accessible via a public URL.
- enabled: A boolean value that determines if public access is enabled (
true
orfalse
). - urlPrefix: The URL prefix under which the service will be accessible.
- enabled: A boolean value that determines if public access is enabled (
-
network: Defines the network settings for the deployment, including the exposed ports.
- ports: A list of ports to expose.
- port: The external port number that will be exposed.
- targetPort: The internal port number the traffic will be directed to.
- ports: A list of ports to expose.
-
variables: Defines any configurable parameters for the deployment. These are typically inputs or outputs such as:
- name: The name of the variable.
- inputType: The type of input, usually
InputTopic
,OutputTopic
,Secret
, orFreeText
. - description: A brief explanation of what this variable does.
- required: Indicates whether this variable must be provided (
true
orfalse
). - value: The default or specified value for this variable.
3. Topics
The topics
section describes the data streams that your applications will produce or consume. Topics are the backbone of data communication in a Quix pipeline, where each topic corresponds to a data stream that can be read by one or more applications.
Example Topic
topics:
- name: csv-data
persisted: false
configuration:
partitions: 1
retentionInMinutes: 6540
retentionInBytes: 182452224
cleanupPolicy: Delete
Explanation:
-
name: The unique identifier for the topic within your pipeline.
-
persisted: A boolean value that indicates whether the data in this topic should be stored persistently. Setting this to
false
means data will not be stored permanently and might be deleted after processing. -
configuration: (Optional) This section allows you to fine-tune how the topic behaves. Common configurations include:
- partitions: The number of partitions within the topic. More partitions allow for greater parallelism but may complicate data ordering.
- retentionInMinutes: How long the data should be retained in the topic before being deleted or compacted.
- retentionInBytes: The total size of data that the topic can retain. When this limit is reached, older data is deleted.
- cleanupPolicy: Defines how the topic handles old data.
Delete
means old data is removed, whereas other policies likeCompact
could be used for different use cases.
How to Use the quix.yaml
File
This file is central to managing your Quix project. When you deploy your pipeline, Quix uses this configuration to orchestrate the setup of your data pipeline components. It ensures that all services are deployed with the correct resources and are connected through the appropriate topics.
Modifying and Updating
Whenever changes are needed, such as scaling an application or adjusting the topic configuration, you can simply modify the quix.yaml
file and redeploy your pipeline. This approach allows for a repeatable and version-controlled method of managing your pipeline infrastructure.
Best Practices
- Version Control: Always keep your
quix.yaml
file under version control (e.g., in Git) to track changes and collaborate with your team. - Resource Management: Regularly review the resources allocated to each deployment to ensure optimal use of infrastructure.
- Testing: Test changes in a development environment before deploying to production to avoid disruptions in data processing.
By understanding and effectively managing your quix.yaml
file, you can maintain a robust and scalable data pipeline that meets the needs of your organization.