Advanced Topics
Object instantiation
This section describes the objects that are instantiated from the YAML definition.
If the component is a literal, then it is returned as is:
3
will result in
3
If the component is a mapping with a "class_name" field, an object of type "class_name" will be instantiated by passing the mapping's other fields to the constructor:
my_component:
  class_name: "fully_qualified.class_name"
  a_parameter: 3
  another_parameter: "hello"
will result in
fully_qualified.class_name(a_parameter=3, another_parameter="hello")
If the component definition is a mapping with a "type" field, the factory will look up the type in CLASS_TYPES_REGISTRY, replace the "type" field with "class_name" -> CLASS_TYPES_REGISTRY[type], and instantiate the object from the resulting mapping.
If the component definition is a mapping with neither a "class_name" nor a "type" field, the factory will make a best-effort attempt to infer the component type by looking up the parent object's constructor type hints. If the type hint is an interface present in [DEFAULT_IMPLEMENTATIONS_REGISTRY](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/default_implementation_registry.py), then the factory will create an object of its default implementation.
If the component definition is a list, then the factory will iterate over the elements of the list, instantiate its subcomponents, and return a list of instantiated objects.
If the component has subcomponents, the factory will create the subcomponents before instantiating the top-level object:
{
  "type": "TopLevel",
  "param": {
    "type": "ParamType",
    "k": "v"
  }
}
will result in
TopLevel(param=ParamType(k="v"))
More details on object instantiation can be found here.
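The rules above can be sketched as a small recursive function. This is a minimal illustration under stated assumptions, not the framework's factory: `create_component`, `load_class`, and the single-entry `CLASS_TYPES_REGISTRY` below are hypothetical stand-ins, and the type-hint inference step is deliberately left out.

```python
import importlib
from typing import Any, Mapping

# Hypothetical stand-in for CLASS_TYPES_REGISTRY:
# short type name -> fully qualified class name.
CLASS_TYPES_REGISTRY = {"Fraction": "fractions.Fraction"}


def load_class(qualified_name: str) -> type:
    """Import and return the class named by a dotted path."""
    module_name, _, class_name = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def create_component(definition: Any) -> Any:
    """Recursively build an object from a parsed YAML definition."""
    if isinstance(definition, list):
        return [create_component(item) for item in definition]
    if isinstance(definition, Mapping):
        fields = dict(definition)
        # A "type" field is translated to "class_name" through the registry.
        if "class_name" not in fields and "type" in fields:
            fields["class_name"] = CLASS_TYPES_REGISTRY[fields.pop("type")]
        class_name = fields.pop("class_name", None)
        # Subcomponents are created before the object that contains them.
        kwargs = {key: create_component(value) for key, value in fields.items()}
        if class_name is None:
            # Assumption: type-hint inference is out of scope for this sketch,
            # so an untyped mapping is returned as a plain dict.
            return kwargs
        return load_class(class_name)(**kwargs)
    return definition  # literals are returned as is


half = create_component({"type": "Fraction", "numerator": 1, "denominator": 2})
```

Here `half` is built by calling `fractions.Fraction(numerator=1, denominator=2)`, mirroring how a "type" entry is first translated to a "class_name" and then instantiated with the remaining fields.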
$options
Parameters can be passed down from a parent component to its subcomponents using the $options key. This can be used to avoid repetitions.
Schema:
"$options":
  type: object
  additionalProperties: true
Example:
outer:
  $options:
    MyKey: MyValue
  inner:
    k2: v2
In the example above, if both outer and inner are types with a "MyKey" field, both of them will evaluate to "MyValue".
These parameters can be overwritten by subcomponents as a form of specialization:
outer:
  $options:
    MyKey: MyValue
  inner:
    $options:
      MyKey: YourValue
    k2: v2
In this example, "outer.MyKey" will evaluate to "MyValue", and "inner.MyKey" will evaluate to "YourValue".
The value can also be used for string interpolation:
outer:
  $options:
    MyKey: MyValue
  inner:
    k2: "MyKey is {{ options['MyKey'] }}"
In this example, outer.inner.k2 will evaluate to "MyKey is MyValue".
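The inheritance and override behavior of $options can be illustrated with a short sketch. The function name `propagate_options` is hypothetical, and the sketch assumes every nested mapping is a component that inherits its parent's options; the framework's actual propagation logic may differ.

```python
from typing import Any, Dict, Mapping, Optional


def propagate_options(
    component: Mapping[str, Any],
    inherited: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of the component where each nested mapping carries
    its parent's $options, with its own $options taking precedence."""
    merged = {**(inherited or {}), **component.get("$options", {})}
    resolved: Dict[str, Any] = {"$options": merged}
    for key, value in component.items():
        if key == "$options":
            continue
        if isinstance(value, Mapping):
            # Subcomponents inherit the merged options of their parent.
            resolved[key] = propagate_options(value, merged)
        else:
            resolved[key] = value
    return resolved


outer = {
    "$options": {"MyKey": "MyValue"},
    "inner": {"$options": {"MyKey": "YourValue"}, "k2": "v2"},
}
resolved = propagate_options(outer)
```

After this call, `resolved["$options"]["MyKey"]` is "MyValue" while `resolved["inner"]["$options"]["MyKey"]` is "YourValue", matching the specialization example.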
References
Strings can contain references to previously defined values. The parser will dereference these values to produce a complete object definition.
References can be defined using a "*ref({arg})" string.
key: 1234
reference: "*ref(key)"
will produce the following definition:
key: 1234
reference: 1234
This also works with objects:
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs: "*ref(key_value_pairs)"
will produce the following definition:
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  k1: v1
  k2: v2
The $ref keyword can be used to refer to an object and enhance it with additional key-value pairs:
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  $ref: "*ref(key_value_pairs)"
  k3: v3
will produce the following definition:
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  k1: v1
  k2: v2
  k3: v3
References can also point to nested values.
Nested references are ambiguous because a key can itself contain a dot (.).
In this example, we want to refer to the limit key in the dict object:
dict:
  limit: 50
limit_ref: "*ref(dict.limit)"
will produce the following definition:
dict:
  limit: 50
limit_ref: 50
whereas here we want to access the nested.path value:
nested:
  path: "first one"
nested.path: "uh oh"
value: "*ref(nested.path)"
will produce the following definition:
nested:
  path: "first one"
nested.path: "uh oh"
value: "uh oh"
To resolve the ambiguity, we try looking for the reference key at the top-level, and then traverse the structs downward until we find a key with the given path, or until there is nothing to traverse.
More details on referencing values can be found here.
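The lookup order described above (exact key first, then traversal downward) can be sketched as follows. This is a simplified illustration, not the parser's implementation: `lookup` and `dereference` are hypothetical names, and circular references are not detected.

```python
import re
from typing import Any, Mapping

# Matches strings of the form "*ref(some.path)".
REF_PATTERN = re.compile(r"^\*ref\((.+)\)$")


def lookup(node: Any, path: str) -> Any:
    """Find `path` in `node`: try the whole path as a key first, then
    split at each dot and continue the search one level down."""
    if isinstance(node, Mapping):
        if path in node:
            return node[path]
        for index, char in enumerate(path):
            if char == "." and path[:index] in node:
                try:
                    return lookup(node[path[:index]], path[index + 1:])
                except KeyError:
                    continue
    raise KeyError(path)


def dereference(node: Any, root: Mapping[str, Any]) -> Any:
    """Replace every "*ref(...)" string with the value it points to."""
    if isinstance(node, Mapping):
        resolved = {k: dereference(v, root) for k, v in node.items() if k != "$ref"}
        if "$ref" in node:
            # $ref: start from the referenced object, then add the extra keys.
            return {**dereference(node["$ref"], root), **resolved}
        return resolved
    if isinstance(node, list):
        return [dereference(item, root) for item in node]
    if isinstance(node, str):
        match = REF_PATTERN.match(node)
        if match:
            return dereference(lookup(root, match.group(1)), root)
    return node


config = {
    "nested": {"path": "first one"},
    "nested.path": "uh oh",
    "value": "*ref(nested.path)",
}
resolved = dereference(config, config)
```

Because the top-level key "nested.path" is found before any traversal happens, `resolved["value"]` is "uh oh", as in the ambiguous example.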
String interpolation
String values can be evaluated as Jinja2 templates.
If the input string is a raw string, the interpolated string will be the same.
"hello world" -> "hello world"
The engine will evaluate the content passed within {{...}}, interpolating the keys from context-specific arguments.
The "options" keyword (see $options) can be referenced.
For example, some_object.inner_object.key will evaluate to "Hello airbyte" at runtime.
some_object:
  $options:
    name: "airbyte"
  inner_object:
    key: "Hello {{ options.name }}"
Some components also pass in additional arguments to the context.
This is the case for the record selector, which passes in an additional response argument.
Dot notation and bracket notation (with single quotes, ') are interchangeable.
This means that both these string templates will evaluate to the same string:
"{{ options.name }}"
"{{ options['name'] }}"
In addition to passing additional values through the $options argument, macros can be called from within the string interpolation.
For example,
"{{ max(2, 3) }}" -> 3
The macros available can be found here.
Additional information on jinja templating can be found at https://jinja.palletsprojects.com/en/3.1.x/templates/#
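The interpolation behavior can be tried directly with the Jinja2 library (assumed to be installed). This is only a sketch: the `interpolate` helper is hypothetical, and registering Python's built-in `max` as a template global is an assumption made here to imitate a macro, not a description of how the framework registers its macros.

```python
from jinja2 import Environment

env = Environment()
# Assumption: expose Python's built-in max as a global so that templates
# can call it the way a framework macro would be called.
env.globals["max"] = max


def interpolate(template: str, **context) -> str:
    """Render a string as a Jinja2 template with the given context."""
    return env.from_string(template).render(**context)


options = {"name": "airbyte"}
raw = interpolate("hello world")
dotted = interpolate("Hello {{ options.name }}", options=options)
bracketed = interpolate("Hello {{ options['name'] }}", options=options)
largest = interpolate("{{ max(2, 3) }}")
```

Note that plain Jinja2 rendering always returns a string, so `largest` is the string "3" in this sketch.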
Component schema reference
A JSON schema representation of the relationships between the components that can be used in the YAML configuration can be found here.
Custom components
Please help us improve the low code CDK! If you find yourself needing to build a custom component, please create a feature request issue. If appropriate, we'll add it directly to the framework (or you can submit a PR)!
Any built-in components can be overloaded by a custom Python class.
To create a custom component, define a new class in a new file in the connector's module.
The class must implement the interface of the component it is replacing. For instance, a pagination strategy must implement airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy.PaginationStrategy.
The class must also be a dataclass where each field represents an argument to configure from the YAML file, plus an InitVar named options.
For example:
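from dataclasses import InitVar, dataclass
from typing import Any, List, Mapping, Optional, Union

import requests
from airbyte_cdk.sources.declarative.interpolation.interpolated_string import InterpolatedString
from airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy import PaginationStrategy

@dataclass
class MyPaginationStrategy(PaginationStrategy):
    my_field: Union[InterpolatedString, str]  # configured from the YAML file
    options: InitVar[Mapping[str, Any]]  # passed to __post_init__, not stored as a field

    def __post_init__(self, options: Mapping[str, Any]):
        pass

    def next_page_token(self, response: requests.Response, last_records: List[Mapping[str, Any]]) -> Optional[Any]:
        pass

    def reset(self):
        pass
This class can then be referred to from the YAML file using its fully qualified class name: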
pagination_strategy:
  class_name: "my_connector_module.MyPaginationStrategy"
  my_field: "hello world"
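The role of the InitVar named options can be shown with the standard library alone. The class below is a hypothetical, framework-free illustration (it does not implement PaginationStrategy): an InitVar is accepted by the constructor and handed to `__post_init__`, but it is not kept as a dataclass field.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import Any, Mapping


@dataclass
class FixedPageSize:  # hypothetical component, for illustration only
    page_size: int  # a regular field, configured from the YAML mapping
    options: InitVar[Mapping[str, Any]]  # init-only: forwarded to __post_init__
    label: str = field(init=False)  # derived value, not a constructor argument

    def __post_init__(self, options: Mapping[str, Any]) -> None:
        # Options are only available here unless they are copied to a field.
        self.label = f"{options.get('name', 'unnamed')}:{self.page_size}"


strategy = FixedPageSize(page_size=10, options={"name": "customers"})
field_names = [f.name for f in fields(strategy)]
```

Here `field_names` contains "page_size" and "label" but not "options", which is why a component that needs an option later has to store it during `__post_init__`.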
How the framework works
- Given the connection config and an optional stream state, the StreamSlicer computes the stream slices to read.
- Iterate over all the stream slices defined by the stream slicer.
- For each stream slice,
  - Submit a request to the partner API as defined by the requester
  - Select the records from the response
  - Repeat for as long as the paginator points to a next page
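The read loop outlined above can be sketched with plain callables. All names here (`read_records`, `fetch_page`, `select_records`, `next_page_token`) are hypothetical stand-ins for the stream slicer, requester, record selector, and paginator; the in-memory `pages` dictionary replaces a real API.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


def read_records(
    stream_slices: Iterable[Any],
    fetch_page: Callable[[Any, Optional[Any]], Any],
    select_records: Callable[[Any], Iterable[Any]],
    next_page_token: Callable[[Any], Optional[Any]],
) -> Iterator[Any]:
    """Yield records slice by slice, following pagination within each slice."""
    for stream_slice in stream_slices:
        token = None
        while True:
            response = fetch_page(stream_slice, token)  # requester
            yield from select_records(response)  # record selector
            token = next_page_token(response)  # paginator
            if token is None:
                break  # no next page: move on to the next slice


# Fake API responses keyed by (slice, page token).
pages = {
    ("2023-01", None): {"records": [1, 2], "next": "page-2"},
    ("2023-01", "page-2"): {"records": [3], "next": None},
    ("2023-02", None): {"records": [4], "next": None},
}
records = list(
    read_records(
        ["2023-01", "2023-02"],
        lambda stream_slice, token: pages[(stream_slice, token)],
        lambda response: response["records"],
        lambda response: response["next"],
    )
)
```

With this fake data, `records` is `[1, 2, 3, 4]`: two pages for the first slice followed by a single page for the second.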