# airbyte_cdk

Welcome to the Airbyte Python CDK!

The Airbyte Python CDK is a Python library that provides a set of tools to help you build connectors for the Airbyte platform.
## Building Source Connectors

To build a source connector, you will want to refer to the following classes and modules (a minimal sketch follows the list):

- `airbyte_cdk.sources`
- `airbyte_cdk.sources.concurrent_source`
- `airbyte_cdk.sources.config`
- `airbyte_cdk.sources.file_based`
- `airbyte_cdk.sources.streams`
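For orientation, here is a minimal, hedged sketch of a source built on these modules. The endpoint, stream name, and single-page pagination are illustrative assumptions, not part of the CDK; a real connector also ships a spec and JSON schema files.

```python
# Illustrative only: "ExamplePostsStream" and its endpoint are hypothetical.
import logging
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests

from airbyte_cdk import AbstractSource, HttpStream, Stream


class ExamplePostsStream(HttpStream):
    url_base = "https://api.example.com/"  # hypothetical API
    primary_key = "id"

    def path(self, **kwargs: Any) -> str:
        return "posts"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None  # single page, for brevity

    def parse_response(self, response: requests.Response, **kwargs: Any) -> Iterable[Mapping[str, Any]]:
        yield from response.json()


class ExampleSource(AbstractSource):
    def check_connection(self, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
        return True, None  # a real connector would probe the API here

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [ExamplePostsStream()]
```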
## Building Destination Connectors

To build a destination connector, you will want to refer to the following classes and modules (a minimal sketch follows the list):

- `airbyte_cdk.destinations`
- `airbyte_cdk.destinations.Destination`
- `airbyte_cdk.destinations.vector_db_based`
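Similarly, a minimal, hedged sketch of a destination. The in-memory buffer is purely illustrative; a real destination persists records to external storage and yields each state message only after everything before it has been flushed.

```python
import logging
from typing import Any, Iterable, Mapping

from airbyte_cdk import AirbyteConnectionStatus, AirbyteMessage, ConfiguredAirbyteCatalog, Status, Type
from airbyte_cdk.destinations import Destination


class ExampleDestination(Destination):
    def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
        return AirbyteConnectionStatus(status=Status.SUCCEEDED)

    def write(
        self,
        config: Mapping[str, Any],
        configured_catalog: ConfiguredAirbyteCatalog,
        input_messages: Iterable[AirbyteMessage],
    ) -> Iterable[AirbyteMessage]:
        buffer = []
        for message in input_messages:
            if message.type == Type.RECORD:
                buffer.append(message.record.data)  # a real destination would persist this
            elif message.type == Type.STATE:
                # Echoing the state message acknowledges that all records before it were written.
                yield message
```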
## Working with Airbyte Protocol Models

The Airbyte CDK provides a set of classes that help you work with the Airbyte protocol models (a short example follows the list):

- `airbyte_cdk.models.airbyte_protocol`
- `airbyte_cdk.models.airbyte_protocol_serializers`
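For example, a record message can be built directly from these models (the stream name and payload below are made up); `airbyte_cdk.models.airbyte_protocol_serializers` provides helpers for (de)serializing such messages.

```python
import time

from airbyte_cdk.models import AirbyteMessage, AirbyteRecordMessage, Type

# A RECORD message wrapping one row of data; emitted_at is epoch milliseconds.
message = AirbyteMessage(
    type=Type.RECORD,
    record=AirbyteRecordMessage(
        stream="users",                    # hypothetical stream name
        data={"id": 1, "name": "Alice"},   # hypothetical record payload
        emitted_at=int(time.time() * 1000),
    ),
)
print(message.record.stream, message.record.data)
```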
## API Reference
1# Copyright (c) 2021 Airbyte, Inc., all rights reserved. 2""" 3# Welcome to the Airbyte Python CDK! 4 5The Airbyte Python CDK is a Python library that provides a set of tools to help you build 6connectors for the Airbyte platform. 7 8## Building Source Connectors 9 10To build a source connector, you will want to refer to 11the following classes and modules: 12 13- `airbyte_cdk.sources` 14- `airbyte_cdk.sources.concurrent_source` 15- `airbyte_cdk.sources.config` 16- `airbyte_cdk.sources.file_based` 17- `airbyte_cdk.sources.streams` 18 19## Building Destination Connectors 20 21To build a destination connector, you will want to refer to 22the following classes and modules: 23 24- `airbyte_cdk.destinations` 25- `airbyte_cdk.destinations.Destination` 26- `airbyte_cdk.destinations.vector_db_based` 27 28## Working with Airbyte Protocol Models 29 30The Airbyte CDK provides a set of classes that help you work with the Airbyte protocol models: 31 32- `airbyte_cdk.models.airbyte_protocol` 33- `airbyte_cdk.models.airbyte_protocol_serializers` 34 35--- 36 37API Reference 38 39--- 40 41""" 42 43# Warning: The below imports are not stable and will cause circular 44# dependencies if auto-sorted with isort. Please keep them in the same order. 45# TODO: Submodules should import from lower-level modules, rather than importing from here. 46# Imports should also be placed in `if TYPE_CHECKING` blocks if they are only used as type 47# hints - again, to avoid circular dependencies. 48# Once those issues are resolved, the below can be sorted with isort. 49import dunamai as _dunamai 50 51from .config_observation import ( 52 create_connector_config_control_message, 53 emit_configuration_as_airbyte_control_message, 54) 55from .connector import BaseConnector, Connector 56from .destinations import Destination 57from .entrypoint import AirbyteEntrypoint, launch 58from .logger import AirbyteLogFormatter, init_logger 59from .models import ( 60 AdvancedAuth, 61 AirbyteConnectionStatus, 62 AirbyteLogMessage, 63 AirbyteMessage, 64 AirbyteRecordMessage, 65 AirbyteStream, 66 ConfiguredAirbyteCatalog, 67 ConfiguredAirbyteStream, 68 ConnectorSpecification, 69 DestinationSyncMode, 70 FailureType, 71 Level, 72 OAuthConfigSpecification, 73 OrchestratorType, 74 Status, 75 SyncMode, 76 Type, 77) 78from .sources import AbstractSource, Source 79from .sources.concurrent_source.concurrent_source import ConcurrentSource 80from .sources.concurrent_source.concurrent_source_adapter import ConcurrentSourceAdapter 81from .sources.config import BaseConfig 82from .sources.connector_state_manager import ConnectorStateManager 83from .sources.declarative.auth import DeclarativeOauth2Authenticator 84from .sources.declarative.auth.declarative_authenticator import DeclarativeAuthenticator, NoAuth 85from .sources.declarative.auth.oauth import DeclarativeSingleUseRefreshTokenOauth2Authenticator 86from .sources.declarative.auth.token import ( 87 ApiKeyAuthenticator, 88 BasicHttpAuthenticator, 89 BearerAuthenticator, 90) 91from .sources.declarative.datetime.min_max_datetime import MinMaxDatetime 92from .sources.declarative.declarative_stream import DeclarativeStream 93from .sources.declarative.decoders import Decoder, JsonDecoder 94from .sources.declarative.exceptions import ReadException 95from .sources.declarative.extractors import DpathExtractor, RecordSelector 96from .sources.declarative.extractors.record_extractor import RecordExtractor 97from .sources.declarative.extractors.record_filter import RecordFilter 98from .sources.declarative.incremental 
import DatetimeBasedCursor 99from .sources.declarative.interpolation import InterpolatedBoolean, InterpolatedString 100from .sources.declarative.manifest_declarative_source import ManifestDeclarativeSource 101from .sources.declarative.migrations.legacy_to_per_partition_state_migration import ( 102 LegacyToPerPartitionStateMigration, 103) 104from .sources.declarative.partition_routers import ( 105 CartesianProductStreamSlicer, 106 SinglePartitionRouter, 107 SubstreamPartitionRouter, 108) 109from .sources.declarative.partition_routers.substream_partition_router import ParentStreamConfig 110from .sources.declarative.requesters import HttpRequester, Requester 111from .sources.declarative.requesters.error_handlers import BackoffStrategy 112from .sources.declarative.requesters.paginators import DefaultPaginator, PaginationStrategy 113from .sources.declarative.requesters.paginators.strategies import ( 114 CursorPaginationStrategy, 115 OffsetIncrement, 116 PageIncrement, 117 StopConditionPaginationStrategyDecorator, 118) 119from .sources.declarative.requesters.request_option import RequestOption, RequestOptionType 120from .sources.declarative.requesters.request_options.default_request_options_provider import ( 121 DefaultRequestOptionsProvider, 122) 123from .sources.declarative.requesters.request_options.interpolated_request_input_provider import ( 124 InterpolatedRequestInputProvider, 125) 126from .sources.declarative.requesters.requester import HttpMethod 127from .sources.declarative.retrievers import SimpleRetriever 128from .sources.declarative.schema import JsonFileSchemaLoader 129from .sources.declarative.transformations.add_fields import AddedFieldDefinition, AddFields 130from .sources.declarative.transformations.transformation import RecordTransformation 131from .sources.declarative.types import FieldPointer 132from .sources.declarative.yaml_declarative_source import YamlDeclarativeSource 133from .sources.message import InMemoryMessageRepository, MessageRepository 134from .sources.source import TState 135from .sources.streams.availability_strategy import AvailabilityStrategy 136from .sources.streams.call_rate import ( 137 AbstractAPIBudget, 138 CachedLimiterSession, 139 HttpAPIBudget, 140 HttpRequestMatcher, 141 LimiterSession, 142 MovingWindowCallRatePolicy, 143 Rate, 144) 145from .sources.streams.checkpoint import Cursor as LegacyCursor 146from .sources.streams.checkpoint import ResumableFullRefreshCursor 147from .sources.streams.concurrent.adapters import StreamFacade 148from .sources.streams.concurrent.cursor import ( 149 ConcurrentCursor, 150 Cursor, 151 CursorField, 152 FinalStateCursor, 153) 154from .sources.streams.concurrent.state_converters.datetime_stream_state_converter import ( 155 EpochValueConcurrentStreamStateConverter, 156 IsoMillisConcurrentStreamStateConverter, 157) 158from .sources.streams.core import IncrementalMixin, Stream, package_name_from_class 159from .sources.streams.http import HttpStream, HttpSubStream 160from .sources.streams.http.availability_strategy import HttpAvailabilityStrategy 161from .sources.streams.http.exceptions import ( 162 BaseBackoffException, 163 DefaultBackoffException, 164 UserDefinedBackoffException, 165) 166from .sources.streams.http.rate_limiting import default_backoff_handler 167from .sources.streams.http.requests_native_auth import ( 168 Oauth2Authenticator, 169 SingleUseRefreshTokenOauth2Authenticator, 170 TokenAuthenticator, 171) 172from .sources.streams.http.requests_native_auth.abstract_token import AbstractHeaderAuthenticator 
173from .sources.types import Config, Record, StreamSlice 174from .sources.utils import casing 175from .sources.utils.schema_helpers import ( 176 InternalConfig, 177 ResourceSchemaLoader, 178 check_config_against_spec_or_exit, 179 expand_refs, 180 split_config, 181) 182from .sources.utils.transform import TransformConfig, TypeTransformer 183from .utils import AirbyteTracedException, is_cloud_environment 184from .utils.constants import ENV_REQUEST_CACHE_PATH 185from .utils.event_timing import create_timer 186from .utils.oneof_option_config import OneOfOptionConfig 187from .utils.spec_schema_transformations import resolve_refs 188from .utils.stream_status_utils import as_airbyte_message 189 190__all__ = [ 191 # Availability strategy 192 "AvailabilityStrategy", 193 "HttpAvailabilityStrategy", 194 # Checkpoint 195 "LegacyCursor", 196 "ResumableFullRefreshCursor", 197 # Concurrent 198 "ConcurrentCursor", 199 "ConcurrentSource", 200 "ConcurrentSourceAdapter", 201 "Cursor", 202 "CursorField", 203 "DEFAULT_CONCURRENCY", 204 "EpochValueConcurrentStreamStateConverter", 205 "FinalStateCursor", 206 "IsoMillisConcurrentStreamStateConverter", 207 "StreamFacade", 208 # Config observation 209 "create_connector_config_control_message", 210 "emit_configuration_as_airbyte_control_message", 211 # Connector 212 "AbstractSource", 213 "BaseConfig", 214 "BaseConnector", 215 "Connector", 216 "Destination", 217 "Source", 218 "TState", 219 # Declarative 220 "AddFields", 221 "AddedFieldDefinition", 222 "ApiKeyAuthenticator", 223 "BackoffStrategy", 224 "BasicHttpAuthenticator", 225 "BearerAuthenticator", 226 "CartesianProductStreamSlicer", 227 "CursorPaginationStrategy", 228 "DatetimeBasedCursor", 229 "DeclarativeAuthenticator", 230 "DeclarativeOauth2Authenticator", 231 "DeclarativeSingleUseRefreshTokenOauth2Authenticator", 232 "DeclarativeStream", 233 "Decoder", 234 "DefaultPaginator", 235 "DefaultRequestOptionsProvider", 236 "DpathExtractor", 237 "FieldPointer", 238 "HttpMethod", 239 "HttpRequester", 240 "InterpolatedBoolean", 241 "InterpolatedRequestInputProvider", 242 "InterpolatedString", 243 "JsonDecoder", 244 "JsonFileSchemaLoader", 245 "LegacyToPerPartitionStateMigration", 246 "ManifestDeclarativeSource", 247 "MinMaxDatetime", 248 "NoAuth", 249 "OffsetIncrement", 250 "PageIncrement", 251 "PaginationStrategy", 252 "ParentStreamConfig", 253 "ReadException", 254 "RecordExtractor", 255 "RecordFilter", 256 "RecordSelector", 257 "RecordTransformation", 258 "RequestOption", 259 "RequestOptionType", 260 "Requester", 261 "ResponseStatus", 262 "SimpleRetriever", 263 "SinglePartitionRouter", 264 "StopConditionPaginationStrategyDecorator", 265 "StreamSlice", 266 "SubstreamPartitionRouter", 267 "YamlDeclarativeSource", 268 # Entrypoint 269 "launch", 270 "AirbyteEntrypoint", 271 # HTTP 272 "AbstractAPIBudget", 273 "AbstractHeaderAuthenticator", 274 "BaseBackoffException", 275 "CachedLimiterSession", 276 "DefaultBackoffException", 277 "default_backoff_handler", 278 "HttpAPIBudget", 279 "HttpAuthenticator", 280 "HttpRequestMatcher", 281 "HttpStream", 282 "HttpSubStream", 283 "LimiterSession", 284 "MovingWindowCallRatePolicy", 285 "MultipleTokenAuthenticator", 286 "Oauth2Authenticator", 287 "Rate", 288 "SingleUseRefreshTokenOauth2Authenticator", 289 "TokenAuthenticator", 290 "UserDefinedBackoffException", 291 # Logger 292 "AirbyteLogFormatter", 293 "init_logger", 294 # Protocol classes 295 "AirbyteStream", 296 "AirbyteConnectionStatus", 297 "AirbyteMessage", 298 "ConfiguredAirbyteCatalog", 299 "Status", 300 "Type", 301 
"OrchestratorType", 302 "ConfiguredAirbyteStream", 303 "DestinationSyncMode", 304 "SyncMode", 305 "FailureType", 306 "AdvancedAuth", 307 "AirbyteLogMessage", 308 "OAuthConfigSpecification", 309 "ConnectorSpecification", 310 "Level", 311 "AirbyteRecordMessage", 312 # Repository 313 "InMemoryMessageRepository", 314 "MessageRepository", 315 # State management 316 "ConnectorStateManager", 317 # Stream 318 "IncrementalMixin", 319 "Stream", 320 "StreamData", 321 "package_name_from_class", 322 # Utils 323 "AirbyteTracedException", 324 "is_cloud_environment", 325 "casing", 326 "InternalConfig", 327 "ResourceSchemaLoader", 328 "check_config_against_spec_or_exit", 329 "split_config", 330 "TransformConfig", 331 "TypeTransformer", 332 "ENV_REQUEST_CACHE_PATH", 333 "create_timer", 334 "OneOfOptionConfig", 335 "resolve_refs", 336 "as_airbyte_message", 337 # Types 338 "Config", 339 "Record", 340 "Source", 341 "StreamSlice", 342] 343 344__version__: str 345"""Version generated by poetry dynamic versioning during publish. 346 347When running in development, dunamai will calculate a new prerelease version 348from existing git release tag info. 349""" 350 351try: 352 __version__ = _dunamai.get_version( 353 "airbyte-cdk", 354 third_choice=_dunamai.Version.from_any_vcs, 355 fallback=_dunamai.Version("0.0.0+dev"), 356 ).serialize() 357except: 358 __version__ = "0.0.0+dev"
18class AvailabilityStrategy(ABC): 19 """ 20 Abstract base class for checking stream availability. 21 """ 22 23 @abstractmethod 24 def check_availability( 25 self, stream: Stream, logger: logging.Logger, source: Optional["Source"] = None 26 ) -> Tuple[bool, Optional[str]]: 27 """ 28 Checks stream availability. 29 30 :param stream: stream 31 :param logger: source logger 32 :param source: (optional) source 33 :return: A tuple of (boolean, str). If boolean is true, then the stream 34 is available, and no str is required. Otherwise, the stream is unavailable 35 for some reason and the str should describe what went wrong and how to 36 resolve the unavailability, if possible. 37 """ 38 39 @staticmethod 40 def get_first_stream_slice(stream: Stream) -> Optional[Mapping[str, Any]]: 41 """ 42 Gets the first stream_slice from a given stream's stream_slices. 43 :param stream: stream 44 :raises StopIteration: if there is no first slice to return (the stream_slices generator is empty) 45 :return: first stream slice from 'stream_slices' generator (`None` is a valid stream slice) 46 """ 47 # We wrap the return output of stream_slices() because some implementations return types that are iterable, 48 # but not iterators such as lists or tuples 49 slices = iter( 50 stream.stream_slices( 51 cursor_field=stream.cursor_field, # type: ignore[arg-type] 52 sync_mode=SyncMode.full_refresh, 53 ) 54 ) 55 return next(slices) 56 57 @staticmethod 58 def get_first_record_for_slice( 59 stream: Stream, stream_slice: Optional[Mapping[str, Any]] 60 ) -> StreamData: 61 """ 62 Gets the first record for a stream_slice of a stream. 63 64 :param stream: stream instance from which to read records 65 :param stream_slice: stream_slice parameters for slicing the stream 66 :raises StopIteration: if there is no first record to return (the read_records generator is empty) 67 :return: StreamData containing the first record in the slice 68 """ 69 # Store the original value of exit_on_rate_limit 70 original_exit_on_rate_limit = stream.exit_on_rate_limit 71 72 try: 73 # Ensure exit_on_rate_limit is safely set to True if possible 74 stream.exit_on_rate_limit = True 75 76 # We wrap the return output of read_records() because some implementations return types that are iterable, 77 # but not iterators such as lists or tuples 78 records_for_slice = iter( 79 stream.read_records(sync_mode=SyncMode.full_refresh, stream_slice=stream_slice) 80 ) 81 82 return next(records_for_slice) 83 finally: 84 # Restore the original exit_on_rate_limit value 85 stream.exit_on_rate_limit = original_exit_on_rate_limit
Abstract base class for checking stream availability.
23 @abstractmethod 24 def check_availability( 25 self, stream: Stream, logger: logging.Logger, source: Optional["Source"] = None 26 ) -> Tuple[bool, Optional[str]]: 27 """ 28 Checks stream availability. 29 30 :param stream: stream 31 :param logger: source logger 32 :param source: (optional) source 33 :return: A tuple of (boolean, str). If boolean is true, then the stream 34 is available, and no str is required. Otherwise, the stream is unavailable 35 for some reason and the str should describe what went wrong and how to 36 resolve the unavailability, if possible. 37 """
Checks stream availability.
Parameters
- stream: stream
- logger: source logger
- source: (optional) source
Returns
A tuple of (boolean, str). If boolean is true, then the stream is available, and no str is required. Otherwise, the stream is unavailable for some reason and the str should describe what went wrong and how to resolve the unavailability, if possible.
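As a hedged sketch, a custom strategy only needs to implement `check_availability`; the class name and error handling below are illustrative, not part of the CDK.

```python
import logging
from typing import Any, Optional, Tuple

from airbyte_cdk import AvailabilityStrategy, Source, Stream


class FirstRecordAvailabilityStrategy(AvailabilityStrategy):
    """Hypothetical strategy: a stream is available if we can attempt to read one record."""

    def check_availability(
        self, stream: Stream, logger: logging.Logger, source: Optional[Source] = None
    ) -> Tuple[bool, Optional[str]]:
        try:
            # Both helpers below raise StopIteration when their generator is empty.
            first_slice = self.get_first_stream_slice(stream)
            self.get_first_record_for_slice(stream, first_slice)
            return True, None
        except StopIteration:
            return True, None  # reachable but empty still counts as available
        except Exception as exc:  # a real strategy would catch narrower exceptions
            return False, f"Unable to read from stream {stream.name}: {exc}"
```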
39 @staticmethod 40 def get_first_stream_slice(stream: Stream) -> Optional[Mapping[str, Any]]: 41 """ 42 Gets the first stream_slice from a given stream's stream_slices. 43 :param stream: stream 44 :raises StopIteration: if there is no first slice to return (the stream_slices generator is empty) 45 :return: first stream slice from 'stream_slices' generator (`None` is a valid stream slice) 46 """ 47 # We wrap the return output of stream_slices() because some implementations return types that are iterable, 48 # but not iterators such as lists or tuples 49 slices = iter( 50 stream.stream_slices( 51 cursor_field=stream.cursor_field, # type: ignore[arg-type] 52 sync_mode=SyncMode.full_refresh, 53 ) 54 ) 55 return next(slices)
Gets the first stream_slice from a given stream's stream_slices.
Parameters
- stream: stream
Raises
- StopIteration: if there is no first slice to return (the stream_slices generator is empty)
Returns
first stream slice from 'stream_slices' generator (`None` is a valid stream slice)
57 @staticmethod 58 def get_first_record_for_slice( 59 stream: Stream, stream_slice: Optional[Mapping[str, Any]] 60 ) -> StreamData: 61 """ 62 Gets the first record for a stream_slice of a stream. 63 64 :param stream: stream instance from which to read records 65 :param stream_slice: stream_slice parameters for slicing the stream 66 :raises StopIteration: if there is no first record to return (the read_records generator is empty) 67 :return: StreamData containing the first record in the slice 68 """ 69 # Store the original value of exit_on_rate_limit 70 original_exit_on_rate_limit = stream.exit_on_rate_limit 71 72 try: 73 # Ensure exit_on_rate_limit is safely set to True if possible 74 stream.exit_on_rate_limit = True 75 76 # We wrap the return output of read_records() because some implementations return types that are iterable, 77 # but not iterators such as lists or tuples 78 records_for_slice = iter( 79 stream.read_records(sync_mode=SyncMode.full_refresh, stream_slice=stream_slice) 80 ) 81 82 return next(records_for_slice) 83 finally: 84 # Restore the original exit_on_rate_limit value 85 stream.exit_on_rate_limit = original_exit_on_rate_limit
Gets the first record for a stream_slice of a stream.
Parameters
- stream: stream instance from which to read records
- stream_slice: stream_slice parameters for slicing the stream
Raises
- StopIteration: if there is no first record to return (the read_records generator is empty)
Returns
StreamData containing the first record in the slice
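Both helpers are static methods, so they can also be used directly when probing a stream by hand. `ExamplePostsStream` is the hypothetical stream from the earlier sketch; any `Stream` instance works the same way.

```python
from airbyte_cdk import AvailabilityStrategy

stream = ExamplePostsStream()  # hypothetical HttpStream from the earlier sketch

try:
    first_slice = AvailabilityStrategy.get_first_stream_slice(stream)
    first_record = AvailabilityStrategy.get_first_record_for_slice(stream, first_slice)
except StopIteration:
    first_record = None  # the stream is reachable but empty
```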
18class HttpAvailabilityStrategy(AvailabilityStrategy): 19 def check_availability( 20 self, stream: Stream, logger: logging.Logger, source: Optional["Source"] = None 21 ) -> Tuple[bool, Optional[str]]: 22 """ 23 Check stream availability by attempting to read the first record of the 24 stream. 25 26 :param stream: stream 27 :param logger: source logger 28 :param source: (optional) source 29 :return: A tuple of (boolean, str). If boolean is true, then the stream 30 is available, and no str is required. Otherwise, the stream is unavailable 31 for some reason and the str should describe what went wrong and how to 32 resolve the unavailability, if possible. 33 """ 34 reason: Optional[str] 35 try: 36 # Some streams need a stream slice to read records (e.g. if they have a SubstreamPartitionRouter) 37 # Streams that don't need a stream slice will return `None` as their first stream slice. 38 stream_slice = self.get_first_stream_slice(stream) 39 except StopIteration: 40 # If stream_slices has no `next()` item (Note - this is different from stream_slices returning [None]!) 41 # This can happen when a substream's `stream_slices` method does a `for record in parent_records: yield <something>` 42 # without accounting for the case in which the parent stream is empty. 43 reason = f"Cannot attempt to connect to stream {stream.name} - no stream slices were found, likely because the parent stream is empty." 44 return False, reason 45 except AirbyteTracedException as error: 46 return False, error.message 47 48 try: 49 self.get_first_record_for_slice(stream, stream_slice) 50 return True, None 51 except StopIteration: 52 logger.info(f"Successfully connected to stream {stream.name}, but got 0 records.") 53 return True, None 54 except AirbyteTracedException as error: 55 return False, error.message
Abstract base class for checking stream availability.
19 def check_availability( 20 self, stream: Stream, logger: logging.Logger, source: Optional["Source"] = None 21 ) -> Tuple[bool, Optional[str]]: 22 """ 23 Check stream availability by attempting to read the first record of the 24 stream. 25 26 :param stream: stream 27 :param logger: source logger 28 :param source: (optional) source 29 :return: A tuple of (boolean, str). If boolean is true, then the stream 30 is available, and no str is required. Otherwise, the stream is unavailable 31 for some reason and the str should describe what went wrong and how to 32 resolve the unavailability, if possible. 33 """ 34 reason: Optional[str] 35 try: 36 # Some streams need a stream slice to read records (e.g. if they have a SubstreamPartitionRouter) 37 # Streams that don't need a stream slice will return `None` as their first stream slice. 38 stream_slice = self.get_first_stream_slice(stream) 39 except StopIteration: 40 # If stream_slices has no `next()` item (Note - this is different from stream_slices returning [None]!) 41 # This can happen when a substream's `stream_slices` method does a `for record in parent_records: yield <something>` 42 # without accounting for the case in which the parent stream is empty. 43 reason = f"Cannot attempt to connect to stream {stream.name} - no stream slices were found, likely because the parent stream is empty." 44 return False, reason 45 except AirbyteTracedException as error: 46 return False, error.message 47 48 try: 49 self.get_first_record_for_slice(stream, stream_slice) 50 return True, None 51 except StopIteration: 52 logger.info(f"Successfully connected to stream {stream.name}, but got 0 records.") 53 return True, None 54 except AirbyteTracedException as error: 55 return False, error.message
Check stream availability by attempting to read the first record of the stream.
Parameters
- stream: stream
- logger: source logger
- source: (optional) source
Returns
A tuple of (boolean, str). If boolean is true, then the stream is available, and no str is required. Otherwise, the stream is unavailable for some reason and the str should describe what went wrong and how to resolve the unavailability, if possible.
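A hedged usage sketch: `ExamplePostsStream` refers to the hypothetical `HttpStream` sketched near the top of this page; any configured `HttpStream` instance can be checked the same way.

```python
import logging

from airbyte_cdk import HttpAvailabilityStrategy

logger = logging.getLogger("airbyte")
stream = ExamplePostsStream()  # hypothetical HttpStream from the earlier sketch

available, reason = HttpAvailabilityStrategy().check_availability(stream, logger)
if not available:
    logger.warning("Stream %s is unavailable: %s", stream.name, reason)
```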
11@dataclass 12class ResumableFullRefreshCursor(Cursor): 13 """ 14 Cursor that allows for the checkpointing of sync progress according to a synthetic cursor based on the pagination state 15 of the stream. Resumable full refresh syncs are only intended to retain state in between sync attempts of the same job 16 with the platform responsible for removing said state. 17 """ 18 19 def __init__(self) -> None: 20 self._cursor: StreamState = {} 21 22 def get_stream_state(self) -> StreamState: 23 return self._cursor 24 25 def set_initial_state(self, stream_state: StreamState) -> None: 26 self._cursor = stream_state 27 28 def observe(self, stream_slice: StreamSlice, record: Record) -> None: 29 """ 30 Resumable full refresh manages state using a page number so it does not need to update state by observing incoming records. 31 """ 32 pass 33 34 def close_slice(self, stream_slice: StreamSlice, *args: Any) -> None: 35 self._cursor = stream_slice.cursor_slice 36 37 def should_be_synced(self, record: Record) -> bool: 38 """ 39 Unlike date-based cursors which filter out records outside slice boundaries, resumable full refresh records exist within pages 40 that don't have filterable bounds. We should always return them. 41 """ 42 return True 43 44 def is_greater_than_or_equal(self, first: Record, second: Record) -> bool: 45 """ 46 RFR record don't have ordering to be compared between one another. 47 """ 48 return False 49 50 def select_state(self, stream_slice: Optional[StreamSlice] = None) -> Optional[StreamState]: 51 # A top-level RFR cursor only manages the state of a single partition 52 return self._cursor
Cursor that allows for the checkpointing of sync progress according to a synthetic cursor based on the pagination state of the stream. Resumable full refresh syncs are only intended to retain state in between sync attempts of the same job with the platform responsible for removing said state.
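A hedged sketch of the checkpointing flow based on the methods shown above; the `"page"` key is an arbitrary example of the synthetic pagination state this cursor tracks.

```python
from airbyte_cdk import ResumableFullRefreshCursor, StreamSlice

cursor = ResumableFullRefreshCursor()

# State carried over from a previous attempt of the same job.
cursor.set_initial_state({"page": 3})
assert cursor.get_stream_state() == {"page": 3}

# After processing a slice, the cursor checkpoints that slice's cursor_slice.
cursor.close_slice(StreamSlice(partition={}, cursor_slice={"page": 4}))
assert cursor.get_stream_state() == {"page": 4}
```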
Returns the current stream state. We would like to restrict its usage since it exposes internals of the state. As of 2023-06-14, it is used for a few things:
- Interpolation of the requests
- Transformation of records
- Saving the state
For the first case, we are probably stuck with exposing the stream state. For the second, we can probably expose a method that allows for emitting the state to the platform.
Cursors are not initialized with their state. Since state is needed for them to function properly, this method should be called before anything else.
Parameters
- stream_state: The state of the stream as returned by get_stream_state
28 def observe(self, stream_slice: StreamSlice, record: Record) -> None: 29 """ 30 Resumable full refresh manages state using a page number so it does not need to update state by observing incoming records. 31 """ 32 pass
Resumable full refresh manages state using a page number so it does not need to update state by observing incoming records.
34 def close_slice(self, stream_slice: StreamSlice, *args: Any) -> None: 35 self._cursor = stream_slice.cursor_slice
Update state based on the stream slice. Note that `stream_slice.cursor_slice` and `most_recent_record.associated_slice` are expected to be the same, but we make it explicit here that `stream_slice` should be leveraged to update the state. We do not pass in the latest record, since cursor instances should maintain the relevant internal state on their own.
Parameters
- stream_slice: slice to close
37 def should_be_synced(self, record: Record) -> bool: 38 """ 39 Unlike date-based cursors which filter out records outside slice boundaries, resumable full refresh records exist within pages 40 that don't have filterable bounds. We should always return them. 41 """ 42 return True
Unlike date-based cursors which filter out records outside slice boundaries, resumable full refresh records exist within pages that don't have filterable bounds. We should always return them.
44 def is_greater_than_or_equal(self, first: Record, second: Record) -> bool: 45 """ 46 RFR record don't have ordering to be compared between one another. 47 """ 48 return False
RFR records don't have an ordering to be compared between one another.
50 def select_state(self, stream_slice: Optional[StreamSlice] = None) -> Optional[StreamState]: 51 # A top-level RFR cursor only manages the state of a single partition 52 return self._cursor
Get the state value of a specific stream_slice. For incremental or resumable full refresh cursors which only manage state in a single dimension this is the entire state object. For per-partition cursors used by substreams, this returns the state of a specific parent delineated by the incoming slice's partition object.
128class ConcurrentCursor(Cursor): 129 _START_BOUNDARY = 0 130 _END_BOUNDARY = 1 131 132 def __init__( 133 self, 134 stream_name: str, 135 stream_namespace: Optional[str], 136 stream_state: Any, 137 message_repository: MessageRepository, 138 connector_state_manager: ConnectorStateManager, 139 connector_state_converter: AbstractStreamStateConverter, 140 cursor_field: CursorField, 141 slice_boundary_fields: Optional[Tuple[str, str]], 142 start: Optional[CursorValueType], 143 end_provider: Callable[[], CursorValueType], 144 lookback_window: Optional[GapType] = None, 145 slice_range: Optional[GapType] = None, 146 cursor_granularity: Optional[GapType] = None, 147 clamping_strategy: ClampingStrategy = NoClamping(), 148 ) -> None: 149 self._stream_name = stream_name 150 self._stream_namespace = stream_namespace 151 self._message_repository = message_repository 152 self._connector_state_converter = connector_state_converter 153 self._connector_state_manager = connector_state_manager 154 self._cursor_field = cursor_field 155 # To see some example where the slice boundaries might not be defined, check https://github.com/airbytehq/airbyte/blob/1ce84d6396e446e1ac2377362446e3fb94509461/airbyte-integrations/connectors/source-stripe/source_stripe/streams.py#L363-L379 156 self._slice_boundary_fields = slice_boundary_fields 157 self._start = start 158 self._end_provider = end_provider 159 self.start, self._concurrent_state = self._get_concurrent_state(stream_state) 160 self._lookback_window = lookback_window 161 self._slice_range = slice_range 162 self._most_recent_cursor_value_per_partition: MutableMapping[ 163 Union[StreamSlice, Mapping[str, Any], None], Any 164 ] = {} 165 self._has_closed_at_least_one_slice = False 166 self._cursor_granularity = cursor_granularity 167 # Flag to track if the logger has been triggered (per stream) 168 self._should_be_synced_logger_triggered = False 169 self._clamping_strategy = clamping_strategy 170 171 @property 172 def state(self) -> MutableMapping[str, Any]: 173 return self._connector_state_converter.convert_to_state_message( 174 self.cursor_field, self._concurrent_state 175 ) 176 177 @property 178 def cursor_field(self) -> CursorField: 179 return self._cursor_field 180 181 @property 182 def _slice_boundary_fields_wrapper(self) -> Tuple[str, str]: 183 return ( 184 self._slice_boundary_fields 185 if self._slice_boundary_fields 186 else ( 187 self._connector_state_converter.START_KEY, 188 self._connector_state_converter.END_KEY, 189 ) 190 ) 191 192 def _get_concurrent_state( 193 self, state: MutableMapping[str, Any] 194 ) -> Tuple[CursorValueType, MutableMapping[str, Any]]: 195 if self._connector_state_converter.is_state_message_compatible(state): 196 return ( 197 self._start or self._connector_state_converter.zero_value, 198 self._connector_state_converter.deserialize(state), 199 ) 200 return self._connector_state_converter.convert_from_sequential_state( 201 self._cursor_field, state, self._start 202 ) 203 204 def observe(self, record: Record) -> None: 205 most_recent_cursor_value = self._most_recent_cursor_value_per_partition.get( 206 record.associated_slice 207 ) 208 try: 209 cursor_value = self._extract_cursor_value(record) 210 211 if most_recent_cursor_value is None or most_recent_cursor_value < cursor_value: 212 self._most_recent_cursor_value_per_partition[record.associated_slice] = cursor_value 213 except ValueError: 214 self._log_for_record_without_cursor_value() 215 216 def _extract_cursor_value(self, record: Record) -> Any: 217 return 
self._connector_state_converter.parse_value(self._cursor_field.extract_value(record)) 218 219 def close_partition(self, partition: Partition) -> None: 220 slice_count_before = len(self._concurrent_state.get("slices", [])) 221 self._add_slice_to_state(partition) 222 if slice_count_before < len( 223 self._concurrent_state["slices"] 224 ): # only emit if at least one slice has been processed 225 self._merge_partitions() 226 self._emit_state_message() 227 self._has_closed_at_least_one_slice = True 228 229 def _add_slice_to_state(self, partition: Partition) -> None: 230 most_recent_cursor_value = self._most_recent_cursor_value_per_partition.get( 231 partition.to_slice() 232 ) 233 234 if self._slice_boundary_fields: 235 if "slices" not in self._concurrent_state: 236 raise RuntimeError( 237 f"The state for stream {self._stream_name} should have at least one slice to delineate the sync start time, but no slices are present. This is unexpected. Please contact Support." 238 ) 239 self._concurrent_state["slices"].append( 240 { 241 self._connector_state_converter.START_KEY: self._extract_from_slice( 242 partition, self._slice_boundary_fields[self._START_BOUNDARY] 243 ), 244 self._connector_state_converter.END_KEY: self._extract_from_slice( 245 partition, self._slice_boundary_fields[self._END_BOUNDARY] 246 ), 247 self._connector_state_converter.MOST_RECENT_RECORD_KEY: most_recent_cursor_value, 248 } 249 ) 250 elif most_recent_cursor_value: 251 if self._has_closed_at_least_one_slice: 252 # If we track state value using records cursor field, we can only do that if there is one partition. This is because we save 253 # the state every time we close a partition. We assume that if there are multiple slices, they need to be providing 254 # boundaries. There are cases where partitions could not have boundaries: 255 # * The cursor should be per-partition 256 # * The stream state is actually the parent stream state 257 # There might be other cases not listed above. Those are not supported today hence the stream should not use this cursor for 258 # state management. For the specific user that was affected with this issue, we need to: 259 # * Fix state tracking (which is currently broken) 260 # * Make the new version available 261 # * (Probably) ask the user to reset the stream to avoid data loss 262 raise ValueError( 263 "Given that slice_boundary_fields is not defined and that per-partition state is not supported, only one slice is " 264 "expected. Please contact the Airbyte team." 
265 ) 266 267 self._concurrent_state["slices"].append( 268 { 269 self._connector_state_converter.START_KEY: self.start, 270 self._connector_state_converter.END_KEY: most_recent_cursor_value, 271 self._connector_state_converter.MOST_RECENT_RECORD_KEY: most_recent_cursor_value, 272 } 273 ) 274 275 def _emit_state_message(self) -> None: 276 self._connector_state_manager.update_state_for_stream( 277 self._stream_name, 278 self._stream_namespace, 279 self.state, 280 ) 281 state_message = self._connector_state_manager.create_state_message( 282 self._stream_name, self._stream_namespace 283 ) 284 self._message_repository.emit_message(state_message) 285 286 def _merge_partitions(self) -> None: 287 self._concurrent_state["slices"] = self._connector_state_converter.merge_intervals( 288 self._concurrent_state["slices"] 289 ) 290 291 def _extract_from_slice(self, partition: Partition, key: str) -> CursorValueType: 292 try: 293 _slice = partition.to_slice() 294 if not _slice: 295 raise KeyError(f"Could not find key `{key}` in empty slice") 296 return self._connector_state_converter.parse_value(_slice[key]) # type: ignore # we expect the devs to specify a key that would return a CursorValueType 297 except KeyError as exception: 298 raise KeyError( 299 f"Partition is expected to have key `{key}` but could not be found" 300 ) from exception 301 302 def ensure_at_least_one_state_emitted(self) -> None: 303 """ 304 The platform expect to have at least one state message on successful syncs. Hence, whatever happens, we expect this method to be 305 called. 306 """ 307 self._emit_state_message() 308 309 def stream_slices(self) -> Iterable[StreamSlice]: 310 """ 311 Generating slices based on a few parameters: 312 * lookback_window: Buffer to remove from END_KEY of the highest slice 313 * slice_range: Max difference between two slices. If the difference between two slices is greater, multiple slices will be created 314 * start: `_split_per_slice_range` will clip any value to `self._start which means that: 315 * if upper is less than self._start, no slices will be generated 316 * if lower is less than self._start, self._start will be used as the lower boundary (lookback_window will not be considered in that case) 317 318 Note that the slices will overlap at their boundaries. We therefore expect to have at least the lower or the upper boundary to be 319 inclusive in the API that is queried. 
320 """ 321 self._merge_partitions() 322 323 if self._start is not None and self._is_start_before_first_slice(): 324 yield from self._split_per_slice_range( 325 self._start, 326 self._concurrent_state["slices"][0][self._connector_state_converter.START_KEY], 327 False, 328 ) 329 330 if len(self._concurrent_state["slices"]) == 1: 331 yield from self._split_per_slice_range( 332 self._calculate_lower_boundary_of_last_slice( 333 self._concurrent_state["slices"][0][self._connector_state_converter.END_KEY] 334 ), 335 self._end_provider(), 336 True, 337 ) 338 elif len(self._concurrent_state["slices"]) > 1: 339 for i in range(len(self._concurrent_state["slices"]) - 1): 340 if self._cursor_granularity: 341 yield from self._split_per_slice_range( 342 self._concurrent_state["slices"][i][self._connector_state_converter.END_KEY] 343 + self._cursor_granularity, 344 self._concurrent_state["slices"][i + 1][ 345 self._connector_state_converter.START_KEY 346 ], 347 False, 348 ) 349 else: 350 yield from self._split_per_slice_range( 351 self._concurrent_state["slices"][i][ 352 self._connector_state_converter.END_KEY 353 ], 354 self._concurrent_state["slices"][i + 1][ 355 self._connector_state_converter.START_KEY 356 ], 357 False, 358 ) 359 yield from self._split_per_slice_range( 360 self._calculate_lower_boundary_of_last_slice( 361 self._concurrent_state["slices"][-1][self._connector_state_converter.END_KEY] 362 ), 363 self._end_provider(), 364 True, 365 ) 366 else: 367 raise ValueError("Expected at least one slice") 368 369 def _is_start_before_first_slice(self) -> bool: 370 return ( 371 self._start is not None 372 and self._start 373 < self._concurrent_state["slices"][0][self._connector_state_converter.START_KEY] 374 ) 375 376 def _calculate_lower_boundary_of_last_slice( 377 self, lower_boundary: CursorValueType 378 ) -> CursorValueType: 379 if self._lookback_window: 380 return lower_boundary - self._lookback_window 381 return lower_boundary 382 383 def _split_per_slice_range( 384 self, lower: CursorValueType, upper: CursorValueType, upper_is_end: bool 385 ) -> Iterable[StreamSlice]: 386 if lower >= upper: 387 return 388 389 if self._start and upper < self._start: 390 return 391 392 lower = max(lower, self._start) if self._start else lower 393 if not self._slice_range or self._evaluate_upper_safely(lower, self._slice_range) >= upper: 394 clamped_lower = self._clamping_strategy.clamp(lower) 395 clamped_upper = self._clamping_strategy.clamp(upper) 396 start_value, end_value = ( 397 (clamped_lower, clamped_upper - self._cursor_granularity) 398 if self._cursor_granularity and not upper_is_end 399 else (clamped_lower, clamped_upper) 400 ) 401 yield StreamSlice( 402 partition={}, 403 cursor_slice={ 404 self._slice_boundary_fields_wrapper[ 405 self._START_BOUNDARY 406 ]: self._connector_state_converter.output_format(start_value), 407 self._slice_boundary_fields_wrapper[ 408 self._END_BOUNDARY 409 ]: self._connector_state_converter.output_format(end_value), 410 }, 411 ) 412 else: 413 stop_processing = False 414 current_lower_boundary = lower 415 while not stop_processing: 416 current_upper_boundary = min( 417 self._evaluate_upper_safely(current_lower_boundary, self._slice_range), upper 418 ) 419 has_reached_upper_boundary = current_upper_boundary >= upper 420 421 clamped_upper = ( 422 self._clamping_strategy.clamp(current_upper_boundary) 423 if current_upper_boundary != upper 424 else current_upper_boundary 425 ) 426 clamped_lower = self._clamping_strategy.clamp(current_lower_boundary) 427 if clamped_lower >= 
clamped_upper: 428 # clamping collapsed both values which means that it is time to stop processing 429 # FIXME should this be replace by proper end_provider 430 break 431 start_value, end_value = ( 432 (clamped_lower, clamped_upper - self._cursor_granularity) 433 if self._cursor_granularity 434 and (not upper_is_end or not has_reached_upper_boundary) 435 else (clamped_lower, clamped_upper) 436 ) 437 yield StreamSlice( 438 partition={}, 439 cursor_slice={ 440 self._slice_boundary_fields_wrapper[ 441 self._START_BOUNDARY 442 ]: self._connector_state_converter.output_format(start_value), 443 self._slice_boundary_fields_wrapper[ 444 self._END_BOUNDARY 445 ]: self._connector_state_converter.output_format(end_value), 446 }, 447 ) 448 current_lower_boundary = clamped_upper 449 if current_upper_boundary >= upper: 450 stop_processing = True 451 452 def _evaluate_upper_safely(self, lower: CursorValueType, step: GapType) -> CursorValueType: 453 """ 454 Given that we set the default step at datetime.timedelta.max, we will generate an OverflowError when evaluating the next start_date 455 This method assumes that users would never enter a step that would generate an overflow. Given that would be the case, the code 456 would have broken anyway. 457 """ 458 try: 459 return lower + step 460 except OverflowError: 461 return self._end_provider() 462 463 def should_be_synced(self, record: Record) -> bool: 464 """ 465 Determines if a record should be synced based on its cursor value. 466 :param record: The record to evaluate 467 468 :return: True if the record's cursor value falls within the sync boundaries 469 """ 470 try: 471 record_cursor_value: CursorValueType = self._extract_cursor_value(record) 472 except ValueError: 473 self._log_for_record_without_cursor_value() 474 return True 475 return self.start <= record_cursor_value <= self._end_provider() 476 477 def _log_for_record_without_cursor_value(self) -> None: 478 if not self._should_be_synced_logger_triggered: 479 LOGGER.warning( 480 f"Could not find cursor field `{self.cursor_field.cursor_field_key}` in record for stream {self._stream_name}. The incremental sync will assume it needs to be synced" 481 ) 482 self._should_be_synced_logger_triggered = True
Slices the stream into chunks that can be fetched independently. Slices enable state checkpointing and data retrieval parallelization.
132 def __init__( 133 self, 134 stream_name: str, 135 stream_namespace: Optional[str], 136 stream_state: Any, 137 message_repository: MessageRepository, 138 connector_state_manager: ConnectorStateManager, 139 connector_state_converter: AbstractStreamStateConverter, 140 cursor_field: CursorField, 141 slice_boundary_fields: Optional[Tuple[str, str]], 142 start: Optional[CursorValueType], 143 end_provider: Callable[[], CursorValueType], 144 lookback_window: Optional[GapType] = None, 145 slice_range: Optional[GapType] = None, 146 cursor_granularity: Optional[GapType] = None, 147 clamping_strategy: ClampingStrategy = NoClamping(), 148 ) -> None: 149 self._stream_name = stream_name 150 self._stream_namespace = stream_namespace 151 self._message_repository = message_repository 152 self._connector_state_converter = connector_state_converter 153 self._connector_state_manager = connector_state_manager 154 self._cursor_field = cursor_field 155 # To see some example where the slice boundaries might not be defined, check https://github.com/airbytehq/airbyte/blob/1ce84d6396e446e1ac2377362446e3fb94509461/airbyte-integrations/connectors/source-stripe/source_stripe/streams.py#L363-L379 156 self._slice_boundary_fields = slice_boundary_fields 157 self._start = start 158 self._end_provider = end_provider 159 self.start, self._concurrent_state = self._get_concurrent_state(stream_state) 160 self._lookback_window = lookback_window 161 self._slice_range = slice_range 162 self._most_recent_cursor_value_per_partition: MutableMapping[ 163 Union[StreamSlice, Mapping[str, Any], None], Any 164 ] = {} 165 self._has_closed_at_least_one_slice = False 166 self._cursor_granularity = cursor_granularity 167 # Flag to track if the logger has been triggered (per stream) 168 self._should_be_synced_logger_triggered = False 169 self._clamping_strategy = clamping_strategy
204 def observe(self, record: Record) -> None: 205 most_recent_cursor_value = self._most_recent_cursor_value_per_partition.get( 206 record.associated_slice 207 ) 208 try: 209 cursor_value = self._extract_cursor_value(record) 210 211 if most_recent_cursor_value is None or most_recent_cursor_value < cursor_value: 212 self._most_recent_cursor_value_per_partition[record.associated_slice] = cursor_value 213 except ValueError: 214 self._log_for_record_without_cursor_value()
Indicate to the cursor that the record has been emitted
219 def close_partition(self, partition: Partition) -> None: 220 slice_count_before = len(self._concurrent_state.get("slices", [])) 221 self._add_slice_to_state(partition) 222 if slice_count_before < len( 223 self._concurrent_state["slices"] 224 ): # only emit if at least one slice has been processed 225 self._merge_partitions() 226 self._emit_state_message() 227 self._has_closed_at_least_one_slice = True
Indicate to the cursor that the partition has been successfully processed
302 def ensure_at_least_one_state_emitted(self) -> None: 303 """ 304 The platform expect to have at least one state message on successful syncs. Hence, whatever happens, we expect this method to be 305 called. 306 """ 307 self._emit_state_message()
The platform expects at least one state message on successful syncs. Hence, whatever happens, we expect this method to be called.
309 def stream_slices(self) -> Iterable[StreamSlice]: 310 """ 311 Generating slices based on a few parameters: 312 * lookback_window: Buffer to remove from END_KEY of the highest slice 313 * slice_range: Max difference between two slices. If the difference between two slices is greater, multiple slices will be created 314 * start: `_split_per_slice_range` will clip any value to `self._start which means that: 315 * if upper is less than self._start, no slices will be generated 316 * if lower is less than self._start, self._start will be used as the lower boundary (lookback_window will not be considered in that case) 317 318 Note that the slices will overlap at their boundaries. We therefore expect to have at least the lower or the upper boundary to be 319 inclusive in the API that is queried. 320 """ 321 self._merge_partitions() 322 323 if self._start is not None and self._is_start_before_first_slice(): 324 yield from self._split_per_slice_range( 325 self._start, 326 self._concurrent_state["slices"][0][self._connector_state_converter.START_KEY], 327 False, 328 ) 329 330 if len(self._concurrent_state["slices"]) == 1: 331 yield from self._split_per_slice_range( 332 self._calculate_lower_boundary_of_last_slice( 333 self._concurrent_state["slices"][0][self._connector_state_converter.END_KEY] 334 ), 335 self._end_provider(), 336 True, 337 ) 338 elif len(self._concurrent_state["slices"]) > 1: 339 for i in range(len(self._concurrent_state["slices"]) - 1): 340 if self._cursor_granularity: 341 yield from self._split_per_slice_range( 342 self._concurrent_state["slices"][i][self._connector_state_converter.END_KEY] 343 + self._cursor_granularity, 344 self._concurrent_state["slices"][i + 1][ 345 self._connector_state_converter.START_KEY 346 ], 347 False, 348 ) 349 else: 350 yield from self._split_per_slice_range( 351 self._concurrent_state["slices"][i][ 352 self._connector_state_converter.END_KEY 353 ], 354 self._concurrent_state["slices"][i + 1][ 355 self._connector_state_converter.START_KEY 356 ], 357 False, 358 ) 359 yield from self._split_per_slice_range( 360 self._calculate_lower_boundary_of_last_slice( 361 self._concurrent_state["slices"][-1][self._connector_state_converter.END_KEY] 362 ), 363 self._end_provider(), 364 True, 365 ) 366 else: 367 raise ValueError("Expected at least one slice")
Generates slices based on a few parameters:

- lookback_window: Buffer to remove from END_KEY of the highest slice
- slice_range: Max difference between two slices. If the difference between two slices is greater, multiple slices will be created
- start: `_split_per_slice_range` will clip any value to `self._start`, which means that:
  - if upper is less than `self._start`, no slices will be generated
  - if lower is less than `self._start`, `self._start` will be used as the lower boundary (lookback_window will not be considered in that case)

Note that the slices will overlap at their boundaries. We therefore expect at least the lower or the upper boundary to be inclusive in the API that is queried. (A standalone sketch of this splitting rule follows below.)
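The sketch below is not the CDK implementation; it only illustrates the splitting and lookback behaviour described above, using datetimes as cursor values.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_per_slice_range(
    lower: datetime, upper: datetime, slice_range: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) windows of at most `slice_range` covering [lower, upper)."""
    current = lower
    while current < upper:
        yield current, min(current + slice_range, upper)
        current += slice_range


# Resume from the highest closed slice, minus a lookback buffer.
last_closed_end = datetime(2024, 1, 7)
lookback_window = timedelta(days=1)
now = datetime(2024, 1, 10)

for start, end in split_per_slice_range(last_closed_end - lookback_window, now, timedelta(days=2)):
    print(start, end)
# -> (2024-01-06, 2024-01-08), (2024-01-08, 2024-01-10)
```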
463 def should_be_synced(self, record: Record) -> bool: 464 """ 465 Determines if a record should be synced based on its cursor value. 466 :param record: The record to evaluate 467 468 :return: True if the record's cursor value falls within the sync boundaries 469 """ 470 try: 471 record_cursor_value: CursorValueType = self._extract_cursor_value(record) 472 except ValueError: 473 self._log_for_record_without_cursor_value() 474 return True 475 return self.start <= record_cursor_value <= self._end_provider()
Determines if a record should be synced based on its cursor value.
Parameters
- record: The record to evaluate
Returns
True if the record's cursor value falls within the sync boundaries
30class ConcurrentSource: 31 """ 32 A Source that reads data from multiple AbstractStreams concurrently. 33 It does so by submitting partition generation, and partition read tasks to a thread pool. 34 The tasks asynchronously add their output to a shared queue. 35 The read is done when all partitions for all streams w ere generated and read. 36 """ 37 38 DEFAULT_TIMEOUT_SECONDS = 900 39 40 @staticmethod 41 def create( 42 num_workers: int, 43 initial_number_of_partitions_to_generate: int, 44 logger: logging.Logger, 45 slice_logger: SliceLogger, 46 message_repository: MessageRepository, 47 timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS, 48 ) -> "ConcurrentSource": 49 is_single_threaded = initial_number_of_partitions_to_generate == 1 and num_workers == 1 50 too_many_generator = ( 51 not is_single_threaded and initial_number_of_partitions_to_generate >= num_workers 52 ) 53 assert ( 54 not too_many_generator 55 ), "It is required to have more workers than threads generating partitions" 56 threadpool = ThreadPoolManager( 57 concurrent.futures.ThreadPoolExecutor( 58 max_workers=num_workers, thread_name_prefix="workerpool" 59 ), 60 logger, 61 ) 62 return ConcurrentSource( 63 threadpool, 64 logger, 65 slice_logger, 66 message_repository, 67 initial_number_of_partitions_to_generate, 68 timeout_seconds, 69 ) 70 71 def __init__( 72 self, 73 threadpool: ThreadPoolManager, 74 logger: logging.Logger, 75 slice_logger: SliceLogger = DebugSliceLogger(), 76 message_repository: MessageRepository = InMemoryMessageRepository(), 77 initial_number_partitions_to_generate: int = 1, 78 timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS, 79 ) -> None: 80 """ 81 :param threadpool: The threadpool to submit tasks to 82 :param logger: The logger to log to 83 :param slice_logger: The slice logger used to create messages on new slices 84 :param message_repository: The repository to emit messages to 85 :param initial_number_partitions_to_generate: The initial number of concurrent partition generation tasks. Limiting this number ensures will limit the latency of the first records emitted. While the latency is not critical, emitting the records early allows the platform and the destination to process them as early as possible. 86 :param timeout_seconds: The maximum number of seconds to wait for a record to be read from the queue. If no record is read within this time, the source will stop reading and return. 87 """ 88 self._threadpool = threadpool 89 self._logger = logger 90 self._slice_logger = slice_logger 91 self._message_repository = message_repository 92 self._initial_number_partitions_to_generate = initial_number_partitions_to_generate 93 self._timeout_seconds = timeout_seconds 94 95 def read( 96 self, 97 streams: List[AbstractStream], 98 ) -> Iterator[AirbyteMessage]: 99 self._logger.info("Starting syncing") 100 101 # We set a maxsize to for the main thread to process record items when the queue size grows. This assumes that there are less 102 # threads generating partitions that than are max number of workers. If it weren't the case, we could have threads only generating 103 # partitions which would fill the queue. 
This number is arbitrarily set to 10_000 but will probably need to be changed given more 104 # information and might even need to be configurable depending on the source 105 queue: Queue[QueueItem] = Queue(maxsize=10_000) 106 concurrent_stream_processor = ConcurrentReadProcessor( 107 streams, 108 PartitionEnqueuer(queue, self._threadpool), 109 self._threadpool, 110 self._logger, 111 self._slice_logger, 112 self._message_repository, 113 PartitionReader(queue), 114 ) 115 116 # Enqueue initial partition generation tasks 117 yield from self._submit_initial_partition_generators(concurrent_stream_processor) 118 119 # Read from the queue until all partitions were generated and read 120 yield from self._consume_from_queue( 121 queue, 122 concurrent_stream_processor, 123 ) 124 self._threadpool.check_for_errors_and_shutdown() 125 self._logger.info("Finished syncing") 126 127 def _submit_initial_partition_generators( 128 self, concurrent_stream_processor: ConcurrentReadProcessor 129 ) -> Iterable[AirbyteMessage]: 130 for _ in range(self._initial_number_partitions_to_generate): 131 status_message = concurrent_stream_processor.start_next_partition_generator() 132 if status_message: 133 yield status_message 134 135 def _consume_from_queue( 136 self, 137 queue: Queue[QueueItem], 138 concurrent_stream_processor: ConcurrentReadProcessor, 139 ) -> Iterable[AirbyteMessage]: 140 while airbyte_message_or_record_or_exception := queue.get(): 141 yield from self._handle_item( 142 airbyte_message_or_record_or_exception, 143 concurrent_stream_processor, 144 ) 145 if concurrent_stream_processor.is_done() and queue.empty(): 146 # all partitions were generated and processed. we're done here 147 break 148 149 def _handle_item( 150 self, 151 queue_item: QueueItem, 152 concurrent_stream_processor: ConcurrentReadProcessor, 153 ) -> Iterable[AirbyteMessage]: 154 # handle queue item and call the appropriate handler depending on the type of the queue item 155 if isinstance(queue_item, StreamThreadException): 156 yield from concurrent_stream_processor.on_exception(queue_item) 157 elif isinstance(queue_item, PartitionGenerationCompletedSentinel): 158 yield from concurrent_stream_processor.on_partition_generation_completed(queue_item) 159 elif isinstance(queue_item, Partition): 160 concurrent_stream_processor.on_partition(queue_item) 161 elif isinstance(queue_item, PartitionCompleteSentinel): 162 yield from concurrent_stream_processor.on_partition_complete_sentinel(queue_item) 163 elif isinstance(queue_item, Record): 164 yield from concurrent_stream_processor.on_record(queue_item) 165 else: 166 raise ValueError(f"Unknown queue item type: {type(queue_item)}")
A Source that reads data from multiple AbstractStreams concurrently. It does so by submitting partition generation and partition read tasks to a thread pool. The tasks asynchronously add their output to a shared queue. The read is done when all partitions for all streams were generated and read.
71 def __init__( 72 self, 73 threadpool: ThreadPoolManager, 74 logger: logging.Logger, 75 slice_logger: SliceLogger = DebugSliceLogger(), 76 message_repository: MessageRepository = InMemoryMessageRepository(), 77 initial_number_partitions_to_generate: int = 1, 78 timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS, 79 ) -> None: 80 """ 81 :param threadpool: The threadpool to submit tasks to 82 :param logger: The logger to log to 83 :param slice_logger: The slice logger used to create messages on new slices 84 :param message_repository: The repository to emit messages to 85 :param initial_number_partitions_to_generate: The initial number of concurrent partition generation tasks. Limiting this number ensures will limit the latency of the first records emitted. While the latency is not critical, emitting the records early allows the platform and the destination to process them as early as possible. 86 :param timeout_seconds: The maximum number of seconds to wait for a record to be read from the queue. If no record is read within this time, the source will stop reading and return. 87 """ 88 self._threadpool = threadpool 89 self._logger = logger 90 self._slice_logger = slice_logger 91 self._message_repository = message_repository 92 self._initial_number_partitions_to_generate = initial_number_partitions_to_generate 93 self._timeout_seconds = timeout_seconds
Parameters
- threadpool: The threadpool to submit tasks to
- logger: The logger to log to
- slice_logger: The slice logger used to create messages on new slices
- message_repository: The repository to emit messages to
- initial_number_partitions_to_generate: The initial number of concurrent partition generation tasks. Limiting this number limits the latency of the first records emitted. While the latency is not critical, emitting the records early allows the platform and the destination to process them as early as possible.
- timeout_seconds: The maximum number of seconds to wait for a record to be read from the queue. If no record is read within this time, the source will stop reading and return.
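A minimal usage sketch, assuming the current package layout for the imports; `streams` is a hypothetical `List[AbstractStream]` built by the connector, and construction goes through the `create()` factory shown just below:

```python
import logging

from airbyte_cdk.sources.concurrent_source.concurrent_source import ConcurrentSource
from airbyte_cdk.sources.message import InMemoryMessageRepository
from airbyte_cdk.sources.utils.slice_logger import DebugSliceLogger

logger = logging.getLogger("airbyte")

# The factory enforces that the number of initial partition generators stays
# below the number of workers (unless the source is single-threaded).
concurrent_source = ConcurrentSource.create(
    num_workers=4,
    initial_number_of_partitions_to_generate=1,
    logger=logger,
    slice_logger=DebugSliceLogger(),
    message_repository=InMemoryMessageRepository(),
)

# `streams` is a hypothetical List[AbstractStream]; reading yields AirbyteMessages
# until every partition has been generated and consumed.
# for message in concurrent_source.read(streams):
#     print(message)
```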
40 @staticmethod 41 def create( 42 num_workers: int, 43 initial_number_of_partitions_to_generate: int, 44 logger: logging.Logger, 45 slice_logger: SliceLogger, 46 message_repository: MessageRepository, 47 timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS, 48 ) -> "ConcurrentSource": 49 is_single_threaded = initial_number_of_partitions_to_generate == 1 and num_workers == 1 50 too_many_generator = ( 51 not is_single_threaded and initial_number_of_partitions_to_generate >= num_workers 52 ) 53 assert ( 54 not too_many_generator 55 ), "It is required to have more workers than threads generating partitions" 56 threadpool = ThreadPoolManager( 57 concurrent.futures.ThreadPoolExecutor( 58 max_workers=num_workers, thread_name_prefix="workerpool" 59 ), 60 logger, 61 ) 62 return ConcurrentSource( 63 threadpool, 64 logger, 65 slice_logger, 66 message_repository, 67 initial_number_of_partitions_to_generate, 68 timeout_seconds, 69 )
95 def read( 96 self, 97 streams: List[AbstractStream], 98 ) -> Iterator[AirbyteMessage]: 99 self._logger.info("Starting syncing") 100 101 # We set a maxsize to for the main thread to process record items when the queue size grows. This assumes that there are less 102 # threads generating partitions that than are max number of workers. If it weren't the case, we could have threads only generating 103 # partitions which would fill the queue. This number is arbitrarily set to 10_000 but will probably need to be changed given more 104 # information and might even need to be configurable depending on the source 105 queue: Queue[QueueItem] = Queue(maxsize=10_000) 106 concurrent_stream_processor = ConcurrentReadProcessor( 107 streams, 108 PartitionEnqueuer(queue, self._threadpool), 109 self._threadpool, 110 self._logger, 111 self._slice_logger, 112 self._message_repository, 113 PartitionReader(queue), 114 ) 115 116 # Enqueue initial partition generation tasks 117 yield from self._submit_initial_partition_generators(concurrent_stream_processor) 118 119 # Read from the queue until all partitions were generated and read 120 yield from self._consume_from_queue( 121 queue, 122 concurrent_stream_processor, 123 ) 124 self._threadpool.check_for_errors_and_shutdown() 125 self._logger.info("Finished syncing")
34class ConcurrentSourceAdapter(AbstractSource, ABC): 35 def __init__(self, concurrent_source: ConcurrentSource, **kwargs: Any) -> None: 36 """ 37 ConcurrentSourceAdapter is a Source that wraps a concurrent source and exposes it as a regular source. 38 39 The source's streams are still defined through the streams() method. 40 Streams wrapped in a StreamFacade will be processed concurrently. 41 Other streams will be processed sequentially as a later step. 42 """ 43 self._concurrent_source = concurrent_source 44 super().__init__(**kwargs) 45 46 def read( 47 self, 48 logger: logging.Logger, 49 config: Mapping[str, Any], 50 catalog: ConfiguredAirbyteCatalog, 51 state: Optional[List[AirbyteStateMessage]] = None, 52 ) -> Iterator[AirbyteMessage]: 53 abstract_streams = self._select_abstract_streams(config, catalog) 54 concurrent_stream_names = {stream.name for stream in abstract_streams} 55 configured_catalog_for_regular_streams = ConfiguredAirbyteCatalog( 56 streams=[ 57 stream 58 for stream in catalog.streams 59 if stream.stream.name not in concurrent_stream_names 60 ] 61 ) 62 if abstract_streams: 63 yield from self._concurrent_source.read(abstract_streams) 64 if configured_catalog_for_regular_streams.streams: 65 yield from super().read(logger, config, configured_catalog_for_regular_streams, state) 66 67 def _select_abstract_streams( 68 self, config: Mapping[str, Any], configured_catalog: ConfiguredAirbyteCatalog 69 ) -> List[AbstractStream]: 70 """ 71 Selects streams that can be processed concurrently and returns their abstract representations. 72 """ 73 all_streams = self.streams(config) 74 stream_name_to_instance: Mapping[str, Stream] = {s.name: s for s in all_streams} 75 abstract_streams: List[AbstractStream] = [] 76 for configured_stream in configured_catalog.streams: 77 stream_instance = stream_name_to_instance.get(configured_stream.stream.name) 78 if not stream_instance: 79 continue 80 81 if isinstance(stream_instance, AbstractStreamFacade): 82 abstract_streams.append(stream_instance.get_underlying_stream()) 83 return abstract_streams 84 85 def convert_to_concurrent_stream( 86 self, 87 logger: logging.Logger, 88 stream: Stream, 89 state_manager: ConnectorStateManager, 90 cursor: Optional[Cursor] = None, 91 ) -> Stream: 92 """ 93 Prepares a stream for concurrent processing by initializing or assigning a cursor, 94 managing the stream's state, and returning an updated Stream instance. 
95 """ 96 state: MutableMapping[str, Any] = {} 97 98 if cursor: 99 state = state_manager.get_stream_state(stream.name, stream.namespace) 100 101 stream.cursor = cursor # type: ignore[assignment] # cursor is of type ConcurrentCursor, which inherits from Cursor 102 if hasattr(stream, "parent"): 103 stream.parent.cursor = cursor 104 else: 105 cursor = FinalStateCursor( 106 stream_name=stream.name, 107 stream_namespace=stream.namespace, 108 message_repository=self.message_repository, # type: ignore[arg-type] # _default_message_repository will be returned in the worst case 109 ) 110 return StreamFacade.create_from_stream(stream, self, logger, state, cursor) 111 112 def initialize_cursor( 113 self, 114 stream: Stream, 115 state_manager: ConnectorStateManager, 116 converter: AbstractStreamStateConverter, 117 slice_boundary_fields: Optional[Tuple[str, str]], 118 start: Optional[CursorValueType], 119 end_provider: Callable[[], CursorValueType], 120 lookback_window: Optional[GapType] = None, 121 slice_range: Optional[GapType] = None, 122 ) -> Optional[ConcurrentCursor]: 123 lookback_window = lookback_window or timedelta(seconds=DEFAULT_LOOKBACK_SECONDS) 124 125 cursor_field_name = stream.cursor_field 126 127 if cursor_field_name: 128 if not isinstance(cursor_field_name, str): 129 raise ValueError( 130 f"Cursor field type must be a string, but received {type(cursor_field_name).__name__}." 131 ) 132 133 return ConcurrentCursor( 134 stream.name, 135 stream.namespace, 136 state_manager.get_stream_state(stream.name, stream.namespace), 137 self.message_repository, # type: ignore[arg-type] # _default_message_repository will be returned in the worst case 138 state_manager, 139 converter, 140 CursorField(cursor_field_name), 141 slice_boundary_fields, 142 start, 143 end_provider, 144 lookback_window, 145 slice_range, 146 ) 147 148 return None
Abstract base class for an Airbyte Source. Consumers should implement any abstract methods in this class to create an Airbyte Specification compliant Source.
35 def __init__(self, concurrent_source: ConcurrentSource, **kwargs: Any) -> None: 36 """ 37 ConcurrentSourceAdapter is a Source that wraps a concurrent source and exposes it as a regular source. 38 39 The source's streams are still defined through the streams() method. 40 Streams wrapped in a StreamFacade will be processed concurrently. 41 Other streams will be processed sequentially as a later step. 42 """ 43 self._concurrent_source = concurrent_source 44 super().__init__(**kwargs)
ConcurrentSourceAdapter is a Source that wraps a concurrent source and exposes it as a regular source.
The source's streams are still defined through the streams() method. Streams wrapped in a StreamFacade will be processed concurrently. Other streams will be processed sequentially as a later step.
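A hedged sketch of a source built on ConcurrentSourceAdapter; `SourceExampleConcurrent` and its stream construction are placeholders, and only the adapter API shown here is taken from the CDK source:

```python
import logging
from typing import Any, List, Mapping, Optional, Tuple

from airbyte_cdk.sources.concurrent_source.concurrent_source import ConcurrentSource
from airbyte_cdk.sources.concurrent_source.concurrent_source_adapter import ConcurrentSourceAdapter
from airbyte_cdk.sources.streams import Stream


class SourceExampleConcurrent(ConcurrentSourceAdapter):
    def __init__(self, concurrent_source: ConcurrentSource, **kwargs: Any) -> None:
        super().__init__(concurrent_source, **kwargs)

    def check_connection(
        self, logger: logging.Logger, config: Mapping[str, Any]
    ) -> Tuple[bool, Optional[Any]]:
        return True, None

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        # Streams returned as StreamFacade instances (e.g. via convert_to_concurrent_stream)
        # are read concurrently; plain Stream instances fall back to the sequential path.
        return []  # source-specific stream construction goes here
```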
46 def read( 47 self, 48 logger: logging.Logger, 49 config: Mapping[str, Any], 50 catalog: ConfiguredAirbyteCatalog, 51 state: Optional[List[AirbyteStateMessage]] = None, 52 ) -> Iterator[AirbyteMessage]: 53 abstract_streams = self._select_abstract_streams(config, catalog) 54 concurrent_stream_names = {stream.name for stream in abstract_streams} 55 configured_catalog_for_regular_streams = ConfiguredAirbyteCatalog( 56 streams=[ 57 stream 58 for stream in catalog.streams 59 if stream.stream.name not in concurrent_stream_names 60 ] 61 ) 62 if abstract_streams: 63 yield from self._concurrent_source.read(abstract_streams) 64 if configured_catalog_for_regular_streams.streams: 65 yield from super().read(logger, config, configured_catalog_for_regular_streams, state)
Implements the Read operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/.
85 def convert_to_concurrent_stream( 86 self, 87 logger: logging.Logger, 88 stream: Stream, 89 state_manager: ConnectorStateManager, 90 cursor: Optional[Cursor] = None, 91 ) -> Stream: 92 """ 93 Prepares a stream for concurrent processing by initializing or assigning a cursor, 94 managing the stream's state, and returning an updated Stream instance. 95 """ 96 state: MutableMapping[str, Any] = {} 97 98 if cursor: 99 state = state_manager.get_stream_state(stream.name, stream.namespace) 100 101 stream.cursor = cursor # type: ignore[assignment] # cursor is of type ConcurrentCursor, which inherits from Cursor 102 if hasattr(stream, "parent"): 103 stream.parent.cursor = cursor 104 else: 105 cursor = FinalStateCursor( 106 stream_name=stream.name, 107 stream_namespace=stream.namespace, 108 message_repository=self.message_repository, # type: ignore[arg-type] # _default_message_repository will be returned in the worst case 109 ) 110 return StreamFacade.create_from_stream(stream, self, logger, state, cursor)
Prepares a stream for concurrent processing by initializing or assigning a cursor, managing the stream's state, and returning an updated Stream instance.
112 def initialize_cursor( 113 self, 114 stream: Stream, 115 state_manager: ConnectorStateManager, 116 converter: AbstractStreamStateConverter, 117 slice_boundary_fields: Optional[Tuple[str, str]], 118 start: Optional[CursorValueType], 119 end_provider: Callable[[], CursorValueType], 120 lookback_window: Optional[GapType] = None, 121 slice_range: Optional[GapType] = None, 122 ) -> Optional[ConcurrentCursor]: 123 lookback_window = lookback_window or timedelta(seconds=DEFAULT_LOOKBACK_SECONDS) 124 125 cursor_field_name = stream.cursor_field 126 127 if cursor_field_name: 128 if not isinstance(cursor_field_name, str): 129 raise ValueError( 130 f"Cursor field type must be a string, but received {type(cursor_field_name).__name__}." 131 ) 132 133 return ConcurrentCursor( 134 stream.name, 135 stream.namespace, 136 state_manager.get_stream_state(stream.name, stream.namespace), 137 self.message_repository, # type: ignore[arg-type] # _default_message_repository will be returned in the worst case 138 state_manager, 139 converter, 140 CursorField(cursor_field_name), 141 slice_boundary_fields, 142 start, 143 end_provider, 144 lookback_window, 145 slice_range, 146 ) 147 148 return None
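As a rough illustration (not taken from the CDK itself) of how `initialize_cursor()` and `convert_to_concurrent_stream()` might be wired together inside a source's `streams()` method. `UsersStream`, `self._state`, the slice boundary fields, and the start/end values are all placeholders, and the converter choice is only one possibility:

```python
import logging
from datetime import datetime, timezone
from typing import Any, List, Mapping

from airbyte_cdk.sources.connector_state_manager import ConnectorStateManager
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.concurrent.state_converters.datetime_stream_state_converter import (
    EpochValueConcurrentStreamStateConverter,
)


def streams(self, config: Mapping[str, Any]) -> List[Stream]:
    # `self` is assumed to be a ConcurrentSourceAdapter subclass.
    state_manager = ConnectorStateManager(state=self._state)  # hypothetical state handle
    users = UsersStream(config)  # hypothetical regular Stream
    cursor = self.initialize_cursor(
        users,
        state_manager,
        EpochValueConcurrentStreamStateConverter(),
        slice_boundary_fields=("start", "end"),
        start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end_provider=lambda: datetime.now(timezone.utc),
    )
    # If the stream has no cursor field, cursor is None and a FinalStateCursor is used instead.
    return [self.convert_to_concurrent_stream(logging.getLogger("airbyte"), users, state_manager, cursor)]
```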
Inherited Members
51class Cursor(StreamSlicer, ABC): 52 @property 53 @abstractmethod 54 def state(self) -> MutableMapping[str, Any]: ... 55 56 @abstractmethod 57 def observe(self, record: Record) -> None: 58 """ 59 Indicate to the cursor that the record has been emitted 60 """ 61 raise NotImplementedError() 62 63 @abstractmethod 64 def close_partition(self, partition: Partition) -> None: 65 """ 66 Indicate to the cursor that the partition has been successfully processed 67 """ 68 raise NotImplementedError() 69 70 @abstractmethod 71 def ensure_at_least_one_state_emitted(self) -> None: 72 """ 73 State messages are emitted when a partition is closed. However, the platform expects at least one state to be emitted per sync per 74 stream. Hence, if no partitions are generated, this method needs to be called. 75 """ 76 raise NotImplementedError() 77 78 def stream_slices(self) -> Iterable[StreamSlice]: 79 """ 80 Default placeholder implementation of generate_slices. 81 Subclasses can override this method to provide actual behavior. 82 """ 83 yield StreamSlice(partition={}, cursor_slice={})
Slices the stream into chunks that can be fetched independently. Slices enable state checkpointing and data retrieval parallelization.
56 @abstractmethod 57 def observe(self, record: Record) -> None: 58 """ 59 Indicate to the cursor that the record has been emitted 60 """ 61 raise NotImplementedError()
Indicate to the cursor that the record has been emitted
63 @abstractmethod 64 def close_partition(self, partition: Partition) -> None: 65 """ 66 Indicate to the cursor that the partition has been successfully processed 67 """ 68 raise NotImplementedError()
Indicate to the cursor that the partition has been successfully processed
70 @abstractmethod 71 def ensure_at_least_one_state_emitted(self) -> None: 72 """ 73 State messages are emitted when a partition is closed. However, the platform expects at least one state to be emitted per sync per 74 stream. Hence, if no partitions are generated, this method needs to be called. 75 """ 76 raise NotImplementedError()
State messages are emitted when a partition is closed. However, the platform expects at least one state to be emitted per sync per stream. Hence, if no partitions are generated, this method needs to be called.
78 def stream_slices(self) -> Iterable[StreamSlice]: 79 """ 80 Default placeholder implementation of generate_slices. 81 Subclasses can override this method to provide actual behavior. 82 """ 83 yield StreamSlice(partition={}, cursor_slice={})
Default placeholder implementation of generate_slices. Subclasses can override this method to provide actual behavior.
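A minimal, hedged Cursor sketch that only satisfies the abstract interface above; real cursors such as ConcurrentCursor also persist state through a ConnectorStateManager and a MessageRepository. Import paths are assumptions and may differ slightly between CDK versions:

```python
from typing import Any, MutableMapping

from airbyte_cdk.sources.streams.concurrent.cursor import Cursor
from airbyte_cdk.sources.streams.concurrent.partitions.partition import Partition
from airbyte_cdk.sources.types import Record  # Record has moved between modules across CDK versions


class NoopCursor(Cursor):
    def __init__(self) -> None:
        self._state: MutableMapping[str, Any] = {}

    @property
    def state(self) -> MutableMapping[str, Any]:
        return self._state

    def observe(self, record: Record) -> None:
        # A real cursor would compare the record's cursor value against the current state here.
        pass

    def close_partition(self, partition: Partition) -> None:
        # A real cursor would checkpoint state for the closed partition here.
        pass

    def ensure_at_least_one_state_emitted(self) -> None:
        # A real implementation would emit a state message through a MessageRepository.
        pass
```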
40class CursorField: 41 def __init__(self, cursor_field_key: str) -> None: 42 self.cursor_field_key = cursor_field_key 43 44 def extract_value(self, record: Record) -> CursorValueType: 45 cursor_value = record.data.get(self.cursor_field_key) 46 if cursor_value is None: 47 raise ValueError(f"Could not find cursor field {self.cursor_field_key} in record") 48 return cursor_value # type: ignore # we assume that the value the path points at is a comparable
44 def extract_value(self, record: Record) -> CursorValueType: 45 cursor_value = record.data.get(self.cursor_field_key) 46 if cursor_value is None: 47 raise ValueError(f"Could not find cursor field {self.cursor_field_key} in record") 48 return cursor_value # type: ignore # we assume that the value the path points at is a comparable
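A small usage sketch: `extract_value()` only reads the record's `.data` mapping, so a stand-in record object is enough to demonstrate it (the `_FakeRecord` class and field values below are illustrative, not CDK types):

```python
from dataclasses import dataclass
from typing import Any, Mapping

from airbyte_cdk.sources.streams.concurrent.cursor import CursorField


@dataclass
class _FakeRecord:
    # Minimal stand-in for the CDK's Record: extract_value() only reads `.data`.
    data: Mapping[str, Any]


cursor_field = CursorField("updated_at")
print(cursor_field.extract_value(_FakeRecord(data={"id": 1, "updated_at": 1617030403})))  # 1617030403
# A record missing the cursor field raises ValueError:
# cursor_field.extract_value(_FakeRecord(data={"id": 2}))
```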
115class EpochValueConcurrentStreamStateConverter(DateTimeStreamStateConverter): 116 """ 117 e.g. 118 { "created": 1617030403 } 119 => 120 { 121 "state_type": "date-range", 122 "metadata": { … }, 123 "slices": [ 124 {starts: 0, end: 1617030403, finished_processing: true} 125 ] 126 } 127 """ 128 129 _zero_value = 0 130 131 def increment(self, timestamp: datetime) -> datetime: 132 return timestamp + timedelta(seconds=1) 133 134 def output_format(self, timestamp: datetime) -> int: 135 return int(timestamp.timestamp()) 136 137 def parse_timestamp(self, timestamp: int) -> datetime: 138 dt_object = AirbyteDateTime.fromtimestamp(timestamp, timezone.utc) 139 if not isinstance(dt_object, AirbyteDateTime): 140 raise ValueError( 141 f"AirbyteDateTime object was expected but got {type(dt_object)} from AirbyteDateTime.fromtimestamp({timestamp})" 142 ) 143 return dt_object
e.g. { "created": 1617030403 } => { "state_type": "date-range", "metadata": { … }, "slices": [ {starts: 0, end: 1617030403, finished_processing: true} ] }
131 def increment(self, timestamp: datetime) -> datetime: 132 return timestamp + timedelta(seconds=1)
Increment a timestamp by a single unit.
Convert the cursor value type to a JSON valid type.
137 def parse_timestamp(self, timestamp: int) -> datetime: 138 dt_object = AirbyteDateTime.fromtimestamp(timestamp, timezone.utc) 139 if not isinstance(dt_object, AirbyteDateTime): 140 raise ValueError( 141 f"AirbyteDateTime object was expected but got {type(dt_object)} from AirbyteDateTime.fromtimestamp({timestamp})" 142 ) 143 return dt_object
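A quick round-trip sketch of the three helpers shown above; the import path is an assumption about the current module layout:

```python
from airbyte_cdk.sources.streams.concurrent.state_converters.datetime_stream_state_converter import (
    EpochValueConcurrentStreamStateConverter,
)

converter = EpochValueConcurrentStreamStateConverter()
dt = converter.parse_timestamp(1617030403)                 # timezone-aware datetime (UTC)
print(converter.output_format(dt))                         # 1617030403
print(converter.output_format(converter.increment(dt)))    # 1617030404 (one second later)
```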
Inherited Members
86class FinalStateCursor(Cursor): 87 """Cursor that is used to guarantee at least one state message is emitted for a concurrent stream.""" 88 89 def __init__( 90 self, 91 stream_name: str, 92 stream_namespace: Optional[str], 93 message_repository: MessageRepository, 94 ) -> None: 95 self._stream_name = stream_name 96 self._stream_namespace = stream_namespace 97 self._message_repository = message_repository 98 # Normally the connector state manager operates at the source-level. However, we only need it to write the sentinel 99 # state message rather than manage overall source state. This is also only temporary as we move to the resumable 100 # full refresh world where every stream uses a FileBasedConcurrentCursor with incremental state. 101 self._connector_state_manager = ConnectorStateManager() 102 self._has_closed_at_least_one_slice = False 103 104 @property 105 def state(self) -> MutableMapping[str, Any]: 106 return {NO_CURSOR_STATE_KEY: True} 107 108 def observe(self, record: Record) -> None: 109 pass 110 111 def close_partition(self, partition: Partition) -> None: 112 pass 113 114 def ensure_at_least_one_state_emitted(self) -> None: 115 """ 116 Used primarily for full refresh syncs that do not have a valid cursor value to emit at the end of a sync 117 """ 118 119 self._connector_state_manager.update_state_for_stream( 120 self._stream_name, self._stream_namespace, self.state 121 ) 122 state_message = self._connector_state_manager.create_state_message( 123 self._stream_name, self._stream_namespace 124 ) 125 self._message_repository.emit_message(state_message)
Cursor that is used to guarantee at least one state message is emitted for a concurrent stream.
89 def __init__( 90 self, 91 stream_name: str, 92 stream_namespace: Optional[str], 93 message_repository: MessageRepository, 94 ) -> None: 95 self._stream_name = stream_name 96 self._stream_namespace = stream_namespace 97 self._message_repository = message_repository 98 # Normally the connector state manager operates at the source-level. However, we only need it to write the sentinel 99 # state message rather than manage overall source state. This is also only temporary as we move to the resumable 100 # full refresh world where every stream uses a FileBasedConcurrentCursor with incremental state. 101 self._connector_state_manager = ConnectorStateManager() 102 self._has_closed_at_least_one_slice = False
Indicate to the cursor that the partition has been successfully processed
114 def ensure_at_least_one_state_emitted(self) -> None: 115 """ 116 Used primarily for full refresh syncs that do not have a valid cursor value to emit at the end of a sync 117 """ 118 119 self._connector_state_manager.update_state_for_stream( 120 self._stream_name, self._stream_namespace, self.state 121 ) 122 state_message = self._connector_state_manager.create_state_message( 123 self._stream_name, self._stream_namespace 124 ) 125 self._message_repository.emit_message(state_message)
Used primarily for full refresh syncs that do not have a valid cursor value to emit at the end of a sync
Inherited Members
146class IsoMillisConcurrentStreamStateConverter(DateTimeStreamStateConverter): 147 """ 148 e.g. 149 { "created": "2021-01-18T21:18:20.000Z" } 150 => 151 { 152 "state_type": "date-range", 153 "metadata": { … }, 154 "slices": [ 155 {starts: "2020-01-18T21:18:20.000Z", end: "2021-01-18T21:18:20.000Z", finished_processing: true} 156 ] 157 } 158 """ 159 160 _zero_value = "0001-01-01T00:00:00.000Z" 161 162 def __init__( 163 self, is_sequential_state: bool = True, cursor_granularity: Optional[timedelta] = None 164 ): 165 super().__init__(is_sequential_state=is_sequential_state) 166 self._cursor_granularity = cursor_granularity or timedelta(milliseconds=1) 167 168 def increment(self, timestamp: datetime) -> datetime: 169 return timestamp + self._cursor_granularity 170 171 def output_format(self, timestamp: datetime) -> str: 172 """Format datetime with milliseconds always included. 173 174 Args: 175 timestamp: The datetime to format. 176 177 Returns: 178 str: ISO8601/RFC3339 formatted string with milliseconds. 179 """ 180 dt = AirbyteDateTime.from_datetime(timestamp) 181 # Always include milliseconds, even if zero 182 millis = dt.microsecond // 1000 if dt.microsecond else 0 183 return f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}.{millis:03d}Z" 184 185 def parse_timestamp(self, timestamp: str) -> datetime: 186 dt_object = ab_datetime_parse(timestamp) 187 if not isinstance(dt_object, AirbyteDateTime): 188 raise ValueError( 189 f"AirbyteDateTime object was expected but got {type(dt_object)} from parse({timestamp})" 190 ) 191 return dt_object
e.g. { "created": "2021-01-18T21:18:20.000Z" } => { "state_type": "date-range", "metadata": { … }, "slices": [ {starts: "2020-01-18T21:18:20.000Z", end: "2021-01-18T21:18:20.000Z", finished_processing: true} ] }
168 def increment(self, timestamp: datetime) -> datetime: 169 return timestamp + self._cursor_granularity
Increment a timestamp by a single unit.
171 def output_format(self, timestamp: datetime) -> str: 172 """Format datetime with milliseconds always included. 173 174 Args: 175 timestamp: The datetime to format. 176 177 Returns: 178 str: ISO8601/RFC3339 formatted string with milliseconds. 179 """ 180 dt = AirbyteDateTime.from_datetime(timestamp) 181 # Always include milliseconds, even if zero 182 millis = dt.microsecond // 1000 if dt.microsecond else 0 183 return f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}.{millis:03d}Z"
Format datetime with milliseconds always included.
Arguments:
- timestamp: The datetime to format.
Returns:
str: ISO8601/RFC3339 formatted string with milliseconds.
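A short sketch of the millisecond formatting described above; the import path is an assumption and the timestamps are arbitrary examples:

```python
from datetime import datetime, timezone

from airbyte_cdk.sources.streams.concurrent.state_converters.datetime_stream_state_converter import (
    IsoMillisConcurrentStreamStateConverter,
)

converter = IsoMillisConcurrentStreamStateConverter()
print(converter.output_format(datetime(2021, 1, 18, 21, 18, 20, 123000, tzinfo=timezone.utc)))
# "2021-01-18T21:18:20.123Z"
print(converter.output_format(datetime(2021, 1, 18, 21, 18, 20, tzinfo=timezone.utc)))
# "2021-01-18T21:18:20.000Z" - milliseconds are always included, even when zero
```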
Inherited Members
54@deprecated( 55 "This class is experimental. Use at your own risk.", 56 category=ExperimentalClassWarning, 57) 58class StreamFacade(AbstractStreamFacade[DefaultStream], Stream): 59 """ 60 The StreamFacade is a Stream that wraps an AbstractStream and exposes it as a Stream. 61 62 All methods either delegate to the wrapped AbstractStream or provide a default implementation. 63 The default implementations define restrictions imposed on Streams migrated to the new interface. For instance, only source-defined cursors are supported. 64 """ 65 66 @classmethod 67 def create_from_stream( 68 cls, 69 stream: Stream, 70 source: AbstractSource, 71 logger: logging.Logger, 72 state: Optional[MutableMapping[str, Any]], 73 cursor: Cursor, 74 ) -> Stream: 75 """ 76 Create a ConcurrentStream from a Stream object. 77 :param source: The source 78 :param stream: The stream 79 :param max_workers: The maximum number of worker thread to use 80 :return: 81 """ 82 pk = get_primary_key_from_stream(stream.primary_key) 83 cursor_field = get_cursor_field_from_stream(stream) 84 85 if not source.message_repository: 86 raise ValueError( 87 "A message repository is required to emit non-record messages. Please set the message repository on the source." 88 ) 89 90 message_repository = source.message_repository 91 return StreamFacade( 92 DefaultStream( 93 partition_generator=StreamPartitionGenerator( 94 stream, 95 message_repository, 96 SyncMode.full_refresh 97 if isinstance(cursor, FinalStateCursor) 98 else SyncMode.incremental, 99 [cursor_field] if cursor_field is not None else None, 100 state, 101 ), 102 name=stream.name, 103 namespace=stream.namespace, 104 json_schema=stream.get_json_schema(), 105 availability_strategy=AlwaysAvailableAvailabilityStrategy(), 106 primary_key=pk, 107 cursor_field=cursor_field, 108 logger=logger, 109 cursor=cursor, 110 ), 111 stream, 112 cursor, 113 slice_logger=source._slice_logger, 114 logger=logger, 115 ) 116 117 @property 118 def state(self) -> MutableMapping[str, Any]: 119 raise NotImplementedError( 120 "This should not be called as part of the Concurrent CDK code. 
Please report the problem to Airbyte" 121 ) 122 123 @state.setter 124 def state(self, value: Mapping[str, Any]) -> None: 125 if "state" in dir(self._legacy_stream): 126 self._legacy_stream.state = value # type: ignore # validating `state` is attribute of stream using `if` above 127 128 def __init__( 129 self, 130 stream: DefaultStream, 131 legacy_stream: Stream, 132 cursor: Cursor, 133 slice_logger: SliceLogger, 134 logger: logging.Logger, 135 ): 136 """ 137 :param stream: The underlying AbstractStream 138 """ 139 self._abstract_stream = stream 140 self._legacy_stream = legacy_stream 141 self._cursor = cursor 142 self._slice_logger = slice_logger 143 self._logger = logger 144 145 def read( 146 self, 147 configured_stream: ConfiguredAirbyteStream, 148 logger: logging.Logger, 149 slice_logger: SliceLogger, 150 stream_state: MutableMapping[str, Any], 151 state_manager: ConnectorStateManager, 152 internal_config: InternalConfig, 153 ) -> Iterable[StreamData]: 154 yield from self._read_records() 155 156 def read_records( 157 self, 158 sync_mode: SyncMode, 159 cursor_field: Optional[List[str]] = None, 160 stream_slice: Optional[Mapping[str, Any]] = None, 161 stream_state: Optional[Mapping[str, Any]] = None, 162 ) -> Iterable[StreamData]: 163 try: 164 yield from self._read_records() 165 except Exception as exc: 166 if hasattr(self._cursor, "state"): 167 state = str(self._cursor.state) 168 else: 169 # This shouldn't happen if the ConcurrentCursor was used 170 state = "unknown; no state attribute was available on the cursor" 171 yield AirbyteMessage( 172 type=Type.LOG, 173 log=AirbyteLogMessage( 174 level=Level.ERROR, message=f"Cursor State at time of exception: {state}" 175 ), 176 ) 177 raise exc 178 179 def _read_records(self) -> Iterable[StreamData]: 180 for partition in self._abstract_stream.generate_partitions(): 181 if self._slice_logger.should_log_slice_message(self._logger): 182 yield self._slice_logger.create_slice_log_message(partition.to_slice()) 183 for record in partition.read(): 184 yield record.data 185 186 @property 187 def name(self) -> str: 188 return self._abstract_stream.name 189 190 @property 191 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 192 # This method is not expected to be called directly. It is only implemented for backward compatibility with the old interface 193 return self.as_airbyte_stream().source_defined_primary_key # type: ignore # source_defined_primary_key is known to be an Optional[List[List[str]]] 194 195 @property 196 def cursor_field(self) -> Union[str, List[str]]: 197 if self._abstract_stream.cursor_field is None: 198 return [] 199 else: 200 return self._abstract_stream.cursor_field 201 202 @property 203 def cursor(self) -> Optional[Cursor]: # type: ignore[override] # StreamFaced expects to use only airbyte_cdk.sources.streams.concurrent.cursor.Cursor 204 return self._cursor 205 206 @lru_cache(maxsize=None) 207 def get_json_schema(self) -> Mapping[str, Any]: 208 return self._abstract_stream.get_json_schema() 209 210 @property 211 def supports_incremental(self) -> bool: 212 return self._legacy_stream.supports_incremental 213 214 def check_availability( 215 self, logger: logging.Logger, source: Optional["Source"] = None 216 ) -> Tuple[bool, Optional[str]]: 217 """ 218 Verifies the stream is available. 
Delegates to the underlying AbstractStream and ignores the parameters 219 :param logger: (ignored) 220 :param source: (ignored) 221 :return: 222 """ 223 availability = self._abstract_stream.check_availability() 224 return availability.is_available(), availability.message() 225 226 def as_airbyte_stream(self) -> AirbyteStream: 227 return self._abstract_stream.as_airbyte_stream() 228 229 def log_stream_sync_configuration(self) -> None: 230 self._abstract_stream.log_stream_sync_configuration() 231 232 def get_underlying_stream(self) -> DefaultStream: 233 return self._abstract_stream
The StreamFacade is a Stream that wraps an AbstractStream and exposes it as a Stream.
All methods either delegate to the wrapped AbstractStream or provide a default implementation. The default implementations define restrictions imposed on Streams migrated to the new interface. For instance, only source-defined cursors are supported.
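An illustrative helper (not part of the CDK) showing one way a regular Stream might be wrapped for concurrent processing via `create_from_stream()`; the full-refresh-style `FinalStateCursor` and empty state are assumptions for the sketch:

```python
import logging

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.concurrent.adapters import StreamFacade
from airbyte_cdk.sources.streams.concurrent.cursor import FinalStateCursor


def wrap_for_concurrency(stream: Stream, source: AbstractSource) -> Stream:
    # Full-refresh style wrapping: FinalStateCursor only guarantees a final state message.
    cursor = FinalStateCursor(
        stream_name=stream.name,
        stream_namespace=stream.namespace,
        message_repository=source.message_repository,
    )
    return StreamFacade.create_from_stream(
        stream=stream,
        source=source,
        logger=logging.getLogger("airbyte"),
        state={},  # no prior state for a full refresh
        cursor=cursor,
    )
```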
128 def __init__( 129 self, 130 stream: DefaultStream, 131 legacy_stream: Stream, 132 cursor: Cursor, 133 slice_logger: SliceLogger, 134 logger: logging.Logger, 135 ): 136 """ 137 :param stream: The underlying AbstractStream 138 """ 139 self._abstract_stream = stream 140 self._legacy_stream = legacy_stream 141 self._cursor = cursor 142 self._slice_logger = slice_logger 143 self._logger = logger
Parameters
- stream: The underlying AbstractStream
66 @classmethod 67 def create_from_stream( 68 cls, 69 stream: Stream, 70 source: AbstractSource, 71 logger: logging.Logger, 72 state: Optional[MutableMapping[str, Any]], 73 cursor: Cursor, 74 ) -> Stream: 75 """ 76 Create a ConcurrentStream from a Stream object. 77 :param source: The source 78 :param stream: The stream 79 :param max_workers: The maximum number of worker thread to use 80 :return: 81 """ 82 pk = get_primary_key_from_stream(stream.primary_key) 83 cursor_field = get_cursor_field_from_stream(stream) 84 85 if not source.message_repository: 86 raise ValueError( 87 "A message repository is required to emit non-record messages. Please set the message repository on the source." 88 ) 89 90 message_repository = source.message_repository 91 return StreamFacade( 92 DefaultStream( 93 partition_generator=StreamPartitionGenerator( 94 stream, 95 message_repository, 96 SyncMode.full_refresh 97 if isinstance(cursor, FinalStateCursor) 98 else SyncMode.incremental, 99 [cursor_field] if cursor_field is not None else None, 100 state, 101 ), 102 name=stream.name, 103 namespace=stream.namespace, 104 json_schema=stream.get_json_schema(), 105 availability_strategy=AlwaysAvailableAvailabilityStrategy(), 106 primary_key=pk, 107 cursor_field=cursor_field, 108 logger=logger, 109 cursor=cursor, 110 ), 111 stream, 112 cursor, 113 slice_logger=source._slice_logger, 114 logger=logger, 115 )
Create a ConcurrentStream from a Stream object.
Parameters
- source: The source
- stream: The stream
- max_workers: The maximum number of worker threads to use
Returns
214 def check_availability( 215 self, logger: logging.Logger, source: Optional["Source"] = None 216 ) -> Tuple[bool, Optional[str]]: 217 """ 218 Verifies the stream is available. Delegates to the underlying AbstractStream and ignores the parameters 219 :param logger: (ignored) 220 :param source: (ignored) 221 :return: 222 """ 223 availability = self._abstract_stream.check_availability() 224 return availability.is_available(), availability.message()
Verifies the stream is available. Delegates to the underlying AbstractStream and ignores the parameters
Parameters
- logger: (ignored)
- source: (ignored)
Returns
Return the underlying stream facade object.
Inherited Members
- Stream
- logger
- transformer
- cursor
- has_multiple_slices
- name
- get_error_display_message
- read
- read_only_records
- read_records
- get_json_schema
- as_airbyte_stream
- supports_incremental
- is_resumable
- cursor_field
- namespace
- source_defined_cursor
- exit_on_rate_limit
- primary_key
- stream_slices
- state_checkpoint_interval
- get_updated_state
- get_cursor
- log_stream_sync_configuration
- configured_json_schema
99def create_connector_config_control_message(config: MutableMapping[str, Any]) -> AirbyteMessage: 100 control_message = AirbyteControlMessage( 101 type=OrchestratorType.CONNECTOR_CONFIG, 102 emitted_at=time.time() * 1000, 103 connectorConfig=AirbyteControlConnectorConfigMessage(config=config), 104 ) 105 return AirbyteMessage(type=Type.CONTROL, control=control_message)
90def emit_configuration_as_airbyte_control_message(config: MutableMapping[str, Any]) -> None: 91 """ 92 WARNING: deprecated - emit_configuration_as_airbyte_control_message is being deprecated in favor of the MessageRepository mechanism. 93 See the airbyte_cdk.sources.message package 94 """ 95 airbyte_message = create_connector_config_control_message(config) 96 print(orjson.dumps(AirbyteMessageSerializer.dump(airbyte_message)).decode())
WARNING: deprecated - emit_configuration_as_airbyte_control_message is being deprecated in favor of the MessageRepository mechanism. See the airbyte_cdk.sources.message package
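A hedged sketch of the non-deprecated path: build the control message with `create_connector_config_control_message` and emit it yourself (ideally through the source's MessageRepository). The config contents are illustrative, and import locations may vary slightly by CDK version:

```python
import orjson

from airbyte_cdk.config_observation import create_connector_config_control_message
from airbyte_cdk.models import AirbyteMessageSerializer

updated_config = {"credentials": {"access_token": "<rotated token>"}}  # illustrative
control_message = create_connector_config_control_message(updated_config)

# Equivalent to what the deprecated helper prints; prefer routing this through the
# source's MessageRepository so it is emitted alongside the regular message stream.
print(orjson.dumps(AirbyteMessageSerializer.dump(control_message)).decode())
```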
53class AbstractSource(Source, ABC): 54 """ 55 Abstract base class for an Airbyte Source. Consumers should implement any abstract methods 56 in this class to create an Airbyte Specification compliant Source. 57 """ 58 59 @abstractmethod 60 def check_connection( 61 self, logger: logging.Logger, config: Mapping[str, Any] 62 ) -> Tuple[bool, Optional[Any]]: 63 """ 64 :param logger: source logger 65 :param config: The user-provided configuration as specified by the source's spec. 66 This usually contains information required to check connection e.g. tokens, secrets and keys etc. 67 :return: A tuple of (boolean, error). If boolean is true, then the connection check is successful 68 and we can connect to the underlying data source using the provided configuration. 69 Otherwise, the input config cannot be used to connect to the underlying data source, 70 and the "error" object should describe what went wrong. 71 The error object will be cast to string to display the problem to the user. 72 """ 73 74 @abstractmethod 75 def streams(self, config: Mapping[str, Any]) -> List[Stream]: 76 """ 77 :param config: The user-provided configuration as specified by the source's spec. 78 Any stream construction related operation should happen here. 79 :return: A list of the streams in this source connector. 80 """ 81 82 # Stream name to instance map for applying output object transformation 83 _stream_to_instance_map: Dict[str, Stream] = {} 84 _slice_logger: SliceLogger = DebugSliceLogger() 85 86 def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog: 87 """Implements the Discover operation from the Airbyte Specification. 88 See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#discover. 89 """ 90 streams = [stream.as_airbyte_stream() for stream in self.streams(config=config)] 91 return AirbyteCatalog(streams=streams) 92 93 def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus: 94 """Implements the Check Connection operation from the Airbyte Specification. 95 See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check. 96 """ 97 check_succeeded, error = self.check_connection(logger, config) 98 if not check_succeeded: 99 return AirbyteConnectionStatus(status=Status.FAILED, message=repr(error)) 100 return AirbyteConnectionStatus(status=Status.SUCCEEDED) 101 102 def read( 103 self, 104 logger: logging.Logger, 105 config: Mapping[str, Any], 106 catalog: ConfiguredAirbyteCatalog, 107 state: Optional[List[AirbyteStateMessage]] = None, 108 ) -> Iterator[AirbyteMessage]: 109 """Implements the Read operation from the Airbyte Specification. 
See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/.""" 110 logger.info(f"Starting syncing {self.name}") 111 config, internal_config = split_config(config) 112 # TODO assert all streams exist in the connector 113 # get the streams once in case the connector needs to make any queries to generate them 114 stream_instances = {s.name: s for s in self.streams(config)} 115 state_manager = ConnectorStateManager(state=state) 116 self._stream_to_instance_map = stream_instances 117 118 stream_name_to_exception: MutableMapping[str, AirbyteTracedException] = {} 119 120 with create_timer(self.name) as timer: 121 for configured_stream in catalog.streams: 122 stream_instance = stream_instances.get(configured_stream.stream.name) 123 is_stream_exist = bool(stream_instance) 124 try: 125 # Used direct reference to `stream_instance` instead of `is_stream_exist` to avoid mypy type checking errors 126 if not stream_instance: 127 if not self.raise_exception_on_missing_stream: 128 yield stream_status_as_airbyte_message( 129 configured_stream.stream, AirbyteStreamStatus.INCOMPLETE 130 ) 131 continue 132 133 error_message = ( 134 f"The stream '{configured_stream.stream.name}' in your connection configuration was not found in the source. " 135 f"Refresh the schema in your replication settings and remove this stream from future sync attempts." 136 ) 137 138 # Use configured_stream as stream_instance to support references in error handling. 139 stream_instance = configured_stream.stream 140 141 raise AirbyteTracedException( 142 message="A stream listed in your configuration was not found in the source. Please check the logs for more " 143 "details.", 144 internal_message=error_message, 145 failure_type=FailureType.config_error, 146 ) 147 148 timer.start_event(f"Syncing stream {configured_stream.stream.name}") 149 logger.info(f"Marking stream {configured_stream.stream.name} as STARTED") 150 yield stream_status_as_airbyte_message( 151 configured_stream.stream, AirbyteStreamStatus.STARTED 152 ) 153 yield from self._read_stream( 154 logger=logger, 155 stream_instance=stream_instance, 156 configured_stream=configured_stream, 157 state_manager=state_manager, 158 internal_config=internal_config, 159 ) 160 logger.info(f"Marking stream {configured_stream.stream.name} as STOPPED") 161 yield stream_status_as_airbyte_message( 162 configured_stream.stream, AirbyteStreamStatus.COMPLETE 163 ) 164 165 except Exception as e: 166 yield from self._emit_queued_messages() 167 logger.exception( 168 f"Encountered an exception while reading stream {configured_stream.stream.name}" 169 ) 170 logger.info(f"Marking stream {configured_stream.stream.name} as STOPPED") 171 yield stream_status_as_airbyte_message( 172 configured_stream.stream, AirbyteStreamStatus.INCOMPLETE 173 ) 174 175 stream_descriptor = StreamDescriptor(name=configured_stream.stream.name) 176 177 if isinstance(e, AirbyteTracedException): 178 traced_exception = e 179 info_message = f"Stopping sync on error from stream {configured_stream.stream.name} because {self.name} does not support continuing syncs on error." 
180 else: 181 traced_exception = self._serialize_exception( 182 stream_descriptor, e, stream_instance=stream_instance 183 ) 184 info_message = f"{self.name} does not support continuing syncs on error from stream {configured_stream.stream.name}" 185 186 yield traced_exception.as_sanitized_airbyte_message( 187 stream_descriptor=stream_descriptor 188 ) 189 stream_name_to_exception[stream_instance.name] = traced_exception # type: ignore # use configured_stream if stream_instance is None 190 if self.stop_sync_on_stream_failure: 191 logger.info(info_message) 192 break 193 finally: 194 # Finish read event only if the stream instance exists; 195 # otherwise, there's no need as it never started 196 if is_stream_exist: 197 timer.finish_event() 198 logger.info(f"Finished syncing {configured_stream.stream.name}") 199 logger.info(timer.report()) 200 201 if len(stream_name_to_exception) > 0: 202 error_message = generate_failed_streams_error_message( 203 {key: [value] for key, value in stream_name_to_exception.items()} 204 ) 205 logger.info(error_message) 206 # We still raise at least one exception when a stream raises an exception because the platform currently relies 207 # on a non-zero exit code to determine if a sync attempt has failed. We also raise the exception as a config_error 208 # type because this combined error isn't actionable, but rather the previously emitted individual errors. 209 raise AirbyteTracedException( 210 message=error_message, failure_type=FailureType.config_error 211 ) 212 logger.info(f"Finished syncing {self.name}") 213 214 @staticmethod 215 def _serialize_exception( 216 stream_descriptor: StreamDescriptor, e: Exception, stream_instance: Optional[Stream] = None 217 ) -> AirbyteTracedException: 218 display_message = stream_instance.get_error_display_message(e) if stream_instance else None 219 if display_message: 220 return AirbyteTracedException.from_exception( 221 e, message=display_message, stream_descriptor=stream_descriptor 222 ) 223 return AirbyteTracedException.from_exception(e, stream_descriptor=stream_descriptor) 224 225 @property 226 def raise_exception_on_missing_stream(self) -> bool: 227 return False 228 229 def _read_stream( 230 self, 231 logger: logging.Logger, 232 stream_instance: Stream, 233 configured_stream: ConfiguredAirbyteStream, 234 state_manager: ConnectorStateManager, 235 internal_config: InternalConfig, 236 ) -> Iterator[AirbyteMessage]: 237 if internal_config.page_size and isinstance(stream_instance, HttpStream): 238 logger.info( 239 f"Setting page size for {stream_instance.name} to {internal_config.page_size}" 240 ) 241 stream_instance.page_size = internal_config.page_size 242 logger.debug( 243 f"Syncing configured stream: {configured_stream.stream.name}", 244 extra={ 245 "sync_mode": configured_stream.sync_mode, 246 "primary_key": configured_stream.primary_key, 247 "cursor_field": configured_stream.cursor_field, 248 }, 249 ) 250 stream_instance.log_stream_sync_configuration() 251 252 stream_name = configured_stream.stream.name 253 stream_state = state_manager.get_stream_state(stream_name, stream_instance.namespace) 254 255 # This is a hack. Existing full refresh streams that are converted into resumable full refresh need to discard 256 # the state because the terminal state for a full refresh sync is not compatible with substream resumable full 257 # refresh state. 
This is only required when running live traffic regression testing since the platform normally 258 # handles whether to pass state 259 if stream_state == {"__ab_no_cursor_state_message": True}: 260 stream_state = {} 261 262 if "state" in dir(stream_instance): 263 stream_instance.state = stream_state # type: ignore # we check that state in the dir(stream_instance) 264 logger.info(f"Setting state of {self.name} stream to {stream_state}") 265 266 record_iterator = stream_instance.read( 267 configured_stream, 268 logger, 269 self._slice_logger, 270 stream_state, 271 state_manager, 272 internal_config, 273 ) 274 275 record_counter = 0 276 logger.info(f"Syncing stream: {stream_name} ") 277 for record_data_or_message in record_iterator: 278 record = self._get_message(record_data_or_message, stream_instance) 279 if record.type == MessageType.RECORD: 280 record_counter += 1 281 if record_counter == 1: 282 logger.info(f"Marking stream {stream_name} as RUNNING") 283 # If we just read the first record of the stream, emit the transition to the RUNNING state 284 yield stream_status_as_airbyte_message( 285 configured_stream.stream, AirbyteStreamStatus.RUNNING 286 ) 287 yield from self._emit_queued_messages() 288 yield record 289 290 logger.info(f"Read {record_counter} records from {stream_name} stream") 291 292 def _emit_queued_messages(self) -> Iterable[AirbyteMessage]: 293 if self.message_repository: 294 yield from self.message_repository.consume_queue() 295 return 296 297 def _get_message( 298 self, record_data_or_message: Union[StreamData, AirbyteMessage], stream: Stream 299 ) -> AirbyteMessage: 300 """ 301 Converts the input to an AirbyteMessage if it is a StreamData. Returns the input as is if it is already an AirbyteMessage 302 """ 303 match record_data_or_message: 304 case AirbyteMessage(): 305 return record_data_or_message 306 case _: 307 return stream_data_to_airbyte_message( 308 stream.name, 309 record_data_or_message, 310 stream.transformer, 311 stream.get_json_schema(), 312 ) 313 314 @property 315 def message_repository(self) -> Union[None, MessageRepository]: 316 return _default_message_repository 317 318 @property 319 def stop_sync_on_stream_failure(self) -> bool: 320 """ 321 WARNING: This function is in-development which means it is subject to change. Use at your own risk. 322 323 By default, when a source encounters an exception while syncing a stream, it will emit an error trace message and then 324 continue syncing the next stream. This can be overwritten on a per-source basis so that the source will stop the sync 325 on the first error seen and emit a single error trace message for that stream. 326 """ 327 return False
Abstract base class for an Airbyte Source. Consumers should implement any abstract methods in this class to create an Airbyte Specification compliant Source.
59 @abstractmethod 60 def check_connection( 61 self, logger: logging.Logger, config: Mapping[str, Any] 62 ) -> Tuple[bool, Optional[Any]]: 63 """ 64 :param logger: source logger 65 :param config: The user-provided configuration as specified by the source's spec. 66 This usually contains information required to check connection e.g. tokens, secrets and keys etc. 67 :return: A tuple of (boolean, error). If boolean is true, then the connection check is successful 68 and we can connect to the underlying data source using the provided configuration. 69 Otherwise, the input config cannot be used to connect to the underlying data source, 70 and the "error" object should describe what went wrong. 71 The error object will be cast to string to display the problem to the user. 72 """
Parameters
- logger: source logger
- config: The user-provided configuration as specified by the source's spec. This usually contains information required to check connection e.g. tokens, secrets and keys etc.
Returns
A tuple of (boolean, error). If boolean is true, then the connection check is successful and we can connect to the underlying data source using the provided configuration. Otherwise, the input config cannot be used to connect to the underlying data source, and the "error" object should describe what went wrong. The error object will be cast to string to display the problem to the user.
74 @abstractmethod 75 def streams(self, config: Mapping[str, Any]) -> List[Stream]: 76 """ 77 :param config: The user-provided configuration as specified by the source's spec. 78 Any stream construction related operation should happen here. 79 :return: A list of the streams in this source connector. 80 """
Parameters
- config: The user-provided configuration as specified by the source's spec. Any stream construction related operation should happen here.
Returns
A list of the streams in this source connector.
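A minimal, hedged AbstractSource sketch implementing the two abstract methods documented above; `ExampleStream` and the `api_key` config field are hypothetical:

```python
import logging
from typing import Any, List, Mapping, Optional, Tuple

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream


class SourceExample(AbstractSource):
    def check_connection(
        self, logger: logging.Logger, config: Mapping[str, Any]
    ) -> Tuple[bool, Optional[Any]]:
        # Validate credentials cheaply, e.g. by issuing a single authenticated request.
        if not config.get("api_key"):
            return False, "api_key is required"
        return True, None

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        # Construct every stream the connector exposes; discover() and read() build on this.
        return [ExampleStream(api_key=config["api_key"])]  # hypothetical stream class
```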
86 def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog: 87 """Implements the Discover operation from the Airbyte Specification. 88 See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#discover. 89 """ 90 streams = [stream.as_airbyte_stream() for stream in self.streams(config=config)] 91 return AirbyteCatalog(streams=streams)
Implements the Discover operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#discover.
93 def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus: 94 """Implements the Check Connection operation from the Airbyte Specification. 95 See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check. 96 """ 97 check_succeeded, error = self.check_connection(logger, config) 98 if not check_succeeded: 99 return AirbyteConnectionStatus(status=Status.FAILED, message=repr(error)) 100 return AirbyteConnectionStatus(status=Status.SUCCEEDED)
Implements the Check Connection operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check.
102 def read( 103 self, 104 logger: logging.Logger, 105 config: Mapping[str, Any], 106 catalog: ConfiguredAirbyteCatalog, 107 state: Optional[List[AirbyteStateMessage]] = None, 108 ) -> Iterator[AirbyteMessage]: 109 """Implements the Read operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/.""" 110 logger.info(f"Starting syncing {self.name}") 111 config, internal_config = split_config(config) 112 # TODO assert all streams exist in the connector 113 # get the streams once in case the connector needs to make any queries to generate them 114 stream_instances = {s.name: s for s in self.streams(config)} 115 state_manager = ConnectorStateManager(state=state) 116 self._stream_to_instance_map = stream_instances 117 118 stream_name_to_exception: MutableMapping[str, AirbyteTracedException] = {} 119 120 with create_timer(self.name) as timer: 121 for configured_stream in catalog.streams: 122 stream_instance = stream_instances.get(configured_stream.stream.name) 123 is_stream_exist = bool(stream_instance) 124 try: 125 # Used direct reference to `stream_instance` instead of `is_stream_exist` to avoid mypy type checking errors 126 if not stream_instance: 127 if not self.raise_exception_on_missing_stream: 128 yield stream_status_as_airbyte_message( 129 configured_stream.stream, AirbyteStreamStatus.INCOMPLETE 130 ) 131 continue 132 133 error_message = ( 134 f"The stream '{configured_stream.stream.name}' in your connection configuration was not found in the source. " 135 f"Refresh the schema in your replication settings and remove this stream from future sync attempts." 136 ) 137 138 # Use configured_stream as stream_instance to support references in error handling. 139 stream_instance = configured_stream.stream 140 141 raise AirbyteTracedException( 142 message="A stream listed in your configuration was not found in the source. Please check the logs for more " 143 "details.", 144 internal_message=error_message, 145 failure_type=FailureType.config_error, 146 ) 147 148 timer.start_event(f"Syncing stream {configured_stream.stream.name}") 149 logger.info(f"Marking stream {configured_stream.stream.name} as STARTED") 150 yield stream_status_as_airbyte_message( 151 configured_stream.stream, AirbyteStreamStatus.STARTED 152 ) 153 yield from self._read_stream( 154 logger=logger, 155 stream_instance=stream_instance, 156 configured_stream=configured_stream, 157 state_manager=state_manager, 158 internal_config=internal_config, 159 ) 160 logger.info(f"Marking stream {configured_stream.stream.name} as STOPPED") 161 yield stream_status_as_airbyte_message( 162 configured_stream.stream, AirbyteStreamStatus.COMPLETE 163 ) 164 165 except Exception as e: 166 yield from self._emit_queued_messages() 167 logger.exception( 168 f"Encountered an exception while reading stream {configured_stream.stream.name}" 169 ) 170 logger.info(f"Marking stream {configured_stream.stream.name} as STOPPED") 171 yield stream_status_as_airbyte_message( 172 configured_stream.stream, AirbyteStreamStatus.INCOMPLETE 173 ) 174 175 stream_descriptor = StreamDescriptor(name=configured_stream.stream.name) 176 177 if isinstance(e, AirbyteTracedException): 178 traced_exception = e 179 info_message = f"Stopping sync on error from stream {configured_stream.stream.name} because {self.name} does not support continuing syncs on error." 
180 else: 181 traced_exception = self._serialize_exception( 182 stream_descriptor, e, stream_instance=stream_instance 183 ) 184 info_message = f"{self.name} does not support continuing syncs on error from stream {configured_stream.stream.name}" 185 186 yield traced_exception.as_sanitized_airbyte_message( 187 stream_descriptor=stream_descriptor 188 ) 189 stream_name_to_exception[stream_instance.name] = traced_exception # type: ignore # use configured_stream if stream_instance is None 190 if self.stop_sync_on_stream_failure: 191 logger.info(info_message) 192 break 193 finally: 194 # Finish read event only if the stream instance exists; 195 # otherwise, there's no need as it never started 196 if is_stream_exist: 197 timer.finish_event() 198 logger.info(f"Finished syncing {configured_stream.stream.name}") 199 logger.info(timer.report()) 200 201 if len(stream_name_to_exception) > 0: 202 error_message = generate_failed_streams_error_message( 203 {key: [value] for key, value in stream_name_to_exception.items()} 204 ) 205 logger.info(error_message) 206 # We still raise at least one exception when a stream raises an exception because the platform currently relies 207 # on a non-zero exit code to determine if a sync attempt has failed. We also raise the exception as a config_error 208 # type because this combined error isn't actionable, but rather the previously emitted individual errors. 209 raise AirbyteTracedException( 210 message=error_message, failure_type=FailureType.config_error 211 ) 212 logger.info(f"Finished syncing {self.name}")
Implements the Read operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/.
318 @property 319 def stop_sync_on_stream_failure(self) -> bool: 320 """ 321 WARNING: This function is in-development which means it is subject to change. Use at your own risk. 322 323 By default, when a source encounters an exception while syncing a stream, it will emit an error trace message and then 324 continue syncing the next stream. This can be overwritten on a per-source basis so that the source will stop the sync 325 on the first error seen and emit a single error trace message for that stream. 326 """ 327 return False
WARNING: This function is in-development which means it is subject to change. Use at your own risk.
By default, when a source encounters an exception while syncing a stream, it will emit an error trace message and then continue syncing the next stream. This can be overwritten on a per-source basis so that the source will stop the sync on the first error seen and emit a single error trace message for that stream.
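A source that prefers to fail fast can override the property, for example (building on the hypothetical `SourceExample` sketch above):

```python
class SourceExampleFailFast(SourceExample):
    @property
    def stop_sync_on_stream_failure(self) -> bool:
        # Stop the whole sync on the first stream error instead of continuing to the next stream.
        return True
```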
13class BaseConfig(BaseModel): 14 """Base class for connector spec, adds the following behaviour: 15 16 - resolve $ref and replace it with definition 17 - replace all occurrences of anyOf with oneOf 18 - drop description 19 """ 20 21 @classmethod 22 def schema(cls, *args: Any, **kwargs: Any) -> Dict[str, Any]: 23 """We're overriding the schema classmethod to enable some post-processing""" 24 schema = super().schema(*args, **kwargs) 25 rename_key(schema, old_key="anyOf", new_key="oneOf") # UI supports only oneOf 26 expand_refs(schema) 27 schema.pop("description", None) # description added from the docstring 28 return schema
Base class for a connector spec; it adds the following behaviour:
- resolve $ref and replace it with definition
- replace all occurrences of anyOf with oneOf
- drop description
21 @classmethod 22 def schema(cls, *args: Any, **kwargs: Any) -> Dict[str, Any]: 23 """We're overriding the schema classmethod to enable some post-processing""" 24 schema = super().schema(*args, **kwargs) 25 rename_key(schema, old_key="anyOf", new_key="oneOf") # UI supports only oneOf 26 expand_refs(schema) 27 schema.pop("description", None) # description added from the docstring 28 return schema
We're overriding the schema classmethod to enable some post-processing
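A hedged sketch of a spec class built on BaseConfig; the field names are illustrative, and real specs typically add pydantic `Field` metadata such as descriptions and `airbyte_secret`:

```python
from typing import Optional

from airbyte_cdk.sources.config import BaseConfig


class SourceExampleSpec(BaseConfig):
    """Example Source Spec"""  # pydantic turns the docstring into the schema description,
    # which BaseConfig.schema() then drops (see the pop above).

    api_key: str
    start_date: Optional[str] = None


# Returns plain JSON Schema with $refs expanded and anyOf renamed to oneOf.
print(SourceExampleSpec.schema())
```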
34class BaseConnector(ABC, Generic[TConfig]): 35 # configure whether the `check_config_against_spec_or_exit()` needs to be called 36 check_config_against_spec: bool = True 37 38 @abstractmethod 39 def configure(self, config: Mapping[str, Any], temp_dir: str) -> TConfig: 40 """ 41 Persist config in temporary directory to run the Source job 42 """ 43 44 @staticmethod 45 def read_config(config_path: str) -> Mapping[str, Any]: 46 config = BaseConnector._read_json_file(config_path) 47 if isinstance(config, Mapping): 48 return config 49 else: 50 raise ValueError( 51 f"The content of {config_path} is not an object and therefore is not a valid config. Please ensure the file represent a config." 52 ) 53 54 @staticmethod 55 def _read_json_file(file_path: str) -> Any: 56 with open(file_path, "r") as file: 57 contents = file.read() 58 59 try: 60 return json.loads(contents) 61 except json.JSONDecodeError as error: 62 raise ValueError( 63 f"Could not read json file {file_path}: {error}. Please ensure that it is a valid JSON." 64 ) 65 66 @staticmethod 67 def write_config(config: TConfig, config_path: str) -> None: 68 with open(config_path, "w") as fh: 69 fh.write(json.dumps(config)) 70 71 def spec(self, logger: logging.Logger) -> ConnectorSpecification: 72 """ 73 Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username and password) 74 required to run this integration. By default, this will be loaded from a "spec.yaml" or a "spec.json" in the package root. 75 """ 76 77 package = self.__class__.__module__.split(".")[0] 78 79 yaml_spec = load_optional_package_file(package, "spec.yaml") 80 json_spec = load_optional_package_file(package, "spec.json") 81 82 if yaml_spec and json_spec: 83 raise RuntimeError( 84 "Found multiple spec files in the package. Only one of spec.yaml or spec.json should be provided." 85 ) 86 87 if yaml_spec: 88 spec_obj = yaml.load(yaml_spec, Loader=yaml.SafeLoader) 89 elif json_spec: 90 try: 91 spec_obj = json.loads(json_spec) 92 except json.JSONDecodeError as error: 93 raise ValueError( 94 f"Could not read json spec file: {error}. Please ensure that it is a valid JSON." 95 ) 96 else: 97 raise FileNotFoundError("Unable to find spec.yaml or spec.json in the package.") 98 99 return ConnectorSpecificationSerializer.load(spec_obj) 100 101 @abstractmethod 102 def check(self, logger: logging.Logger, config: TConfig) -> AirbyteConnectionStatus: 103 """ 104 Tests if the input configuration can be used to successfully connect to the integration e.g: if a provided Stripe API token can be used to connect 105 to the Stripe API. 106 """
Helper class that provides a standard way to create an ABC using inheritance.
38 @abstractmethod 39 def configure(self, config: Mapping[str, Any], temp_dir: str) -> TConfig: 40 """ 41 Persist config in temporary directory to run the Source job 42 """
Persist config in temporary directory to run the Source job
44 @staticmethod 45 def read_config(config_path: str) -> Mapping[str, Any]: 46 config = BaseConnector._read_json_file(config_path) 47 if isinstance(config, Mapping): 48 return config 49 else: 50 raise ValueError( 51 f"The content of {config_path} is not an object and therefore is not a valid config. Please ensure the file represent a config." 52 )
71 def spec(self, logger: logging.Logger) -> ConnectorSpecification: 72 """ 73 Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username and password) 74 required to run this integration. By default, this will be loaded from a "spec.yaml" or a "spec.json" in the package root. 75 """ 76 77 package = self.__class__.__module__.split(".")[0] 78 79 yaml_spec = load_optional_package_file(package, "spec.yaml") 80 json_spec = load_optional_package_file(package, "spec.json") 81 82 if yaml_spec and json_spec: 83 raise RuntimeError( 84 "Found multiple spec files in the package. Only one of spec.yaml or spec.json should be provided." 85 ) 86 87 if yaml_spec: 88 spec_obj = yaml.load(yaml_spec, Loader=yaml.SafeLoader) 89 elif json_spec: 90 try: 91 spec_obj = json.loads(json_spec) 92 except json.JSONDecodeError as error: 93 raise ValueError( 94 f"Could not read json spec file: {error}. Please ensure that it is a valid JSON." 95 ) 96 else: 97 raise FileNotFoundError("Unable to find spec.yaml or spec.json in the package.") 98 99 return ConnectorSpecificationSerializer.load(spec_obj)
Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username and password) required to run this integration. By default, this will be loaded from a "spec.yaml" or a "spec.json" in the package root.
101 @abstractmethod 102 def check(self, logger: logging.Logger, config: TConfig) -> AirbyteConnectionStatus: 103 """ 104 Tests if the input configuration can be used to successfully connect to the integration e.g: if a provided Stripe API token can be used to connect 105 to the Stripe API. 106 """
Tests if the input configuration can be used to successfully connect to the integration e.g: if a provided Stripe API token can be used to connect to the Stripe API.
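A minimal sketch of a concrete connector, assuming a pass-through `configure` and an `api_key`-based `check`; the class name and config keys are hypothetical, and `spec()` would still expect a `spec.yaml` or `spec.json` in the package root as described above.

```python
import logging
import os
from typing import Any, Mapping

from airbyte_cdk import AirbyteConnectionStatus, BaseConnector, Status


class ExampleConnector(BaseConnector[Mapping[str, Any]]):
    def configure(self, config: Mapping[str, Any], temp_dir: str) -> Mapping[str, Any]:
        # Persist the config so the platform (or a child process) can read it back later.
        config_path = os.path.join(temp_dir, "config.json")
        self.write_config(config, config_path)
        return config

    def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
        # A real connector would attempt an authenticated API call here.
        if not config.get("api_key"):
            return AirbyteConnectionStatus(status=Status.FAILED, message="api_key is required")
        return AirbyteConnectionStatus(status=Status.SUCCEEDED)
```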
124class Connector(DefaultConnectorMixin, BaseConnector[Mapping[str, Any]], ABC): ...
Helper class that provides a standard way to create an ABC using inheritance.
30class Destination(Connector, ABC): 31 VALID_CMDS = {"spec", "check", "write"} 32 33 @abstractmethod 34 def write( 35 self, 36 config: Mapping[str, Any], 37 configured_catalog: ConfiguredAirbyteCatalog, 38 input_messages: Iterable[AirbyteMessage], 39 ) -> Iterable[AirbyteMessage]: 40 """Implement to define how the connector writes data to the destination""" 41 42 def _run_check(self, config: Mapping[str, Any]) -> AirbyteMessage: 43 check_result = self.check(logger, config) 44 return AirbyteMessage(type=Type.CONNECTION_STATUS, connectionStatus=check_result) 45 46 def _parse_input_stream(self, input_stream: io.TextIOWrapper) -> Iterable[AirbyteMessage]: 47 """Reads from stdin, converting to Airbyte messages""" 48 for line in input_stream: 49 try: 50 yield AirbyteMessageSerializer.load(orjson.loads(line)) 51 except orjson.JSONDecodeError: 52 logger.info( 53 f"ignoring input which can't be deserialized as Airbyte Message: {line}" 54 ) 55 56 def _run_write( 57 self, 58 config: Mapping[str, Any], 59 configured_catalog_path: str, 60 input_stream: io.TextIOWrapper, 61 ) -> Iterable[AirbyteMessage]: 62 catalog = ConfiguredAirbyteCatalogSerializer.load( 63 orjson.loads(open(configured_catalog_path).read()) 64 ) 65 input_messages = self._parse_input_stream(input_stream) 66 logger.info("Begin writing to the destination...") 67 yield from self.write( 68 config=config, configured_catalog=catalog, input_messages=input_messages 69 ) 70 logger.info("Writing complete.") 71 72 def parse_args(self, args: List[str]) -> argparse.Namespace: 73 """ 74 :param args: commandline arguments 75 :return: 76 """ 77 78 parent_parser = argparse.ArgumentParser(add_help=False) 79 main_parser = argparse.ArgumentParser() 80 subparsers = main_parser.add_subparsers(title="commands", dest="command") 81 82 # spec 83 subparsers.add_parser( 84 "spec", help="outputs the json configuration specification", parents=[parent_parser] 85 ) 86 87 # check 88 check_parser = subparsers.add_parser( 89 "check", help="checks the config can be used to connect", parents=[parent_parser] 90 ) 91 required_check_parser = check_parser.add_argument_group("required named arguments") 92 required_check_parser.add_argument( 93 "--config", type=str, required=True, help="path to the json configuration file" 94 ) 95 96 # write 97 write_parser = subparsers.add_parser( 98 "write", help="Writes data to the destination", parents=[parent_parser] 99 ) 100 write_required = write_parser.add_argument_group("required named arguments") 101 write_required.add_argument( 102 "--config", type=str, required=True, help="path to the JSON configuration file" 103 ) 104 write_required.add_argument( 105 "--catalog", type=str, required=True, help="path to the configured catalog JSON file" 106 ) 107 108 parsed_args = main_parser.parse_args(args) 109 cmd = parsed_args.command 110 if not cmd: 111 raise Exception("No command entered. 
") 112 elif cmd not in ["spec", "check", "write"]: 113 # This is technically dead code since parse_args() would fail if this was the case 114 # But it's non-obvious enough to warrant placing it here anyways 115 raise Exception(f"Unknown command entered: {cmd}") 116 117 return parsed_args 118 119 def run_cmd(self, parsed_args: argparse.Namespace) -> Iterable[AirbyteMessage]: 120 cmd = parsed_args.command 121 if cmd not in self.VALID_CMDS: 122 raise Exception(f"Unrecognized command: {cmd}") 123 124 spec = self.spec(logger) 125 if cmd == "spec": 126 yield AirbyteMessage(type=Type.SPEC, spec=spec) 127 return 128 config = self.read_config(config_path=parsed_args.config) 129 if self.check_config_against_spec or cmd == "check": 130 try: 131 check_config_against_spec_or_exit(config, spec) 132 except AirbyteTracedException as traced_exc: 133 connection_status = traced_exc.as_connection_status_message() 134 if connection_status and cmd == "check": 135 yield connection_status 136 return 137 raise traced_exc 138 139 if cmd == "check": 140 yield self._run_check(config=config) 141 elif cmd == "write": 142 # Wrap in UTF-8 to override any other input encodings 143 wrapped_stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8") 144 yield from self._run_write( 145 config=config, 146 configured_catalog_path=parsed_args.catalog, 147 input_stream=wrapped_stdin, 148 ) 149 150 def run(self, args: List[str]) -> None: 151 init_uncaught_exception_handler(logger) 152 parsed_args = self.parse_args(args) 153 output_messages = self.run_cmd(parsed_args) 154 for message in output_messages: 155 print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
Helper class that provides a standard way to create an ABC using inheritance.
33 @abstractmethod 34 def write( 35 self, 36 config: Mapping[str, Any], 37 configured_catalog: ConfiguredAirbyteCatalog, 38 input_messages: Iterable[AirbyteMessage], 39 ) -> Iterable[AirbyteMessage]: 40 """Implement to define how the connector writes data to the destination"""
Implement to define how the connector writes data to the destination
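A sketch of a `write` implementation under common assumptions: records are buffered and flushed on each state message, and the state message is only echoed back once the preceding records are durably written. Everything except the `write`/`check` signatures is illustrative.

```python
import logging
from typing import Any, Iterable, Mapping

from airbyte_cdk import (
    AirbyteConnectionStatus,
    AirbyteMessage,
    ConfiguredAirbyteCatalog,
    Status,
    Type,
)
from airbyte_cdk.destinations import Destination


class ExampleDestination(Destination):
    def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
        # Stubbed so the class is concrete; a real destination would verify credentials here.
        return AirbyteConnectionStatus(status=Status.SUCCEEDED)

    def write(
        self,
        config: Mapping[str, Any],
        configured_catalog: ConfiguredAirbyteCatalog,
        input_messages: Iterable[AirbyteMessage],
    ) -> Iterable[AirbyteMessage]:
        buffer = []
        for message in input_messages:
            if message.type == Type.RECORD:
                buffer.append(message.record.data)  # collect records until the next state message
            elif message.type == Type.STATE:
                # A real implementation would flush `buffer` to the destination here.
                buffer.clear()
                # Yielding the state message back signals that the preceding records are persisted.
                yield message
```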
72 def parse_args(self, args: List[str]) -> argparse.Namespace: 73 """ 74 :param args: commandline arguments 75 :return: 76 """ 77 78 parent_parser = argparse.ArgumentParser(add_help=False) 79 main_parser = argparse.ArgumentParser() 80 subparsers = main_parser.add_subparsers(title="commands", dest="command") 81 82 # spec 83 subparsers.add_parser( 84 "spec", help="outputs the json configuration specification", parents=[parent_parser] 85 ) 86 87 # check 88 check_parser = subparsers.add_parser( 89 "check", help="checks the config can be used to connect", parents=[parent_parser] 90 ) 91 required_check_parser = check_parser.add_argument_group("required named arguments") 92 required_check_parser.add_argument( 93 "--config", type=str, required=True, help="path to the json configuration file" 94 ) 95 96 # write 97 write_parser = subparsers.add_parser( 98 "write", help="Writes data to the destination", parents=[parent_parser] 99 ) 100 write_required = write_parser.add_argument_group("required named arguments") 101 write_required.add_argument( 102 "--config", type=str, required=True, help="path to the JSON configuration file" 103 ) 104 write_required.add_argument( 105 "--catalog", type=str, required=True, help="path to the configured catalog JSON file" 106 ) 107 108 parsed_args = main_parser.parse_args(args) 109 cmd = parsed_args.command 110 if not cmd: 111 raise Exception("No command entered. ") 112 elif cmd not in ["spec", "check", "write"]: 113 # This is technically dead code since parse_args() would fail if this was the case 114 # But it's non-obvious enough to warrant placing it here anyways 115 raise Exception(f"Unknown command entered: {cmd}") 116 117 return parsed_args
Parameters
- args: commandline arguments
Returns
The parsed command-line arguments as an argparse.Namespace.
119 def run_cmd(self, parsed_args: argparse.Namespace) -> Iterable[AirbyteMessage]: 120 cmd = parsed_args.command 121 if cmd not in self.VALID_CMDS: 122 raise Exception(f"Unrecognized command: {cmd}") 123 124 spec = self.spec(logger) 125 if cmd == "spec": 126 yield AirbyteMessage(type=Type.SPEC, spec=spec) 127 return 128 config = self.read_config(config_path=parsed_args.config) 129 if self.check_config_against_spec or cmd == "check": 130 try: 131 check_config_against_spec_or_exit(config, spec) 132 except AirbyteTracedException as traced_exc: 133 connection_status = traced_exc.as_connection_status_message() 134 if connection_status and cmd == "check": 135 yield connection_status 136 return 137 raise traced_exc 138 139 if cmd == "check": 140 yield self._run_check(config=config) 141 elif cmd == "write": 142 # Wrap in UTF-8 to override any other input encodings 143 wrapped_stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8") 144 yield from self._run_write( 145 config=config, 146 configured_catalog_path=parsed_args.catalog, 147 input_stream=wrapped_stdin, 148 )
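For completeness, a destination's entry point usually just forwards the command-line arguments to `run()`, which drives `parse_args` and `run_cmd` and prints each resulting Airbyte message to stdout; `ExampleDestination` and its module are the hypothetical names from the sketch above.

```python
import sys

from example_destination import ExampleDestination  # hypothetical module containing the sketch above

if __name__ == "__main__":
    # e.g. python main.py write --config secrets/config.json --catalog configured_catalog.json
    ExampleDestination().run(sys.argv[1:])
```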
56class Source( 57 DefaultConnectorMixin, 58 BaseSource[Mapping[str, Any], List[AirbyteStateMessage], ConfiguredAirbyteCatalog], 59 ABC, 60): 61 # can be overridden to change an input state. 62 @classmethod 63 def read_state(cls, state_path: str) -> List[AirbyteStateMessage]: 64 """ 65 Retrieves the input state of a sync by reading from the specified JSON file. Incoming state can be deserialized into either 66 a JSON object for legacy state input or as a list of AirbyteStateMessages for the per-stream state format. Regardless of the 67 incoming input type, it will always be transformed and output as a list of AirbyteStateMessage(s). 68 :param state_path: The filepath to where the stream states are located 69 :return: The complete stream state based on the connector's previous sync 70 """ 71 parsed_state_messages = [] 72 if state_path: 73 state_obj = BaseConnector._read_json_file(state_path) 74 if state_obj: 75 for state in state_obj: # type: ignore # `isinstance(state_obj, List)` ensures that this is a list 76 parsed_message = AirbyteStateMessageSerializer.load(state) 77 if ( 78 not parsed_message.stream 79 and not parsed_message.data 80 and not parsed_message.global_ 81 ): 82 raise ValueError( 83 "AirbyteStateMessage should contain either a stream, global, or state field" 84 ) 85 parsed_state_messages.append(parsed_message) 86 return parsed_state_messages 87 88 # can be overridden to change an input catalog 89 @classmethod 90 def read_catalog(cls, catalog_path: str) -> ConfiguredAirbyteCatalog: 91 return ConfiguredAirbyteCatalogSerializer.load(cls._read_json_file(catalog_path)) 92 93 @property 94 def name(self) -> str: 95 """Source name""" 96 return self.__class__.__name__
Helper class that provides a standard way to create an ABC using inheritance.
62 @classmethod 63 def read_state(cls, state_path: str) -> List[AirbyteStateMessage]: 64 """ 65 Retrieves the input state of a sync by reading from the specified JSON file. Incoming state can be deserialized into either 66 a JSON object for legacy state input or as a list of AirbyteStateMessages for the per-stream state format. Regardless of the 67 incoming input type, it will always be transformed and output as a list of AirbyteStateMessage(s). 68 :param state_path: The filepath to where the stream states are located 69 :return: The complete stream state based on the connector's previous sync 70 """ 71 parsed_state_messages = [] 72 if state_path: 73 state_obj = BaseConnector._read_json_file(state_path) 74 if state_obj: 75 for state in state_obj: # type: ignore # `isinstance(state_obj, List)` ensures that this is a list 76 parsed_message = AirbyteStateMessageSerializer.load(state) 77 if ( 78 not parsed_message.stream 79 and not parsed_message.data 80 and not parsed_message.global_ 81 ): 82 raise ValueError( 83 "AirbyteStateMessage should contain either a stream, global, or state field" 84 ) 85 parsed_state_messages.append(parsed_message) 86 return parsed_state_messages
Retrieves the input state of a sync by reading from the specified JSON file. Incoming state can be deserialized into either a JSON object for legacy state input or as a list of AirbyteStateMessages for the per-stream state format. Regardless of the incoming input type, it will always be transformed and output as a list of AirbyteStateMessage(s).
Parameters
- state_path: The filepath to where the stream states are located
Returns
The complete stream state based on the connector's previous sync
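For illustration, a per-stream state file is simply a JSON array of state messages; the sketch below writes one to a temporary file and parses it back with `read_state` (the stream name and cursor value are made up).

```python
import json
import tempfile

from airbyte_cdk import Source

# A per-stream state file is a JSON array of state messages.
state_blob = [
    {
        "type": "STREAM",
        "stream": {
            "stream_descriptor": {"name": "customers"},
            "stream_state": {"updated_at": "2024-01-01T00:00:00Z"},
        },
    }
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as state_file:
    json.dump(state_blob, state_file)
    state_path = state_file.name

# Always returns a list of AirbyteStateMessage objects, regardless of whether the
# input used the legacy or the per-stream format.
state_messages = Source.read_state(state_path)
```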
37@dataclass 38class AddFields(RecordTransformation): 39 """ 40 Transformation which adds field to an output record. The path of the added field can be nested. Adding nested fields will create all 41 necessary parent objects (like mkdir -p). Adding fields to an array will extend the array to that index (filling intermediate 42 indices with null values). So if you add a field at index 5 to the array ["value"], it will become ["value", null, null, null, null, 43 "new_value"]. 44 45 46 This transformation has access to the following contextual values: 47 record: the record about to be output by the connector 48 config: the input configuration provided to a connector 49 stream_state: the current state of the stream 50 stream_slice: the current stream slice being read 51 52 53 54 Examples of instantiating this transformation via YAML: 55 - type: AddFields 56 fields: 57 # hardcoded constant 58 - path: ["path"] 59 value: "static_value" 60 61 # nested path 62 - path: ["path", "to", "field"] 63 value: "static" 64 65 # from config 66 - path: ["shop_id"] 67 value: "{{ config.shop_id }}" 68 69 # from stream_interval 70 - path: ["date"] 71 value: "{{ stream_interval.start_date }}" 72 73 # from record 74 - path: ["unnested_value"] 75 value: {{ record.nested.field }} 76 77 # from stream_slice 78 - path: ["start_date"] 79 value: {{ stream_slice.start_date }} 80 81 # by supplying any valid Jinja template directive or expression https://jinja.palletsprojects.com/en/3.1.x/templates/# 82 - path: ["two_times_two"] 83 value: {{ 2 * 2 }} 84 85 Attributes: 86 fields (List[AddedFieldDefinition]): A list of transformations (path and corresponding value) that will be added to the record 87 """ 88 89 fields: List[AddedFieldDefinition] 90 parameters: InitVar[Mapping[str, Any]] 91 condition: str = "" 92 _parsed_fields: List[ParsedAddFieldDefinition] = field( 93 init=False, repr=False, default_factory=list 94 ) 95 96 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 97 self._filter_interpolator = InterpolatedBoolean( 98 condition=self.condition, parameters=parameters 99 ) 100 101 for add_field in self.fields: 102 if len(add_field.path) < 1: 103 raise ValueError( 104 f"Expected a non-zero-length path for the AddFields transformation {add_field}" 105 ) 106 107 if not isinstance(add_field.value, InterpolatedString): 108 if not isinstance(add_field.value, str): 109 raise f"Expected a string value for the AddFields transformation: {add_field}" 110 else: 111 self._parsed_fields.append( 112 ParsedAddFieldDefinition( 113 add_field.path, 114 InterpolatedString.create(add_field.value, parameters=parameters), 115 value_type=add_field.value_type, 116 parameters=parameters, 117 ) 118 ) 119 else: 120 self._parsed_fields.append( 121 ParsedAddFieldDefinition( 122 add_field.path, 123 add_field.value, 124 value_type=add_field.value_type, 125 parameters={}, 126 ) 127 ) 128 129 def transform( 130 self, 131 record: Dict[str, Any], 132 config: Optional[Config] = None, 133 stream_state: Optional[StreamState] = None, 134 stream_slice: Optional[StreamSlice] = None, 135 ) -> None: 136 if config is None: 137 config = {} 138 kwargs = {"record": record, "stream_slice": stream_slice} 139 for parsed_field in self._parsed_fields: 140 valid_types = (parsed_field.value_type,) if parsed_field.value_type else None 141 value = parsed_field.value.eval(config, valid_types=valid_types, **kwargs) 142 is_empty_condition = not self.condition 143 if is_empty_condition or self._filter_interpolator.eval(config, value=value, **kwargs): 144 
dpath.new(record, parsed_field.path, value) 145 146 def __eq__(self, other: Any) -> bool: 147 return bool(self.__dict__ == other.__dict__)
Transformation which adds fields to an output record. The path of the added field can be nested. Adding nested fields will create all necessary parent objects (like mkdir -p). Adding fields to an array will extend the array to that index (filling intermediate indices with null values). So if you add a field at index 5 to the array ["value"], it will become ["value", null, null, null, null, "new_value"].
This transformation has access to the following contextual values:
- record: the record about to be output by the connector
- config: the input configuration provided to a connector
- stream_state: the current state of the stream
- stream_slice: the current stream slice being read
Examples of instantiating this transformation via YAML:
- type: AddFields
fields:
# hardcoded constant
- path: ["path"] value: "static_value"
# nested path
- path: ["path", "to", "field"]
value: "static"
# from config
- path: ["shop_id"]
value: "{{ config.shop_id }}"
# from stream_interval
- path: ["date"]
value: "{{ stream_interval.start_date }}"
# from record
- path: ["unnested_value"]
value: {{ record.nested.field }}
# from stream_slice
- path: ["start_date"]
value: {{ stream_slice.start_date }}
# by supplying any valid Jinja template directive or expression https://jinja.palletsprojects.com/en/3.1.x/templates/#
- path: ["two_times_two"]
value: {{ 2 * 2 }}
Attributes:
- fields (List[AddedFieldDefinition]): A list of transformations (path and corresponding value) that will be added to the record
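To complement the YAML examples above, here is a hedged Python sketch of instantiating the transformation directly; the import path is assumed from the class names shown, and the record and config contents are made up.

```python
from airbyte_cdk.sources.declarative.transformations.add_fields import (  # path assumed
    AddedFieldDefinition,
    AddFields,
)

config = {"shop_id": "my-shop"}
transformation = AddFields(
    fields=[
        # A hardcoded constant and a value interpolated from the connector config.
        AddedFieldDefinition(path=["source"], value="example-api", value_type=None, parameters={}),
        AddedFieldDefinition(path=["shop_id"], value="{{ config.shop_id }}", value_type=None, parameters={}),
    ],
    parameters={},
)

record = {"id": 1}
transformation.transform(record, config=config)  # mutates the record in place
# record is now {"id": 1, "source": "example-api", "shop_id": "my-shop"}
```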
129 def transform( 130 self, 131 record: Dict[str, Any], 132 config: Optional[Config] = None, 133 stream_state: Optional[StreamState] = None, 134 stream_slice: Optional[StreamSlice] = None, 135 ) -> None: 136 if config is None: 137 config = {} 138 kwargs = {"record": record, "stream_slice": stream_slice} 139 for parsed_field in self._parsed_fields: 140 valid_types = (parsed_field.value_type,) if parsed_field.value_type else None 141 value = parsed_field.value.eval(config, valid_types=valid_types, **kwargs) 142 is_empty_condition = not self.condition 143 if is_empty_condition or self._filter_interpolator.eval(config, value=value, **kwargs): 144 dpath.new(record, parsed_field.path, value)
Transform a record by adding, deleting, or mutating fields directly on the record reference passed as an argument.
Parameters
- record: The input record to be transformed
- config: The user-provided configuration as specified by the source's spec
- stream_state: The stream state
- stream_slice: The stream slice
Returns
None. The record passed in is mutated in place rather than returned.
17@dataclass(frozen=True) 18class AddedFieldDefinition: 19 """Defines the field to add on a record""" 20 21 path: FieldPointer 22 value: Union[InterpolatedString, str] 23 value_type: Optional[Type[Any]] 24 parameters: InitVar[Mapping[str, Any]]
Defines the field to add on a record
24@dataclass 25class ApiKeyAuthenticator(DeclarativeAuthenticator): 26 """ 27 ApiKeyAuth sets a request header on the HTTP requests sent. 28 29 The header is of the form: 30 `"<header>": "<token>"` 31 32 For example, 33 `ApiKeyAuthenticator("Authorization", "Bearer hello")` 34 will result in the following header set on the HTTP request 35 `"Authorization": "Bearer hello"` 36 37 Attributes: 38 request_option (RequestOption): request option how to inject the token into the request 39 token_provider (TokenProvider): Provider of the token 40 config (Config): The user-provided configuration as specified by the source's spec 41 parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation 42 """ 43 44 request_option: RequestOption 45 token_provider: TokenProvider 46 config: Config 47 parameters: InitVar[Mapping[str, Any]] 48 49 @property 50 def auth_header(self) -> str: 51 options = self._get_request_options(RequestOptionType.header) 52 return next(iter(options.keys()), "") 53 54 @property 55 def token(self) -> str: 56 return self.token_provider.get_token() 57 58 def _get_request_options(self, option_type: RequestOptionType) -> Mapping[str, Any]: 59 options: MutableMapping[str, Any] = {} 60 if self.request_option.inject_into == option_type: 61 self.request_option.inject_into_request(options, self.token, self.config) 62 return options 63 64 def get_request_params(self) -> Mapping[str, Any]: 65 return self._get_request_options(RequestOptionType.request_parameter) 66 67 def get_request_body_data(self) -> Union[Mapping[str, Any], str]: 68 return self._get_request_options(RequestOptionType.body_data) 69 70 def get_request_body_json(self) -> Mapping[str, Any]: 71 return self._get_request_options(RequestOptionType.body_json)
ApiKeyAuth sets a request header on the HTTP requests sent.
The header is of the form:
"<header>": "<token>"
For example,
ApiKeyAuthenticator("Authorization", "Bearer hello")
will result in the following header set on the HTTP request
"Authorization": "Bearer hello"
Attributes:
- request_option (RequestOption): request option how to inject the token into the request
- token_provider (TokenProvider): Provider of the token
- config (Config): The user-provided configuration as specified by the source's spec
- parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation
49 @property 50 def auth_header(self) -> str: 51 options = self._get_request_options(RequestOptionType.header) 52 return next(iter(options.keys()), "")
HTTP header to set on the requests
64 def get_request_params(self) -> Mapping[str, Any]: 65 return self._get_request_options(RequestOptionType.request_parameter)
HTTP request parameter to add to the requests
67 def get_request_body_data(self) -> Union[Mapping[str, Any], str]: 68 return self._get_request_options(RequestOptionType.body_data)
Form-encoded body data to set on the requests
70 def get_request_body_json(self) -> Mapping[str, Any]: 71 return self._get_request_options(RequestOptionType.body_json)
JSON-encoded body data to set on the requests
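A small usage sketch, assuming the `RequestOption` and `TokenProvider` import paths below; `StaticTokenProvider` is a hypothetical helper defined inline purely for illustration.

```python
from dataclasses import dataclass

from airbyte_cdk import ApiKeyAuthenticator
from airbyte_cdk.sources.declarative.auth.token_provider import TokenProvider  # path assumed
from airbyte_cdk.sources.declarative.requesters.request_option import (  # path assumed
    RequestOption,
    RequestOptionType,
)


@dataclass
class StaticTokenProvider(TokenProvider):
    """Hypothetical provider that always returns the same token."""

    token: str

    def get_token(self) -> str:
        return self.token


authenticator = ApiKeyAuthenticator(
    request_option=RequestOption(
        field_name="X-API-Key", inject_into=RequestOptionType.header, parameters={}
    ),
    token_provider=StaticTokenProvider("secret-token"),
    config={},
    parameters={},
)

# The header name comes from the request option, the value from the token provider.
assert authenticator.auth_header == "X-API-Key"
assert authenticator.token == "secret-token"
```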
12class BackoffStrategy(ABC): 13 @abstractmethod 14 def backoff_time( 15 self, 16 response_or_exception: Optional[Union[requests.Response, requests.RequestException]], 17 attempt_count: int, 18 ) -> Optional[float]: 19 """ 20 Override this method to dynamically determine backoff time e.g: by reading the X-Retry-After header. 21 22 This method is called only if should_backoff() returns True for the input request. 23 24 :param response_or_exception: The response or exception that caused the backoff. 25 :param attempt_count: The number of attempts already performed for this request. 26 :return how long to backoff in seconds. The return value may be a floating point number for subsecond precision. Returning None defers backoff 27 to the default backoff behavior (e.g using an exponential algorithm). 28 """ 29 pass
Helper class that provides a standard way to create an ABC using inheritance.
13 @abstractmethod 14 def backoff_time( 15 self, 16 response_or_exception: Optional[Union[requests.Response, requests.RequestException]], 17 attempt_count: int, 18 ) -> Optional[float]: 19 """ 20 Override this method to dynamically determine backoff time e.g: by reading the X-Retry-After header. 21 22 This method is called only if should_backoff() returns True for the input request. 23 24 :param response_or_exception: The response or exception that caused the backoff. 25 :param attempt_count: The number of attempts already performed for this request. 26 :return how long to backoff in seconds. The return value may be a floating point number for subsecond precision. Returning None defers backoff 27 to the default backoff behavior (e.g using an exponential algorithm). 28 """ 29 pass
Override this method to dynamically determine backoff time e.g: by reading the X-Retry-After header.
This method is called only if should_backoff() returns True for the input request.
Parameters
- response_or_exception: The response or exception that caused the backoff.
- attempt_count: The number of attempts already performed for this request.
Returns
How long to backoff in seconds. The return value may be a floating point number for subsecond precision. Returning None defers backoff to the default backoff behavior (e.g. using an exponential algorithm).
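A sketch of a concrete strategy that honors a numeric Retry-After header, as the docstring suggests; the import path for BackoffStrategy is assumed.

```python
from typing import Optional, Union

import requests

from airbyte_cdk.sources.streams.http.error_handlers import BackoffStrategy  # path assumed


class RetryAfterBackoffStrategy(BackoffStrategy):
    """Illustrative strategy: honor a numeric Retry-After header when present, otherwise defer."""

    def backoff_time(
        self,
        response_or_exception: Optional[Union[requests.Response, requests.RequestException]],
        attempt_count: int,
    ) -> Optional[float]:
        if isinstance(response_or_exception, requests.Response):
            retry_after = response_or_exception.headers.get("Retry-After")
            if retry_after is not None:
                # Simplification: assumes Retry-After carries seconds, not an HTTP date.
                return float(retry_after)
        # Returning None falls back to the default (e.g. exponential) backoff behavior.
        return None
```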
101@dataclass 102class BasicHttpAuthenticator(DeclarativeAuthenticator): 103 """ 104 Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using base64 105 https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme 106 107 The header is of the form 108 `"Authorization": "Basic <encoded_credentials>"` 109 110 Attributes: 111 username (Union[InterpolatedString, str]): The username 112 config (Config): The user-provided configuration as specified by the source's spec 113 password (Union[InterpolatedString, str]): The password 114 parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation 115 """ 116 117 username: Union[InterpolatedString, str] 118 config: Config 119 parameters: InitVar[Mapping[str, Any]] 120 password: Union[InterpolatedString, str] = "" 121 122 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 123 self._username = InterpolatedString.create(self.username, parameters=parameters) 124 self._password = InterpolatedString.create(self.password, parameters=parameters) 125 126 @property 127 def auth_header(self) -> str: 128 return "Authorization" 129 130 @property 131 def token(self) -> str: 132 auth_string = ( 133 f"{self._username.eval(self.config)}:{self._password.eval(self.config)}".encode("utf8") 134 ) 135 b64_encoded = base64.b64encode(auth_string).decode("utf8") 136 return f"Basic {b64_encoded}"
Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using base64 https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
The header is of the form
"Authorization": "Basic <encoded_credentials>"
Attributes:
- username (Union[InterpolatedString, str]): The username
- config (Config): The user-provided configuration as specified by the source's spec
- password (Union[InterpolatedString, str]): The password
- parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation
130 @property 131 def token(self) -> str: 132 auth_string = ( 133 f"{self._username.eval(self.config)}:{self._password.eval(self.config)}".encode("utf8") 134 ) 135 b64_encoded = base64.b64encode(auth_string).decode("utf8") 136 return f"Basic {b64_encoded}"
The header value to set on outgoing HTTP requests
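A small sketch showing how the username/password pair from the config turns into the Authorization header value; the credentials are made up.

```python
import base64

from airbyte_cdk import BasicHttpAuthenticator

config = {"username": "api_user", "password": "hunter2"}
authenticator = BasicHttpAuthenticator(
    username="{{ config.username }}",
    password="{{ config.password }}",
    config=config,
    parameters={},
)

# The token is the base64-encoded "username:password" pair prefixed with "Basic".
expected = "Basic " + base64.b64encode(b"api_user:hunter2").decode("utf8")
assert authenticator.auth_header == "Authorization"
assert authenticator.token == expected
```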
74@dataclass 75class BearerAuthenticator(DeclarativeAuthenticator): 76 """ 77 Authenticator that sets the Authorization header on the HTTP requests sent. 78 79 The header is of the form: 80 `"Authorization": "Bearer <token>"` 81 82 Attributes: 83 token_provider (TokenProvider): Provider of the token 84 config (Config): The user-provided configuration as specified by the source's spec 85 parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation 86 """ 87 88 token_provider: TokenProvider 89 config: Config 90 parameters: InitVar[Mapping[str, Any]] 91 92 @property 93 def auth_header(self) -> str: 94 return "Authorization" 95 96 @property 97 def token(self) -> str: 98 return f"Bearer {self.token_provider.get_token()}"
Authenticator that sets the Authorization header on the HTTP requests sent.
The header is of the form:
"Authorization": "Bearer <token>"
Attributes:
- token_provider (TokenProvider): Provider of the token
- config (Config): The user-provided configuration as specified by the source's spec
- parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation
40@dataclass 41class CartesianProductStreamSlicer(PartitionRouter): 42 """ 43 Stream slicers that iterates over the cartesian product of input stream slicers 44 Given 2 stream slicers with the following slices: 45 A: [{"i": 0}, {"i": 1}, {"i": 2}] 46 B: [{"s": "hello"}, {"s": "world"}] 47 the resulting stream slices are 48 [ 49 {"i": 0, "s": "hello"}, 50 {"i": 0, "s": "world"}, 51 {"i": 1, "s": "hello"}, 52 {"i": 1, "s": "world"}, 53 {"i": 2, "s": "hello"}, 54 {"i": 2, "s": "world"}, 55 ] 56 57 Attributes: 58 stream_slicers (List[PartitionRouter]): Underlying stream slicers. The RequestOptions (e.g: Request headers, parameters, etc..) returned by this slicer are the combination of the RequestOptions of its input slicers. If there are conflicts e.g: two slicers define the same header or request param, the conflict is resolved by taking the value from the first slicer, where ordering is determined by the order in which slicers were input to this composite slicer. 59 """ 60 61 stream_slicers: List[PartitionRouter] 62 parameters: InitVar[Mapping[str, Any]] 63 64 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 65 check_for_substream_in_slicers(self.stream_slicers, self.logger.warning) 66 67 def get_request_params( 68 self, 69 *, 70 stream_state: Optional[StreamState] = None, 71 stream_slice: Optional[StreamSlice] = None, 72 next_page_token: Optional[Mapping[str, Any]] = None, 73 ) -> Mapping[str, Any]: 74 return dict( 75 ChainMap( 76 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 77 s.get_request_params( 78 stream_state=stream_state, 79 stream_slice=stream_slice, 80 next_page_token=next_page_token, 81 ) 82 for s in self.stream_slicers 83 ] 84 ) 85 ) 86 87 def get_request_headers( 88 self, 89 *, 90 stream_state: Optional[StreamState] = None, 91 stream_slice: Optional[StreamSlice] = None, 92 next_page_token: Optional[Mapping[str, Any]] = None, 93 ) -> Mapping[str, Any]: 94 return dict( 95 ChainMap( 96 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 97 s.get_request_headers( 98 stream_state=stream_state, 99 stream_slice=stream_slice, 100 next_page_token=next_page_token, 101 ) 102 for s in self.stream_slicers 103 ] 104 ) 105 ) 106 107 def get_request_body_data( 108 self, 109 *, 110 stream_state: Optional[StreamState] = None, 111 stream_slice: Optional[StreamSlice] = None, 112 next_page_token: Optional[Mapping[str, Any]] = None, 113 ) -> Mapping[str, Any]: 114 return dict( 115 ChainMap( 116 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 117 s.get_request_body_data( 118 stream_state=stream_state, 119 stream_slice=stream_slice, 120 next_page_token=next_page_token, 121 ) 122 for s in self.stream_slicers 123 ] 124 ) 125 ) 126 127 def get_request_body_json( 128 self, 129 *, 130 stream_state: Optional[StreamState] = None, 131 stream_slice: Optional[StreamSlice] = None, 132 next_page_token: Optional[Mapping[str, Any]] = None, 133 ) -> Mapping[str, Any]: 134 return dict( 135 ChainMap( 136 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 137 s.get_request_body_json( 138 stream_state=stream_state, 139 stream_slice=stream_slice, 140 next_page_token=next_page_token, 141 ) 142 for s in self.stream_slicers 143 ] 144 ) 145 ) 146 147 def stream_slices(self) -> Iterable[StreamSlice]: 148 sub_slices = (s.stream_slices() for s in self.stream_slicers) 149 product = itertools.product(*sub_slices) 150 for stream_slice_tuple in product: 151 partition = dict(ChainMap(*[s.partition 
for s in stream_slice_tuple])) # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 152 cursor_slices = [s.cursor_slice for s in stream_slice_tuple if s.cursor_slice] 153 if len(cursor_slices) > 1: 154 raise ValueError( 155 f"There should only be a single cursor slice. Found {cursor_slices}" 156 ) 157 if cursor_slices: 158 cursor_slice = cursor_slices[0] 159 else: 160 cursor_slice = {} 161 yield StreamSlice(partition=partition, cursor_slice=cursor_slice) 162 163 def set_initial_state(self, stream_state: StreamState) -> None: 164 """ 165 Parent stream states are not supported for cartesian product stream slicer 166 """ 167 pass 168 169 def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: 170 """ 171 Parent stream states are not supported for cartesian product stream slicer 172 """ 173 pass 174 175 @property 176 def logger(self) -> logging.Logger: 177 return logging.getLogger("airbyte.CartesianProductStreamSlicer")
Stream slicer that iterates over the cartesian product of the input stream slicers. Given two stream slicers with the following slices:
A: [{"i": 0}, {"i": 1}, {"i": 2}]
B: [{"s": "hello"}, {"s": "world"}]
the resulting stream slices are
[{"i": 0, "s": "hello"}, {"i": 0, "s": "world"}, {"i": 1, "s": "hello"}, {"i": 1, "s": "world"}, {"i": 2, "s": "hello"}, {"i": 2, "s": "world"}]
Attributes:
- stream_slicers (List[PartitionRouter]): Underlying stream slicers. The RequestOptions (e.g: Request headers, parameters, etc..) returned by this slicer are the combination of the RequestOptions of its input slicers. If there are conflicts e.g: two slicers define the same header or request param, the conflict is resolved by taking the value from the first slicer, where ordering is determined by the order in which slicers were input to this composite slicer.
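The combination logic itself can be illustrated with plain Python (no CDK classes involved); the real slicers yield StreamSlice objects rather than bare dicts.

```python
import itertools
from collections import ChainMap

# Plain-Python illustration of the combination described above.
slicer_a = [{"i": 0}, {"i": 1}, {"i": 2}]
slicer_b = [{"s": "hello"}, {"s": "world"}]

combined = [dict(ChainMap(*slices)) for slices in itertools.product(slicer_a, slicer_b)]
# -> six slices such as {"i": 0, "s": "hello"}; on key conflicts the first slicer's value wins (ChainMap).
```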
67 def get_request_params( 68 self, 69 *, 70 stream_state: Optional[StreamState] = None, 71 stream_slice: Optional[StreamSlice] = None, 72 next_page_token: Optional[Mapping[str, Any]] = None, 73 ) -> Mapping[str, Any]: 74 return dict( 75 ChainMap( 76 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 77 s.get_request_params( 78 stream_state=stream_state, 79 stream_slice=stream_slice, 80 next_page_token=next_page_token, 81 ) 82 for s in self.stream_slicers 83 ] 84 ) 85 )
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g: you might want to define query parameters for paging if next_page_token is not None.
87 def get_request_headers( 88 self, 89 *, 90 stream_state: Optional[StreamState] = None, 91 stream_slice: Optional[StreamSlice] = None, 92 next_page_token: Optional[Mapping[str, Any]] = None, 93 ) -> Mapping[str, Any]: 94 return dict( 95 ChainMap( 96 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 97 s.get_request_headers( 98 stream_state=stream_state, 99 stream_slice=stream_slice, 100 next_page_token=next_page_token, 101 ) 102 for s in self.stream_slicers 103 ] 104 ) 105 )
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
107 def get_request_body_data( 108 self, 109 *, 110 stream_state: Optional[StreamState] = None, 111 stream_slice: Optional[StreamSlice] = None, 112 next_page_token: Optional[Mapping[str, Any]] = None, 113 ) -> Mapping[str, Any]: 114 return dict( 115 ChainMap( 116 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 117 s.get_request_body_data( 118 stream_state=stream_state, 119 stream_slice=stream_slice, 120 next_page_token=next_page_token, 121 ) 122 for s in self.stream_slicers 123 ] 124 ) 125 )
Specifies how to populate the body of the request with a non-JSON payload.
If it returns plain text, the text is sent as is. If it returns a dict, the dict is converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Only one of the 'request_body_data' and 'request_body_json' functions can be overridden at a time.
127 def get_request_body_json( 128 self, 129 *, 130 stream_state: Optional[StreamState] = None, 131 stream_slice: Optional[StreamSlice] = None, 132 next_page_token: Optional[Mapping[str, Any]] = None, 133 ) -> Mapping[str, Any]: 134 return dict( 135 ChainMap( 136 *[ # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 137 s.get_request_body_json( 138 stream_state=stream_state, 139 stream_slice=stream_slice, 140 next_page_token=next_page_token, 141 ) 142 for s in self.stream_slicers 143 ] 144 ) 145 )
Specifies how to populate the body of the request with a JSON payload.
Only one of the 'request_body_data' and 'request_body_json' functions can be overridden at a time.
147 def stream_slices(self) -> Iterable[StreamSlice]: 148 sub_slices = (s.stream_slices() for s in self.stream_slicers) 149 product = itertools.product(*sub_slices) 150 for stream_slice_tuple in product: 151 partition = dict(ChainMap(*[s.partition for s in stream_slice_tuple])) # type: ignore # ChainMap expects a MutableMapping[Never, Never] for reasons 152 cursor_slices = [s.cursor_slice for s in stream_slice_tuple if s.cursor_slice] 153 if len(cursor_slices) > 1: 154 raise ValueError( 155 f"There should only be a single cursor slice. Found {cursor_slices}" 156 ) 157 if cursor_slices: 158 cursor_slice = cursor_slices[0] 159 else: 160 cursor_slice = {} 161 yield StreamSlice(partition=partition, cursor_slice=cursor_slice)
Defines stream slices
Returns
An iterable of stream slices
163 def set_initial_state(self, stream_state: StreamState) -> None: 164 """ 165 Parent stream states are not supported for cartesian product stream slicer 166 """ 167 pass
Parent stream states are not supported for cartesian product stream slicer
169 def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: 170 """ 171 Parent stream states are not supported for cartesian product stream slicer 172 """ 173 pass
Parent stream states are not supported for cartesian product stream slicer
24@dataclass 25class CursorPaginationStrategy(PaginationStrategy): 26 """ 27 Pagination strategy that evaluates an interpolated string to define the next page token 28 29 Attributes: 30 page_size (Optional[int]): the number of records to request 31 cursor_value (Union[InterpolatedString, str]): template string evaluating to the cursor value 32 config (Config): connection config 33 stop_condition (Optional[InterpolatedBoolean]): template string evaluating when to stop paginating 34 decoder (Decoder): decoder to decode the response 35 """ 36 37 cursor_value: Union[InterpolatedString, str] 38 config: Config 39 parameters: InitVar[Mapping[str, Any]] 40 page_size: Optional[int] = None 41 stop_condition: Optional[Union[InterpolatedBoolean, str]] = None 42 decoder: Decoder = field( 43 default_factory=lambda: PaginationDecoderDecorator(decoder=JsonDecoder(parameters={})) 44 ) 45 46 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 47 if isinstance(self.cursor_value, str): 48 self._cursor_value = InterpolatedString.create(self.cursor_value, parameters=parameters) 49 else: 50 self._cursor_value = self.cursor_value 51 if isinstance(self.stop_condition, str): 52 self._stop_condition: Optional[InterpolatedBoolean] = InterpolatedBoolean( 53 condition=self.stop_condition, parameters=parameters 54 ) 55 else: 56 self._stop_condition = self.stop_condition 57 58 @property 59 def initial_token(self) -> Optional[Any]: 60 """ 61 CursorPaginationStrategy does not have an initial value because the next cursor is typically included 62 in the response of the first request. For Resumable Full Refresh streams that checkpoint the page 63 cursor, the next cursor should be read from the state or stream slice object. 64 """ 65 return None 66 67 def next_page_token( 68 self, 69 response: requests.Response, 70 last_page_size: int, 71 last_record: Optional[Record], 72 last_page_token_value: Optional[Any] = None, 73 ) -> Optional[Any]: 74 decoded_response = next(self.decoder.decode(response)) 75 # The default way that link is presented in requests.Response is a string of various links (last, next, etc). This 76 # is not indexable or useful for parsing the cursor, so we replace it with the link dictionary from response.links 77 headers: Dict[str, Any] = dict(response.headers) 78 headers["link"] = response.links 79 if self._stop_condition: 80 should_stop = self._stop_condition.eval( 81 self.config, 82 response=decoded_response, 83 headers=headers, 84 last_record=last_record, 85 last_page_size=last_page_size, 86 ) 87 if should_stop: 88 return None 89 token = self._cursor_value.eval( 90 config=self.config, 91 response=decoded_response, 92 headers=headers, 93 last_record=last_record, 94 last_page_size=last_page_size, 95 ) 96 return token if token else None 97 98 def get_page_size(self) -> Optional[int]: 99 return self.page_size
Pagination strategy that evaluates an interpolated string to define the next page token
Attributes:
- page_size (Optional[int]): the number of records to request
- cursor_value (Union[InterpolatedString, str]): template string evaluating to the cursor value
- config (Config): connection config
- stop_condition (Optional[InterpolatedBoolean]): template string evaluating when to stop paginating
- decoder (Decoder): decoder to decode the response
58 @property 59 def initial_token(self) -> Optional[Any]: 60 """ 61 CursorPaginationStrategy does not have an initial value because the next cursor is typically included 62 in the response of the first request. For Resumable Full Refresh streams that checkpoint the page 63 cursor, the next cursor should be read from the state or stream slice object. 64 """ 65 return None
CursorPaginationStrategy does not have an initial value because the next cursor is typically included in the response of the first request. For Resumable Full Refresh streams that checkpoint the page cursor, the next cursor should be read from the state or stream slice object.
67 def next_page_token( 68 self, 69 response: requests.Response, 70 last_page_size: int, 71 last_record: Optional[Record], 72 last_page_token_value: Optional[Any] = None, 73 ) -> Optional[Any]: 74 decoded_response = next(self.decoder.decode(response)) 75 # The default way that link is presented in requests.Response is a string of various links (last, next, etc). This 76 # is not indexable or useful for parsing the cursor, so we replace it with the link dictionary from response.links 77 headers: Dict[str, Any] = dict(response.headers) 78 headers["link"] = response.links 79 if self._stop_condition: 80 should_stop = self._stop_condition.eval( 81 self.config, 82 response=decoded_response, 83 headers=headers, 84 last_record=last_record, 85 last_page_size=last_page_size, 86 ) 87 if should_stop: 88 return None 89 token = self._cursor_value.eval( 90 config=self.config, 91 response=decoded_response, 92 headers=headers, 93 last_record=last_record, 94 last_page_size=last_page_size, 95 ) 96 return token if token else None
Parameters
- response: response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
The next page token, or None if there are no more pages to fetch.
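A hedged end-to-end sketch: a fake JSON response carries the cursor, the interpolated cursor_value extracts it, and the stop_condition halts pagination once the cursor disappears. The import path is assumed and the payload is made up.

```python
import json

import requests

from airbyte_cdk.sources.declarative.requesters.paginators.strategies import (  # path assumed
    CursorPaginationStrategy,
)

# Build a fake HTTP response carrying a cursor in its JSON body.
response = requests.Response()
response.status_code = 200
response._content = json.dumps({"data": [{"id": 1}], "next_cursor": "abc123"}).encode()
response.headers["Content-Type"] = "application/json"

strategy = CursorPaginationStrategy(
    cursor_value="{{ response.next_cursor }}",
    stop_condition="{{ not response.next_cursor }}",
    config={},
    parameters={},
)

token = strategy.next_page_token(response, last_page_size=1, last_record=None)
# token == "abc123"; once a response no longer contains next_cursor, the stop condition
# evaluates to True and next_page_token returns None.
```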
28@dataclass 29class DatetimeBasedCursor(DeclarativeCursor): 30 """ 31 Slices the stream over a datetime range and create a state with format {<cursor_field>: <datetime> } 32 33 Given a start time, end time, a step function, and an optional lookback window, 34 the stream slicer will partition the date range from start time - lookback window to end time. 35 36 The step function is defined as a string of the form ISO8601 duration 37 38 The timestamp format accepts the same format codes as datetime.strfptime, which are 39 all the format codes required by the 1989 C standard. 40 Full list of accepted format codes: https://man7.org/linux/man-pages/man3/strftime.3.html 41 42 Attributes: 43 start_datetime (Union[MinMaxDatetime, str]): the datetime that determines the earliest record that should be synced 44 end_datetime (Optional[Union[MinMaxDatetime, str]]): the datetime that determines the last record that should be synced 45 cursor_field (Union[InterpolatedString, str]): record's cursor field 46 datetime_format (str): format of the datetime 47 step (Optional[str]): size of the timewindow (ISO8601 duration) 48 cursor_granularity (Optional[str]): smallest increment the datetime_format has (ISO 8601 duration) that will be used to ensure that the start of a slice does not overlap with the end of the previous one 49 config (Config): connection config 50 start_time_option (Optional[RequestOption]): request option for start time 51 end_time_option (Optional[RequestOption]): request option for end time 52 partition_field_start (Optional[str]): partition start time field 53 partition_field_end (Optional[str]): stream slice end time field 54 lookback_window (Optional[InterpolatedString]): how many days before start_datetime to read data for (ISO8601 duration) 55 """ 56 57 start_datetime: Union[MinMaxDatetime, str] 58 cursor_field: Union[InterpolatedString, str] 59 datetime_format: str 60 config: Config 61 parameters: InitVar[Mapping[str, Any]] 62 _highest_observed_cursor_field_value: Optional[str] = field( 63 repr=False, default=None 64 ) # tracks the latest observed datetime, which may not be safe to emit in the case of out-of-order records 65 _cursor: Optional[str] = field( 66 repr=False, default=None 67 ) # tracks the latest observed datetime that is appropriate to emit as stream state 68 end_datetime: Optional[Union[MinMaxDatetime, str]] = None 69 step: Optional[Union[InterpolatedString, str]] = None 70 cursor_granularity: Optional[str] = None 71 start_time_option: Optional[RequestOption] = None 72 end_time_option: Optional[RequestOption] = None 73 partition_field_start: Optional[str] = None 74 partition_field_end: Optional[str] = None 75 lookback_window: Optional[Union[InterpolatedString, str]] = None 76 message_repository: Optional[MessageRepository] = None 77 is_compare_strictly: Optional[bool] = False 78 cursor_datetime_formats: List[str] = field(default_factory=lambda: []) 79 80 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 81 if (self.step and not self.cursor_granularity) or ( 82 not self.step and self.cursor_granularity 83 ): 84 raise ValueError( 85 f"If step is defined, cursor_granularity should be as well and vice-versa. 
" 86 f"Right now, step is `{self.step}` and cursor_granularity is `{self.cursor_granularity}`" 87 ) 88 self._start_datetime = MinMaxDatetime.create(self.start_datetime, parameters) 89 self._end_datetime = ( 90 None if not self.end_datetime else MinMaxDatetime.create(self.end_datetime, parameters) 91 ) 92 93 self._timezone = datetime.timezone.utc 94 self._interpolation = JinjaInterpolation() 95 96 self._step = ( 97 self._parse_timedelta( 98 InterpolatedString.create(self.step, parameters=parameters).eval(self.config) 99 ) 100 if self.step 101 else datetime.timedelta.max 102 ) 103 self._cursor_granularity = self._parse_timedelta(self.cursor_granularity) 104 self.cursor_field = InterpolatedString.create(self.cursor_field, parameters=parameters) 105 self._lookback_window = ( 106 InterpolatedString.create(self.lookback_window, parameters=parameters) 107 if self.lookback_window 108 else None 109 ) 110 self._partition_field_start = InterpolatedString.create( 111 self.partition_field_start or "start_time", parameters=parameters 112 ) 113 self._partition_field_end = InterpolatedString.create( 114 self.partition_field_end or "end_time", parameters=parameters 115 ) 116 self._parser = DatetimeParser() 117 118 # If datetime format is not specified then start/end datetime should inherit it from the stream slicer 119 if not self._start_datetime.datetime_format: 120 self._start_datetime.datetime_format = self.datetime_format 121 if self._end_datetime and not self._end_datetime.datetime_format: 122 self._end_datetime.datetime_format = self.datetime_format 123 124 if not self.cursor_datetime_formats: 125 self.cursor_datetime_formats = [self.datetime_format] 126 127 _validate_component_request_option_paths( 128 self.config, self.start_time_option, self.end_time_option 129 ) 130 131 def get_stream_state(self) -> StreamState: 132 return {self.cursor_field.eval(self.config): self._cursor} if self._cursor else {} # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 133 134 def set_initial_state(self, stream_state: StreamState) -> None: 135 """ 136 Cursors are not initialized with their state. As state is needed in order to function properly, this method should be called 137 before calling anything else 138 139 :param stream_state: The state of the stream as returned by get_stream_state 140 """ 141 self._cursor = ( 142 stream_state.get(self.cursor_field.eval(self.config)) if stream_state else None # type: ignore [union-attr] 143 ) 144 145 def observe(self, stream_slice: StreamSlice, record: Record) -> None: 146 """ 147 Register a record with the cursor; the cursor instance can then use it to manage the state of the in-progress stream read. 148 149 :param stream_slice: The current slice, which may or may not contain the most recently observed record 150 :param record: the most recently-read record, which the cursor can use to update the stream state. Outwardly-visible changes to the 151 stream state may need to be deferred depending on whether the source reliably orders records by the cursor field. 
152 """ 153 record_cursor_value = record.get(self.cursor_field.eval(self.config)) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 154 # if the current record has no cursor value, we cannot meaningfully update the state based on it, so there is nothing more to do 155 if not record_cursor_value: 156 return 157 158 start_field = self._partition_field_start.eval(self.config) 159 end_field = self._partition_field_end.eval(self.config) 160 is_highest_observed_cursor_value = ( 161 not self._highest_observed_cursor_field_value 162 or self.parse_date(record_cursor_value) 163 > self.parse_date(self._highest_observed_cursor_field_value) 164 ) 165 if ( 166 self._is_within_daterange_boundaries( 167 record, 168 stream_slice.get(start_field), # type: ignore [arg-type] 169 stream_slice.get(end_field), # type: ignore [arg-type] 170 ) 171 and is_highest_observed_cursor_value 172 ): 173 self._highest_observed_cursor_field_value = record_cursor_value 174 175 def close_slice(self, stream_slice: StreamSlice, *args: Any) -> None: 176 if stream_slice.partition: 177 raise ValueError( 178 f"Stream slice {stream_slice} should not have a partition. Got {stream_slice.partition}." 179 ) 180 cursor_value_str_by_cursor_value_datetime = dict( 181 map( 182 # we need to ensure the cursor value is preserved as is in the state else the CATs might complain of something like 183 # 2023-01-04T17:30:19.000Z' <= '2023-01-04T17:30:19.000000Z' 184 lambda datetime_str: (self.parse_date(datetime_str), datetime_str), # type: ignore # because of the filter on the next line, this will only be called with a str 185 filter( 186 lambda item: item, [self._cursor, self._highest_observed_cursor_field_value] 187 ), 188 ) 189 ) 190 self._cursor = ( 191 cursor_value_str_by_cursor_value_datetime[ 192 max(cursor_value_str_by_cursor_value_datetime.keys()) 193 ] 194 if cursor_value_str_by_cursor_value_datetime 195 else None 196 ) 197 198 def stream_slices(self) -> Iterable[StreamSlice]: 199 """ 200 Partition the daterange into slices of size = step. 201 202 The start of the window is the minimum datetime between start_datetime - lookback_window and the stream_state's datetime 203 The end of the window is the minimum datetime between the start of the window and end_datetime. 204 205 :return: 206 """ 207 end_datetime = self.select_best_end_datetime() 208 start_datetime = self._calculate_earliest_possible_value(self.select_best_end_datetime()) 209 return self._partition_daterange(start_datetime, end_datetime, self._step) 210 211 def select_state(self, stream_slice: Optional[StreamSlice] = None) -> Optional[StreamState]: 212 # Datetime based cursors operate over slices made up of datetime ranges. Stream state is based on the progress 213 # through each slice and does not belong to a specific slice. We just return stream state as it is. 214 return self.get_stream_state() 215 216 def _calculate_earliest_possible_value( 217 self, end_datetime: datetime.datetime 218 ) -> datetime.datetime: 219 lookback_delta = self._parse_timedelta( 220 self._lookback_window.eval(self.config) if self._lookback_window else "P0D" 221 ) 222 earliest_possible_start_datetime = min( 223 self._start_datetime.get_datetime(self.config), end_datetime 224 ) 225 try: 226 cursor_datetime = ( 227 self._calculate_cursor_datetime_from_state(self.get_stream_state()) - lookback_delta 228 ) 229 except OverflowError: 230 # cursor_datetime defers to the minimum date if it does not exist in the state. 
Trying to subtract 231 # a timedelta from the minimum datetime results in an OverflowError 232 cursor_datetime = self._calculate_cursor_datetime_from_state(self.get_stream_state()) 233 return max(earliest_possible_start_datetime, cursor_datetime) 234 235 def select_best_end_datetime(self) -> datetime.datetime: 236 """ 237 Returns the optimal end datetime. 238 This method compares the current datetime with a pre-configured end datetime 239 and returns the earlier of the two. If no pre-configured end datetime is set, 240 the current datetime is returned. 241 242 :return datetime.datetime: The best end datetime, which is either the current datetime or the pre-configured end datetime, whichever is earlier. 243 """ 244 now = datetime.datetime.now(tz=self._timezone) 245 if not self._end_datetime: 246 return now 247 return min(self._end_datetime.get_datetime(self.config), now) 248 249 def _calculate_cursor_datetime_from_state( 250 self, stream_state: Mapping[str, Any] 251 ) -> datetime.datetime: 252 if self.cursor_field.eval(self.config, stream_state=stream_state) in stream_state: # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 253 return self.parse_date(stream_state[self.cursor_field.eval(self.config)]) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 254 return datetime.datetime.min.replace(tzinfo=datetime.timezone.utc) 255 256 def _format_datetime(self, dt: datetime.datetime) -> str: 257 return self._parser.format(dt, self.datetime_format) 258 259 def _partition_daterange( 260 self, 261 start: datetime.datetime, 262 end: datetime.datetime, 263 step: Union[datetime.timedelta, Duration], 264 ) -> List[StreamSlice]: 265 start_field = self._partition_field_start.eval(self.config) 266 end_field = self._partition_field_end.eval(self.config) 267 dates = [] 268 269 while self._is_within_date_range(start, end): 270 next_start = self._evaluate_next_start_date_safely(start, step) 271 end_date = self._get_date(next_start - self._cursor_granularity, end, min) 272 dates.append( 273 StreamSlice( 274 partition={}, 275 cursor_slice={ 276 start_field: self._format_datetime(start), 277 end_field: self._format_datetime(end_date), 278 }, 279 ) 280 ) 281 start = next_start 282 return dates 283 284 def _is_within_date_range(self, start: datetime.datetime, end: datetime.datetime) -> bool: 285 if self.is_compare_strictly: 286 return start < end 287 return start <= end 288 289 def _evaluate_next_start_date_safely( 290 self, start: datetime.datetime, step: datetime.timedelta 291 ) -> datetime.datetime: 292 """ 293 Given that we set the default step at datetime.timedelta.max, we will generate an OverflowError when evaluating the next start_date 294 This method assumes that users would never enter a step that would generate an overflow. Given that would be the case, the code 295 would have broken anyway. 
296 """ 297 try: 298 return start + step 299 except OverflowError: 300 return datetime.datetime.max.replace(tzinfo=datetime.timezone.utc) 301 302 def _get_date( 303 self, 304 cursor_value: datetime.datetime, 305 default_date: datetime.datetime, 306 comparator: Callable[[datetime.datetime, datetime.datetime], datetime.datetime], 307 ) -> datetime.datetime: 308 cursor_date = cursor_value or default_date 309 return comparator(cursor_date, default_date) 310 311 def parse_date(self, date: str) -> datetime.datetime: 312 for datetime_format in self.cursor_datetime_formats + [self.datetime_format]: 313 try: 314 return self._parser.parse(date, datetime_format) 315 except ValueError: 316 pass 317 raise ValueError(f"No format in {self.cursor_datetime_formats} matching {date}") 318 319 @classmethod 320 def _parse_timedelta(cls, time_str: Optional[str]) -> Union[datetime.timedelta, Duration]: 321 """ 322 :return Parses an ISO 8601 durations into datetime.timedelta or Duration objects. 323 """ 324 if not time_str: 325 return datetime.timedelta(0) 326 return parse_duration(time_str) 327 328 def get_request_params( 329 self, 330 *, 331 stream_state: Optional[StreamState] = None, 332 stream_slice: Optional[StreamSlice] = None, 333 next_page_token: Optional[Mapping[str, Any]] = None, 334 ) -> Mapping[str, Any]: 335 return self._get_request_options(RequestOptionType.request_parameter, stream_slice) 336 337 def get_request_headers( 338 self, 339 *, 340 stream_state: Optional[StreamState] = None, 341 stream_slice: Optional[StreamSlice] = None, 342 next_page_token: Optional[Mapping[str, Any]] = None, 343 ) -> Mapping[str, Any]: 344 return self._get_request_options(RequestOptionType.header, stream_slice) 345 346 def get_request_body_data( 347 self, 348 *, 349 stream_state: Optional[StreamState] = None, 350 stream_slice: Optional[StreamSlice] = None, 351 next_page_token: Optional[Mapping[str, Any]] = None, 352 ) -> Mapping[str, Any]: 353 return self._get_request_options(RequestOptionType.body_data, stream_slice) 354 355 def get_request_body_json( 356 self, 357 *, 358 stream_state: Optional[StreamState] = None, 359 stream_slice: Optional[StreamSlice] = None, 360 next_page_token: Optional[Mapping[str, Any]] = None, 361 ) -> Mapping[str, Any]: 362 return self._get_request_options(RequestOptionType.body_json, stream_slice) 363 364 def request_kwargs(self) -> Mapping[str, Any]: 365 # Never update kwargs 366 return {} 367 368 def _get_request_options( 369 self, option_type: RequestOptionType, stream_slice: Optional[StreamSlice] 370 ) -> Mapping[str, Any]: 371 options: MutableMapping[str, Any] = {} 372 if not stream_slice: 373 return options 374 375 if self.start_time_option and self.start_time_option.inject_into == option_type: 376 start_time_value = stream_slice.get(self._partition_field_start.eval(self.config)) 377 self.start_time_option.inject_into_request(options, start_time_value, self.config) 378 379 if self.end_time_option and self.end_time_option.inject_into == option_type: 380 end_time_value = stream_slice.get(self._partition_field_end.eval(self.config)) 381 self.end_time_option.inject_into_request(options, end_time_value, self.config) 382 383 return options 384 385 def should_be_synced(self, record: Record) -> bool: 386 cursor_field = self.cursor_field.eval(self.config) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 387 record_cursor_value = record.get(cursor_field) 388 if not record_cursor_value: 389 self._send_log( 390 Level.WARN, 391 f"Could not find cursor field 
`{cursor_field}` in record. The incremental sync will assume it needs to be synced", 392 ) 393 return True 394 latest_possible_cursor_value = self.select_best_end_datetime() 395 earliest_possible_cursor_value = self._calculate_earliest_possible_value( 396 latest_possible_cursor_value 397 ) 398 return self._is_within_daterange_boundaries( 399 record, earliest_possible_cursor_value, latest_possible_cursor_value 400 ) 401 402 def _is_within_daterange_boundaries( 403 self, 404 record: Record, 405 start_datetime_boundary: Union[datetime.datetime, str], 406 end_datetime_boundary: Union[datetime.datetime, str], 407 ) -> bool: 408 cursor_field = self.cursor_field.eval(self.config) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 409 record_cursor_value = record.get(cursor_field) 410 if not record_cursor_value: 411 self._send_log( 412 Level.WARN, 413 f"Could not find cursor field `{cursor_field}` in record. The record will not be considered when emitting sync state", 414 ) 415 return False 416 if isinstance(start_datetime_boundary, str): 417 start_datetime_boundary = self.parse_date(start_datetime_boundary) 418 if isinstance(end_datetime_boundary, str): 419 end_datetime_boundary = self.parse_date(end_datetime_boundary) 420 return ( 421 start_datetime_boundary <= self.parse_date(record_cursor_value) <= end_datetime_boundary 422 ) 423 424 def _send_log(self, level: Level, message: str) -> None: 425 if self.message_repository: 426 self.message_repository.emit_message( 427 AirbyteMessage( 428 type=Type.LOG, 429 log=AirbyteLogMessage(level=level, message=message), 430 ) 431 ) 432 433 def is_greater_than_or_equal(self, first: Record, second: Record) -> bool: 434 cursor_field = self.cursor_field.eval(self.config) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 435 first_cursor_value = first.get(cursor_field) 436 second_cursor_value = second.get(cursor_field) 437 if first_cursor_value and second_cursor_value: 438 return self.parse_date(first_cursor_value) >= self.parse_date(second_cursor_value) 439 elif first_cursor_value: 440 return True 441 else: 442 return False 443 444 def set_runtime_lookback_window(self, lookback_window_in_seconds: int) -> None: 445 """ 446 Updates the lookback window based on a given number of seconds if the new duration 447 is greater than the currently configured lookback window. 448 449 :param lookback_window_in_seconds: The lookback duration in seconds to potentially update to. 450 """ 451 runtime_lookback_window = duration_isoformat(timedelta(seconds=lookback_window_in_seconds)) 452 config_lookback = parse_duration( 453 self._lookback_window.eval(self.config) if self._lookback_window else "P0D" 454 ) 455 456 # Check if the new runtime lookback window is greater than the current config lookback 457 if parse_duration(runtime_lookback_window) > config_lookback: 458 self._lookback_window = InterpolatedString.create( 459 runtime_lookback_window, parameters={} 460 )
Slices the stream over a datetime range and creates a state of the form {<cursor_field>: <datetime>}.
Given a start time, end time, a step function, and an optional lookback window, the stream slicer will partition the date range from start time - lookback window to end time.
The step is defined as an ISO 8601 duration string.
The timestamp format accepts the same format codes as datetime.strftime, which are all the format codes required by the 1989 C standard. Full list of accepted format codes: https://man7.org/linux/man-pages/man3/strftime.3.html. A minimal configuration sketch follows the attribute list below.
Attributes:
- start_datetime (Union[MinMaxDatetime, str]): the datetime that determines the earliest record that should be synced
- end_datetime (Optional[Union[MinMaxDatetime, str]]): the datetime that determines the last record that should be synced
- cursor_field (Union[InterpolatedString, str]): record's cursor field
- datetime_format (str): format of the datetime
- step (Optional[str]): size of the timewindow (ISO8601 duration)
- cursor_granularity (Optional[str]): smallest increment the datetime_format has (ISO 8601 duration) that will be used to ensure that the start of a slice does not overlap with the end of the previous one
- config (Config): connection config
- start_time_option (Optional[RequestOption]): request option for start time
- end_time_option (Optional[RequestOption]): request option for end time
- partition_field_start (Optional[str]): partition start time field
- partition_field_end (Optional[str]): stream slice end time field
- lookback_window (Optional[InterpolatedString]): how far before start_datetime to read data (ISO 8601 duration)
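To make these attributes concrete, here is a minimal, hypothetical sketch of constructing and slicing a DatetimeBasedCursor directly in Python. Every concrete value (the start_date config key, the updated_at cursor field, the 30-day step) is an illustrative assumption, not something prescribed by the CDK.

```python
# Hypothetical sketch only: all concrete values (config keys, cursor field, step size)
# are illustrative assumptions.
from airbyte_cdk.sources.declarative.datetime.min_max_datetime import MinMaxDatetime
from airbyte_cdk.sources.declarative.incremental import DatetimeBasedCursor

config = {"start_date": "2021-01-01T00:00:00Z"}

cursor = DatetimeBasedCursor(
    start_datetime=MinMaxDatetime(
        datetime="{{ config['start_date'] }}",
        datetime_format="%Y-%m-%dT%H:%M:%SZ",
        parameters={},
    ),
    cursor_field="updated_at",
    datetime_format="%Y-%m-%dT%H:%M:%SZ",
    step="P30D",                # partition the range into 30-day windows
    cursor_granularity="PT1S",  # keep consecutive windows from overlapping
    lookback_window="P3D",      # re-read the trailing 3 days on every sync
    config=config,
    parameters={},
)

cursor.set_initial_state({})  # cursors must be given their state before anything else
for stream_slice in cursor.stream_slices():
    # each slice carries its formatted window boundaries in cursor_slice,
    # keyed by partition_field_start / partition_field_end
    print(stream_slice.cursor_slice)
```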
131 def get_stream_state(self) -> StreamState: 132 return {self.cursor_field.eval(self.config): self._cursor} if self._cursor else {} # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__
Returns the current stream state. We would like to restrict its usage since it exposes internals of the state. As of 2023-06-14, it is used for the following:
- Interpolation of the requests
- Transformation of records
- Saving the state
For the first case, we are probably stuck with exposing the stream state. For the second, we can probably expose a method that allows for emitting the state to the platform.
134 def set_initial_state(self, stream_state: StreamState) -> None: 135 """ 136 Cursors are not initialized with their state. As state is needed in order to function properly, this method should be called 137 before calling anything else 138 139 :param stream_state: The state of the stream as returned by get_stream_state 140 """ 141 self._cursor = ( 142 stream_state.get(self.cursor_field.eval(self.config)) if stream_state else None # type: ignore [union-attr] 143 )
Cursors are not initialized with their state. As state is needed for the cursor to function properly, this method should be called before anything else.
Parameters
- stream_state: The state of the stream as returned by get_stream_state
145 def observe(self, stream_slice: StreamSlice, record: Record) -> None: 146 """ 147 Register a record with the cursor; the cursor instance can then use it to manage the state of the in-progress stream read. 148 149 :param stream_slice: The current slice, which may or may not contain the most recently observed record 150 :param record: the most recently-read record, which the cursor can use to update the stream state. Outwardly-visible changes to the 151 stream state may need to be deferred depending on whether the source reliably orders records by the cursor field. 152 """ 153 record_cursor_value = record.get(self.cursor_field.eval(self.config)) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 154 # if the current record has no cursor value, we cannot meaningfully update the state based on it, so there is nothing more to do 155 if not record_cursor_value: 156 return 157 158 start_field = self._partition_field_start.eval(self.config) 159 end_field = self._partition_field_end.eval(self.config) 160 is_highest_observed_cursor_value = ( 161 not self._highest_observed_cursor_field_value 162 or self.parse_date(record_cursor_value) 163 > self.parse_date(self._highest_observed_cursor_field_value) 164 ) 165 if ( 166 self._is_within_daterange_boundaries( 167 record, 168 stream_slice.get(start_field), # type: ignore [arg-type] 169 stream_slice.get(end_field), # type: ignore [arg-type] 170 ) 171 and is_highest_observed_cursor_value 172 ): 173 self._highest_observed_cursor_field_value = record_cursor_value
Register a record with the cursor; the cursor instance can then use it to manage the state of the in-progress stream read.
Parameters
- stream_slice: The current slice, which may or may not contain the most recently observed record
- record: the most recently-read record, which the cursor can use to update the stream state. Outwardly-visible changes to the stream state may need to be deferred depending on whether the source reliably orders records by the cursor field.
175 def close_slice(self, stream_slice: StreamSlice, *args: Any) -> None: 176 if stream_slice.partition: 177 raise ValueError( 178 f"Stream slice {stream_slice} should not have a partition. Got {stream_slice.partition}." 179 ) 180 cursor_value_str_by_cursor_value_datetime = dict( 181 map( 182 # we need to ensure the cursor value is preserved as is in the state else the CATs might complain of something like 183 # 2023-01-04T17:30:19.000Z' <= '2023-01-04T17:30:19.000000Z' 184 lambda datetime_str: (self.parse_date(datetime_str), datetime_str), # type: ignore # because of the filter on the next line, this will only be called with a str 185 filter( 186 lambda item: item, [self._cursor, self._highest_observed_cursor_field_value] 187 ), 188 ) 189 ) 190 self._cursor = ( 191 cursor_value_str_by_cursor_value_datetime[ 192 max(cursor_value_str_by_cursor_value_datetime.keys()) 193 ] 194 if cursor_value_str_by_cursor_value_datetime 195 else None 196 )
Update state based on the stream slice. Note that stream_slice.cursor_slice and most_recent_record.associated_slice are expected to be the same, but we make it explicit here that stream_slice should be leveraged to update the state. We do not pass in the latest record, since cursor instances should maintain the relevant internal state on their own.
Parameters
- stream_slice: slice to close
198 def stream_slices(self) -> Iterable[StreamSlice]: 199 """ 200 Partition the daterange into slices of size = step. 201 202 The start of the window is the minimum datetime between start_datetime - lookback_window and the stream_state's datetime 203 The end of the window is the minimum datetime between the start of the window and end_datetime. 204 205 :return: 206 """ 207 end_datetime = self.select_best_end_datetime() 208 start_datetime = self._calculate_earliest_possible_value(self.select_best_end_datetime()) 209 return self._partition_daterange(start_datetime, end_datetime, self._step)
Partition the daterange into slices of size = step.
The start of the window is the later of the configured start_datetime and the stream state's cursor datetime minus the lookback window. The end of the window is the earlier of the configured end_datetime and the current time.
211 def select_state(self, stream_slice: Optional[StreamSlice] = None) -> Optional[StreamState]: 212 # Datetime based cursors operate over slices made up of datetime ranges. Stream state is based on the progress 213 # through each slice and does not belong to a specific slice. We just return stream state as it is. 214 return self.get_stream_state()
Get the state value of a specific stream_slice. For incremental or resumable full refresh cursors which only manage state in a single dimension this is the entire state object. For per-partition cursors used by substreams, this returns the state of a specific parent delineated by the incoming slice's partition object.
235 def select_best_end_datetime(self) -> datetime.datetime: 236 """ 237 Returns the optimal end datetime. 238 This method compares the current datetime with a pre-configured end datetime 239 and returns the earlier of the two. If no pre-configured end datetime is set, 240 the current datetime is returned. 241 242 :return datetime.datetime: The best end datetime, which is either the current datetime or the pre-configured end datetime, whichever is earlier. 243 """ 244 now = datetime.datetime.now(tz=self._timezone) 245 if not self._end_datetime: 246 return now 247 return min(self._end_datetime.get_datetime(self.config), now)
Returns the optimal end datetime. This method compares the current datetime with a pre-configured end datetime and returns the earlier of the two. If no pre-configured end datetime is set, the current datetime is returned.
Returns
The best end datetime, which is either the current datetime or the pre-configured end datetime, whichever is earlier.
311 def parse_date(self, date: str) -> datetime.datetime: 312 for datetime_format in self.cursor_datetime_formats + [self.datetime_format]: 313 try: 314 return self._parser.parse(date, datetime_format) 315 except ValueError: 316 pass 317 raise ValueError(f"No format in {self.cursor_datetime_formats} matching {date}")
328 def get_request_params( 329 self, 330 *, 331 stream_state: Optional[StreamState] = None, 332 stream_slice: Optional[StreamSlice] = None, 333 next_page_token: Optional[Mapping[str, Any]] = None, 334 ) -> Mapping[str, Any]: 335 return self._get_request_options(RequestOptionType.request_parameter, stream_slice)
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g.: you might want to define query parameters for paging if next_page_token is not None. The sketch below shows how the cursor's start_time_option and end_time_option feed these parameters.
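The cursor fills these query parameters from its start_time_option and end_time_option. The sketch below is hypothetical: the since/until parameter names are assumptions about the upstream API, not CDK defaults.

```python
# Sketch under assumptions: the API accepts "since"/"until" query parameters.
from airbyte_cdk.sources.declarative.requesters.request_option import (
    RequestOption,
    RequestOptionType,
)

start_time_option = RequestOption(
    field_name="since",
    inject_into=RequestOptionType.request_parameter,
    parameters={},
)
end_time_option = RequestOption(
    field_name="until",
    inject_into=RequestOptionType.request_parameter,
    parameters={},
)

# When passed as start_time_option / end_time_option on the DatetimeBasedCursor above,
# get_request_params(stream_slice=stream_slice) would return something like
# {"since": "<slice start>", "until": "<slice end>"} for each slice.
```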
337 def get_request_headers( 338 self, 339 *, 340 stream_state: Optional[StreamState] = None, 341 stream_slice: Optional[StreamSlice] = None, 342 next_page_token: Optional[Mapping[str, Any]] = None, 343 ) -> Mapping[str, Any]: 344 return self._get_request_options(RequestOptionType.header, stream_slice)
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
346 def get_request_body_data( 347 self, 348 *, 349 stream_state: Optional[StreamState] = None, 350 stream_slice: Optional[StreamSlice] = None, 351 next_page_token: Optional[Mapping[str, Any]] = None, 352 ) -> Mapping[str, Any]: 353 return self._get_request_options(RequestOptionType.body_data, stream_slice)
Specifies how to populate the body of the request with a non-JSON payload.
If it returns plain text, it will be sent as-is. If it returns a dict, it will be converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2".
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden.
355 def get_request_body_json( 356 self, 357 *, 358 stream_state: Optional[StreamState] = None, 359 stream_slice: Optional[StreamSlice] = None, 360 next_page_token: Optional[Mapping[str, Any]] = None, 361 ) -> Mapping[str, Any]: 362 return self._get_request_options(RequestOptionType.body_json, stream_slice)
Specifies how to populate the body of the request with a JSON payload.
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden.
385 def should_be_synced(self, record: Record) -> bool: 386 cursor_field = self.cursor_field.eval(self.config) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 387 record_cursor_value = record.get(cursor_field) 388 if not record_cursor_value: 389 self._send_log( 390 Level.WARN, 391 f"Could not find cursor field `{cursor_field}` in record. The incremental sync will assume it needs to be synced", 392 ) 393 return True 394 latest_possible_cursor_value = self.select_best_end_datetime() 395 earliest_possible_cursor_value = self._calculate_earliest_possible_value( 396 latest_possible_cursor_value 397 ) 398 return self._is_within_daterange_boundaries( 399 record, earliest_possible_cursor_value, latest_possible_cursor_value 400 )
Evaluates whether a record should be synced. This enables record filtering and serves as a stop condition for pagination.
433 def is_greater_than_or_equal(self, first: Record, second: Record) -> bool: 434 cursor_field = self.cursor_field.eval(self.config) # type: ignore # cursor_field is converted to an InterpolatedString in __post_init__ 435 first_cursor_value = first.get(cursor_field) 436 second_cursor_value = second.get(cursor_field) 437 if first_cursor_value and second_cursor_value: 438 return self.parse_date(first_cursor_value) >= self.parse_date(second_cursor_value) 439 elif first_cursor_value: 440 return True 441 else: 442 return False
Evaluates which record is greater in terms of its cursor value. This is used to avoid having to capture all the records in order to close a slice.
444 def set_runtime_lookback_window(self, lookback_window_in_seconds: int) -> None: 445 """ 446 Updates the lookback window based on a given number of seconds if the new duration 447 is greater than the currently configured lookback window. 448 449 :param lookback_window_in_seconds: The lookback duration in seconds to potentially update to. 450 """ 451 runtime_lookback_window = duration_isoformat(timedelta(seconds=lookback_window_in_seconds)) 452 config_lookback = parse_duration( 453 self._lookback_window.eval(self.config) if self._lookback_window else "P0D" 454 ) 455 456 # Check if the new runtime lookback window is greater than the current config lookback 457 if parse_duration(runtime_lookback_window) > config_lookback: 458 self._lookback_window = InterpolatedString.create( 459 runtime_lookback_window, parameters={} 460 )
Updates the lookback window based on a given number of seconds if the new duration is greater than the currently configured lookback window.
Parameters
- lookback_window_in_seconds: The lookback duration in seconds to potentially update to.
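Continuing the hypothetical cursor sketch earlier on this page, a caller (for example, a coordinating parent-stream cursor) might widen the window at runtime:

```python
# Continuing the DatetimeBasedCursor sketch above; the six-hour value is illustrative.
# The lookback window only changes if the new duration exceeds the configured one.
cursor.set_runtime_lookback_window(lookback_window_in_seconds=6 * 60 * 60)
```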
14@dataclass 15class DeclarativeAuthenticator(AbstractHeaderAuthenticator): 16 """ 17 Interface used to associate which authenticators can be used as part of the declarative framework 18 """ 19 20 def get_request_params(self) -> Mapping[str, Any]: 21 """HTTP request parameter to add to the requests""" 22 return {} 23 24 def get_request_body_data(self) -> Union[Mapping[str, Any], str]: 25 """Form-encoded body data to set on the requests""" 26 return {} 27 28 def get_request_body_json(self) -> Mapping[str, Any]: 29 """JSON-encoded body data to set on the requests""" 30 return {}
Interface used to associate which authenticators can be used as part of the declarative framework
20 def get_request_params(self) -> Mapping[str, Any]: 21 """HTTP request parameter to add to the requests""" 22 return {}
HTTP request parameter to add to the requests
24 def get_request_body_data(self) -> Union[Mapping[str, Any], str]: 25 """Form-encoded body data to set on the requests""" 26 return {}
Form-encoded body data to set on the requests
28 def get_request_body_json(self) -> Mapping[str, Any]: 29 """JSON-encoded body data to set on the requests""" 30 return {}
JSON-encoded body data to set on the requests
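As a hedged illustration of this interface, the sketch below defines a static bearer-token authenticator. It assumes the AbstractHeaderAuthenticator contract of auth_header/token properties; the class name and api_token field are hypothetical.

```python
# Hypothetical sketch: a static bearer-token authenticator built on the declarative
# authenticator interface.
from dataclasses import dataclass

from airbyte_cdk.sources.declarative.auth.declarative_authenticator import (
    DeclarativeAuthenticator,
)


@dataclass
class StaticTokenAuthenticator(DeclarativeAuthenticator):
    api_token: str

    @property
    def auth_header(self) -> str:
        # Name of the header the token is injected into.
        return "Authorization"

    @property
    def token(self) -> str:
        return f"Bearer {self.api_token}"


auth = StaticTokenAuthenticator(api_token="my-secret-token")
# auth.get_auth_header()  # -> {"Authorization": "Bearer my-secret-token"}
```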
24@dataclass 25class DeclarativeOauth2Authenticator(AbstractOauth2Authenticator, DeclarativeAuthenticator): 26 """ 27 Generates OAuth2.0 access tokens from an OAuth2.0 refresh token and client credentials based on 28 a declarative connector configuration file. Credentials can be defined explicitly or via interpolation 29 at runtime. The generated access token is attached to each request via the Authorization header. 30 31 Attributes: 32 token_refresh_endpoint (Union[InterpolatedString, str]): The endpoint to refresh the access token 33 client_id (Union[InterpolatedString, str]): The client id 34 client_secret (Union[InterpolatedString, str]): Client secret 35 refresh_token (Union[InterpolatedString, str]): The token used to refresh the access token 36 access_token_name (Union[InterpolatedString, str]): THe field to extract access token from in the response 37 expires_in_name (Union[InterpolatedString, str]): The field to extract expires_in from in the response 38 config (Mapping[str, Any]): The user-provided configuration as specified by the source's spec 39 scopes (Optional[List[str]]): The scopes to request 40 token_expiry_date (Optional[Union[InterpolatedString, str]]): The access token expiration date 41 token_expiry_date_format str: format of the datetime; provide it if expires_in is returned in datetime instead of seconds 42 token_expiry_is_time_of_expiration bool: set True it if expires_in is returned as time of expiration instead of the number seconds until expiration 43 refresh_request_body (Optional[Mapping[str, Any]]): The request body to send in the refresh request 44 refresh_request_headers (Optional[Mapping[str, Any]]): The request headers to send in the refresh request 45 grant_type: The grant_type to request for access_token. If set to refresh_token, the refresh_token parameter has to be provided 46 message_repository (MessageRepository): the message repository used to emit logs on HTTP requests 47 """ 48 49 config: Mapping[str, Any] 50 parameters: InitVar[Mapping[str, Any]] 51 client_id: Optional[Union[InterpolatedString, str]] = None 52 client_secret: Optional[Union[InterpolatedString, str]] = None 53 token_refresh_endpoint: Optional[Union[InterpolatedString, str]] = None 54 refresh_token: Optional[Union[InterpolatedString, str]] = None 55 scopes: Optional[List[str]] = None 56 token_expiry_date: Optional[Union[InterpolatedString, str]] = None 57 _token_expiry_date: Optional[AirbyteDateTime] = field(init=False, repr=False, default=None) 58 token_expiry_date_format: Optional[str] = None 59 token_expiry_is_time_of_expiration: bool = False 60 access_token_name: Union[InterpolatedString, str] = "access_token" 61 access_token_value: Optional[Union[InterpolatedString, str]] = None 62 client_id_name: Union[InterpolatedString, str] = "client_id" 63 client_secret_name: Union[InterpolatedString, str] = "client_secret" 64 expires_in_name: Union[InterpolatedString, str] = "expires_in" 65 refresh_token_name: Union[InterpolatedString, str] = "refresh_token" 66 refresh_request_body: Optional[Mapping[str, Any]] = None 67 refresh_request_headers: Optional[Mapping[str, Any]] = None 68 grant_type_name: Union[InterpolatedString, str] = "grant_type" 69 grant_type: Union[InterpolatedString, str] = "refresh_token" 70 message_repository: MessageRepository = NoopMessageRepository() 71 profile_assertion: Optional[DeclarativeAuthenticator] = None 72 use_profile_assertion: Optional[Union[InterpolatedBoolean, str, bool]] = False 73 74 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 75 
super().__init__() 76 if self.token_refresh_endpoint is not None: 77 self._token_refresh_endpoint: Optional[InterpolatedString] = InterpolatedString.create( 78 self.token_refresh_endpoint, parameters=parameters 79 ) 80 else: 81 self._token_refresh_endpoint = None 82 self._client_id_name = InterpolatedString.create(self.client_id_name, parameters=parameters) 83 self._client_id = ( 84 InterpolatedString.create(self.client_id, parameters=parameters) 85 if self.client_id 86 else self.client_id 87 ) 88 self._client_secret_name = InterpolatedString.create( 89 self.client_secret_name, parameters=parameters 90 ) 91 self._client_secret = ( 92 InterpolatedString.create(self.client_secret, parameters=parameters) 93 if self.client_secret 94 else self.client_secret 95 ) 96 self._refresh_token_name = InterpolatedString.create( 97 self.refresh_token_name, parameters=parameters 98 ) 99 if self.refresh_token is not None: 100 self._refresh_token: Optional[InterpolatedString] = InterpolatedString.create( 101 self.refresh_token, parameters=parameters 102 ) 103 else: 104 self._refresh_token = None 105 self.access_token_name = InterpolatedString.create( 106 self.access_token_name, parameters=parameters 107 ) 108 self.expires_in_name = InterpolatedString.create( 109 self.expires_in_name, parameters=parameters 110 ) 111 self.grant_type_name = InterpolatedString.create( 112 self.grant_type_name, parameters=parameters 113 ) 114 self.grant_type = InterpolatedString.create( 115 "urn:ietf:params:oauth:grant-type:jwt-bearer" 116 if self.use_profile_assertion 117 else self.grant_type, 118 parameters=parameters, 119 ) 120 self._refresh_request_body = InterpolatedMapping( 121 self.refresh_request_body or {}, parameters=parameters 122 ) 123 self._refresh_request_headers = InterpolatedMapping( 124 self.refresh_request_headers or {}, parameters=parameters 125 ) 126 try: 127 if ( 128 isinstance(self.token_expiry_date, (int, str)) 129 and str(self.token_expiry_date).isdigit() 130 ): 131 self._token_expiry_date = ab_datetime_parse(self.token_expiry_date) 132 else: 133 self._token_expiry_date = ( 134 ab_datetime_parse( 135 InterpolatedString.create( 136 self.token_expiry_date, parameters=parameters 137 ).eval(self.config) 138 ) 139 if self.token_expiry_date 140 else ab_datetime_now() - timedelta(days=1) 141 ) 142 except ValueError as e: 143 raise ValueError(f"Invalid token expiry date format: {e}") 144 self.use_profile_assertion = ( 145 InterpolatedBoolean(self.use_profile_assertion, parameters=parameters) 146 if isinstance(self.use_profile_assertion, str) 147 else self.use_profile_assertion 148 ) 149 self.assertion_name = "assertion" 150 151 if self.access_token_value is not None: 152 self._access_token_value = InterpolatedString.create( 153 self.access_token_value, parameters=parameters 154 ).eval(self.config) 155 else: 156 self._access_token_value = None 157 158 self._access_token: Optional[str] = ( 159 self._access_token_value if self.access_token_value else None 160 ) 161 162 if not self.use_profile_assertion and any( 163 client_creds is None for client_creds in [self.client_id, self.client_secret] 164 ): 165 raise ValueError( 166 "OAuthAuthenticator configuration error: Both 'client_id' and 'client_secret' are required for the " 167 "basic OAuth flow." 168 ) 169 if self.profile_assertion is None and self.use_profile_assertion: 170 raise ValueError( 171 "OAuthAuthenticator configuration error: 'profile_assertion' is required when using the profile assertion flow." 
172 ) 173 if self.get_grant_type() == "refresh_token" and self._refresh_token is None: 174 raise ValueError( 175 "OAuthAuthenticator configuration error: A 'refresh_token' is required when the 'grant_type' is set to 'refresh_token'." 176 ) 177 178 def get_token_refresh_endpoint(self) -> Optional[str]: 179 if self._token_refresh_endpoint is not None: 180 refresh_token_endpoint: str = self._token_refresh_endpoint.eval(self.config) 181 if not refresh_token_endpoint: 182 raise ValueError( 183 "OAuthAuthenticator was unable to evaluate token_refresh_endpoint parameter" 184 ) 185 return refresh_token_endpoint 186 return None 187 188 def get_client_id_name(self) -> str: 189 return self._client_id_name.eval(self.config) # type: ignore # eval returns a string in this context 190 191 def get_client_id(self) -> str: 192 client_id = self._client_id.eval(self.config) if self._client_id else self._client_id 193 if not client_id: 194 raise ValueError("OAuthAuthenticator was unable to evaluate client_id parameter") 195 return client_id # type: ignore # value will be returned as a string, or an error will be raised 196 197 def get_client_secret_name(self) -> str: 198 return self._client_secret_name.eval(self.config) # type: ignore # eval returns a string in this context 199 200 def get_client_secret(self) -> str: 201 client_secret = ( 202 self._client_secret.eval(self.config) if self._client_secret else self._client_secret 203 ) 204 if not client_secret: 205 raise ValueError("OAuthAuthenticator was unable to evaluate client_secret parameter") 206 return client_secret # type: ignore # value will be returned as a string, or an error will be raised 207 208 def get_refresh_token_name(self) -> str: 209 return self._refresh_token_name.eval(self.config) # type: ignore # eval returns a string in this context 210 211 def get_refresh_token(self) -> Optional[str]: 212 return None if self._refresh_token is None else str(self._refresh_token.eval(self.config)) 213 214 def get_scopes(self) -> List[str]: 215 return self.scopes or [] 216 217 def get_access_token_name(self) -> str: 218 return self.access_token_name.eval(self.config) # type: ignore # eval returns a string in this context 219 220 def get_expires_in_name(self) -> str: 221 return self.expires_in_name.eval(self.config) # type: ignore # eval returns a string in this context 222 223 def get_grant_type_name(self) -> str: 224 return self.grant_type_name.eval(self.config) # type: ignore # eval returns a string in this context 225 226 def get_grant_type(self) -> str: 227 return self.grant_type.eval(self.config) # type: ignore # eval returns a string in this context 228 229 def get_refresh_request_body(self) -> Mapping[str, Any]: 230 return self._refresh_request_body.eval(self.config) 231 232 def get_refresh_request_headers(self) -> Mapping[str, Any]: 233 return self._refresh_request_headers.eval(self.config) 234 235 def get_token_expiry_date(self) -> AirbyteDateTime: 236 if not self._has_access_token_been_initialized(): 237 return AirbyteDateTime.from_datetime(datetime.min) 238 return self._token_expiry_date # type: ignore # _token_expiry_date is an AirbyteDateTime. 
It is never None despite what mypy thinks 239 240 def _has_access_token_been_initialized(self) -> bool: 241 return self._access_token is not None 242 243 def set_token_expiry_date(self, value: Union[str, int]) -> None: 244 self._token_expiry_date = self._parse_token_expiration_date(value) 245 246 def get_assertion_name(self) -> str: 247 return self.assertion_name 248 249 def get_assertion(self) -> str: 250 if self.profile_assertion is None: 251 raise ValueError("profile_assertion is not set") 252 return self.profile_assertion.token 253 254 def build_refresh_request_body(self) -> Mapping[str, Any]: 255 """ 256 Returns the request body to set on the refresh request 257 258 Override to define additional parameters 259 """ 260 if self.use_profile_assertion: 261 return { 262 self.get_grant_type_name(): self.get_grant_type(), 263 self.get_assertion_name(): self.get_assertion(), 264 } 265 return super().build_refresh_request_body() 266 267 @property 268 def access_token(self) -> str: 269 if self._access_token is None: 270 raise ValueError("access_token is not set") 271 return self._access_token 272 273 @access_token.setter 274 def access_token(self, value: str) -> None: 275 self._access_token = value 276 277 @property 278 def _message_repository(self) -> MessageRepository: 279 """ 280 Overriding AbstractOauth2Authenticator._message_repository to allow for HTTP request logs 281 """ 282 return self.message_repository
Generates OAuth2.0 access tokens from an OAuth2.0 refresh token and client credentials based on a declarative connector configuration file. Credentials can be defined explicitly or via interpolation at runtime. The generated access token is attached to each request via the Authorization header.
Attributes:
- token_refresh_endpoint (Union[InterpolatedString, str]): The endpoint to refresh the access token
- client_id (Union[InterpolatedString, str]): The client id
- client_secret (Union[InterpolatedString, str]): Client secret
- refresh_token (Union[InterpolatedString, str]): The token used to refresh the access token
- access_token_name (Union[InterpolatedString, str]): The field to extract the access token from in the response
- expires_in_name (Union[InterpolatedString, str]): The field to extract expires_in from in the response
- config (Mapping[str, Any]): The user-provided configuration as specified by the source's spec
- scopes (Optional[List[str]]): The scopes to request
- token_expiry_date (Optional[Union[InterpolatedString, str]]): The access token expiration date
- token_expiry_date_format (str): format of the datetime; provide it if expires_in is returned as a datetime instead of seconds
- token_expiry_is_time_of_expiration (bool): set to True if expires_in is returned as the time of expiration instead of the number of seconds until expiration
- refresh_request_body (Optional[Mapping[str, Any]]): The request body to send in the refresh request
- refresh_request_headers (Optional[Mapping[str, Any]]): The request headers to send in the refresh request
- grant_type: The grant_type to request for access_token. If set to refresh_token, the refresh_token parameter has to be provided
- message_repository (MessageRepository): the message repository used to emit logs on HTTP requests
If all of refresh_token_error_status_codes, refresh_token_error_key, and refresh_token_error_values are set, then HTTP errors matching those parameters will be wrapped in AirbyteTracedException.
Format of the datetime; provide it if expires_in is returned as the expiration datetime instead of the number of seconds until expiration.
Indicates that the token expiry value is returned as the date until which the token will be valid, rather than the amount of time it will remain valid. A construction sketch follows.
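A minimal, hypothetical construction sketch. The endpoint URL and config keys are illustrative assumptions; real connectors interpolate whatever their spec defines.

```python
# Hypothetical sketch: the endpoint URL and config keys are placeholders.
from airbyte_cdk.sources.declarative.auth import DeclarativeOauth2Authenticator

config = {
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "refresh_token": "my-refresh-token",
}

authenticator = DeclarativeOauth2Authenticator(
    token_refresh_endpoint="https://api.example.com/oauth/token",
    client_id="{{ config['client_id'] }}",
    client_secret="{{ config['client_secret'] }}",
    refresh_token="{{ config['refresh_token'] }}",
    config=config,
    parameters={},
)

# Attaching the authenticator to an HTTP requester adds the Authorization header to every
# call; the first use triggers a token refresh against token_refresh_endpoint.
# authenticator.get_auth_header()  # -> {"Authorization": "Bearer <access_token>"}
```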
178 def get_token_refresh_endpoint(self) -> Optional[str]: 179 if self._token_refresh_endpoint is not None: 180 refresh_token_endpoint: str = self._token_refresh_endpoint.eval(self.config) 181 if not refresh_token_endpoint: 182 raise ValueError( 183 "OAuthAuthenticator was unable to evaluate token_refresh_endpoint parameter" 184 ) 185 return refresh_token_endpoint 186 return None
Returns the endpoint to refresh the access token
188 def get_client_id_name(self) -> str: 189 return self._client_id_name.eval(self.config) # type: ignore # eval returns a string in this context
The client id name to authenticate
191 def get_client_id(self) -> str: 192 client_id = self._client_id.eval(self.config) if self._client_id else self._client_id 193 if not client_id: 194 raise ValueError("OAuthAuthenticator was unable to evaluate client_id parameter") 195 return client_id # type: ignore # value will be returned as a string, or an error will be raised
The client id to authenticate
197 def get_client_secret_name(self) -> str: 198 return self._client_secret_name.eval(self.config) # type: ignore # eval returns a string in this context
The client secret name to authenticate
200 def get_client_secret(self) -> str: 201 client_secret = ( 202 self._client_secret.eval(self.config) if self._client_secret else self._client_secret 203 ) 204 if not client_secret: 205 raise ValueError("OAuthAuthenticator was unable to evaluate client_secret parameter") 206 return client_secret # type: ignore # value will be returned as a string, or an error will be raised
The client secret to authenticate
208 def get_refresh_token_name(self) -> str: 209 return self._refresh_token_name.eval(self.config) # type: ignore # eval returns a string in this context
The refresh token name to authenticate
211 def get_refresh_token(self) -> Optional[str]: 212 return None if self._refresh_token is None else str(self._refresh_token.eval(self.config))
The token used to refresh the access token when it expires
217 def get_access_token_name(self) -> str: 218 return self.access_token_name.eval(self.config) # type: ignore # eval returns a string in this context
Field to extract access token from in the response
220 def get_expires_in_name(self) -> str: 221 return self.expires_in_name.eval(self.config) # type: ignore # eval returns a string in this context
Returns the expires_in field name
223 def get_grant_type_name(self) -> str: 224 return self.grant_type_name.eval(self.config) # type: ignore # eval returns a string in this context
Returns grant_type specified name for requesting access_token
226 def get_grant_type(self) -> str: 227 return self.grant_type.eval(self.config) # type: ignore # eval returns a string in this context
Returns grant_type specified for requesting access_token
229 def get_refresh_request_body(self) -> Mapping[str, Any]: 230 return self._refresh_request_body.eval(self.config)
Returns the request body to set on the refresh request
232 def get_refresh_request_headers(self) -> Mapping[str, Any]: 233 return self._refresh_request_headers.eval(self.config)
Returns the request headers to set on the refresh request
235 def get_token_expiry_date(self) -> AirbyteDateTime: 236 if not self._has_access_token_been_initialized(): 237 return AirbyteDateTime.from_datetime(datetime.min) 238 return self._token_expiry_date # type: ignore # _token_expiry_date is an AirbyteDateTime. It is never None despite what mypy thinks
Expiration date of the access token
243 def set_token_expiry_date(self, value: Union[str, int]) -> None: 244 self._token_expiry_date = self._parse_token_expiration_date(value)
Setter for access token expiration date
254 def build_refresh_request_body(self) -> Mapping[str, Any]: 255 """ 256 Returns the request body to set on the refresh request 257 258 Override to define additional parameters 259 """ 260 if self.use_profile_assertion: 261 return { 262 self.get_grant_type_name(): self.get_grant_type(), 263 self.get_assertion_name(): self.get_assertion(), 264 } 265 return super().build_refresh_request_body()
Returns the request body to set on the refresh request
Override to define additional parameters
267 @property 268 def access_token(self) -> str: 269 if self._access_token is None: 270 raise ValueError("access_token is not set") 271 return self._access_token
Returns the access token
285@dataclass 286class DeclarativeSingleUseRefreshTokenOauth2Authenticator( 287 SingleUseRefreshTokenOauth2Authenticator, DeclarativeAuthenticator 288): 289 """ 290 Declarative version of SingleUseRefreshTokenOauth2Authenticator which can be used in declarative connectors. 291 """ 292 293 def __init__(self, *args: Any, **kwargs: Any) -> None: 294 super().__init__(*args, **kwargs)
Declarative version of SingleUseRefreshTokenOauth2Authenticator which can be used in declarative connectors. A construction sketch follows the argument list below.
Arguments:
- connector_config (Mapping[str, Any]): The full connector configuration
- token_refresh_endpoint (str): Full URL to the token refresh endpoint
- scopes (List[str], optional): List of OAuth scopes to pass in the refresh token request body. Defaults to None.
- access_token_name (str, optional): Name of the access token field, used to parse the refresh token response. Defaults to "access_token".
- expires_in_name (str, optional): Name of the field that indicates when the current access token will expire, used to parse the refresh token response. Defaults to "expires_in".
- refresh_token_name (str, optional): Name of the refresh token field, used to parse the refresh token response. Defaults to "refresh_token".
- refresh_request_body (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request body. Defaults to None.
- refresh_request_headers (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request headers. Defaults to None.
- grant_type (str, optional): OAuth grant type. Defaults to "refresh_token".
- client_id (Optional[str]): The client id to authenticate. If not specified, defaults to credentials.client_id in the config object.
- client_secret (Optional[str]): The client secret to authenticate. If not specified, defaults to credentials.client_secret in the config object.
- access_token_config_path (Sequence[str]): Dpath to the access_token field in the connector configuration. Defaults to ("credentials", "access_token").
- refresh_token_config_path (Sequence[str]): Dpath to the refresh_token field in the connector configuration. Defaults to ("credentials", "refresh_token").
- token_expiry_date_config_path (Sequence[str]): Dpath to the token_expiry_date field in the connector configuration. Defaults to ("credentials", "token_expiry_date").
- token_expiry_date_format (Optional[str]): Date format of the token expiry date field (set by expires_in_name). If not specified the token expiry date is interpreted as number of seconds until expiration.
- token_expiry_is_time_of_expiration (bool): set to True if expires_in is returned as the time of expiration instead of the number of seconds until expiration
- message_repository (MessageRepository): the message repository used to emit logs on HTTP requests and control message on config update
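A minimal, hypothetical construction sketch, assuming the default credentials.* config paths described above. The endpoint and credential values are placeholders, and the exact fields required at construction time are governed by the SingleUseRefreshTokenOauth2Authenticator base class.

```python
# Hypothetical sketch: the config values and endpoint are placeholders.
from airbyte_cdk import DeclarativeSingleUseRefreshTokenOauth2Authenticator

connector_config = {
    "credentials": {
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
        "refresh_token": "my-single-use-refresh-token",
    }
}

authenticator = DeclarativeSingleUseRefreshTokenOauth2Authenticator(
    connector_config,
    token_refresh_endpoint="https://api.example.com/oauth/token",
)
# On refresh, the new refresh token is written back into the connector config and a
# control message is emitted so the platform can persist the updated config.
```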
Inherited Members
- SingleUseRefreshTokenOauth2Authenticator
- access_token
- get_refresh_token
- set_refresh_token
- get_token_expiry_date
- set_token_expiry_date
- token_has_expired
- get_new_token_expiry_date
- get_access_token
- refresh_access_token
- Oauth2Authenticator
- get_token_refresh_endpoint
- get_client_id_name
- get_client_id
- get_client_secret_name
- get_client_secret
- get_refresh_token_name
- get_access_token_name
- get_scopes
- get_expires_in_name
- get_refresh_request_body
- get_refresh_request_headers
- get_grant_type_name
- get_grant_type
- token_expiry_is_time_of_expiration
- token_expiry_date_format
32@dataclass 33class DeclarativeStream(Stream): 34 """ 35 DeclarativeStream is a Stream that delegates most of its logic to its schema_load and retriever 36 37 Attributes: 38 name (str): stream name 39 primary_key (Optional[Union[str, List[str], List[List[str]]]]): the primary key of the stream 40 schema_loader (SchemaLoader): The schema loader 41 retriever (Retriever): The retriever 42 config (Config): The user-provided configuration as specified by the source's spec 43 stream_cursor_field (Optional[Union[InterpolatedString, str]]): The cursor field 44 stream. Transformations are applied in the order in which they are defined. 45 """ 46 47 retriever: Retriever 48 config: Config 49 parameters: InitVar[Mapping[str, Any]] 50 name: str 51 primary_key: Optional[Union[str, List[str], List[List[str]]]] 52 state_migrations: List[StateMigration] = field(repr=True, default_factory=list) 53 schema_loader: Optional[SchemaLoader] = None 54 _name: str = field(init=False, repr=False, default="") 55 _primary_key: str = field(init=False, repr=False, default="") 56 stream_cursor_field: Optional[Union[InterpolatedString, str]] = None 57 58 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 59 self._stream_cursor_field = ( 60 InterpolatedString.create(self.stream_cursor_field, parameters=parameters) 61 if isinstance(self.stream_cursor_field, str) 62 else self.stream_cursor_field 63 ) 64 self._schema_loader = ( 65 self.schema_loader 66 if self.schema_loader 67 else DefaultSchemaLoader(config=self.config, parameters=parameters) 68 ) 69 70 @property # type: ignore 71 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 72 return self._primary_key 73 74 @primary_key.setter 75 def primary_key(self, value: str) -> None: 76 if not isinstance(value, property): 77 self._primary_key = value 78 79 @property 80 def exit_on_rate_limit(self) -> bool: 81 if isinstance(self.retriever, AsyncRetriever): 82 return self.retriever.exit_on_rate_limit 83 84 return self.retriever.requester.exit_on_rate_limit # type: ignore # abstract Retriever class has not requester attribute 85 86 @exit_on_rate_limit.setter 87 def exit_on_rate_limit(self, value: bool) -> None: 88 if isinstance(self.retriever, AsyncRetriever): 89 self.retriever.exit_on_rate_limit = value 90 else: 91 self.retriever.requester.exit_on_rate_limit = value # type: ignore[attr-defined] 92 93 @property # type: ignore 94 def name(self) -> str: 95 """ 96 :return: Stream name. By default this is the implementing class name, but it can be overridden as needed. 
97 """ 98 return self._name 99 100 @name.setter 101 def name(self, value: str) -> None: 102 if not isinstance(value, property): 103 self._name = value 104 105 @property 106 def state(self) -> MutableMapping[str, Any]: 107 return self.retriever.state # type: ignore 108 109 @state.setter 110 def state(self, value: MutableMapping[str, Any]) -> None: 111 """State setter, accept state serialized by state getter.""" 112 state: Mapping[str, Any] = value 113 if self.state_migrations: 114 for migration in self.state_migrations: 115 if migration.should_migrate(state): 116 state = migration.migrate(state) 117 self.retriever.state = state 118 119 def get_updated_state( 120 self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any] 121 ) -> MutableMapping[str, Any]: 122 return self.state 123 124 @property 125 def cursor_field(self) -> Union[str, List[str]]: 126 """ 127 Override to return the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field. 128 :return: The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor. 129 """ 130 cursor = self._stream_cursor_field.eval(self.config) # type: ignore # _stream_cursor_field is always cast to interpolated string 131 return cursor if cursor else [] 132 133 @property 134 def is_resumable(self) -> bool: 135 # Declarative sources always implement state getter/setter, but whether it supports checkpointing is based on 136 # if the retriever has a cursor defined. 137 return self.retriever.cursor is not None if hasattr(self.retriever, "cursor") else False 138 139 def read_records( 140 self, 141 sync_mode: SyncMode, 142 cursor_field: Optional[List[str]] = None, 143 stream_slice: Optional[Mapping[str, Any]] = None, 144 stream_state: Optional[Mapping[str, Any]] = None, 145 ) -> Iterable[Mapping[str, Any]]: 146 """ 147 :param: stream_state We knowingly avoid using stream_state as we want cursors to manage their own state. 148 """ 149 if stream_slice is None or ( 150 not isinstance(stream_slice, StreamSlice) and stream_slice == {} 151 ): 152 # As the parameter is Optional, many would just call `read_records(sync_mode)` during testing without specifying the field 153 # As part of the declarative model without custom components, this should never happen as the CDK would wire up a 154 # SinglePartitionRouter that would create this StreamSlice properly 155 # As part of the declarative model with custom components, a user that would return a `None` slice would now have the default 156 # empty slice which seems to make sense. 157 stream_slice = StreamSlice(partition={}, cursor_slice={}) 158 if not isinstance(stream_slice, StreamSlice): 159 raise ValueError( 160 f"DeclarativeStream does not support stream_slices that are not StreamSlice. Got {stream_slice}" 161 ) 162 yield from self.retriever.read_records(self.get_json_schema(), stream_slice) # type: ignore # records are of the correct type 163 164 def get_json_schema(self) -> Mapping[str, Any]: # type: ignore 165 """ 166 :return: A dict of the JSON schema representing this stream. 167 168 The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. 169 Override as needed. 
170 """ 171 return self._schema_loader.get_json_schema() 172 173 def stream_slices( 174 self, 175 *, 176 sync_mode: SyncMode, 177 cursor_field: Optional[List[str]] = None, 178 stream_state: Optional[Mapping[str, Any]] = None, 179 ) -> Iterable[Optional[StreamSlice]]: 180 """ 181 Override to define the slices for this stream. See the stream slicing section of the docs for more information. 182 183 :param sync_mode: 184 :param cursor_field: 185 :param stream_state: we knowingly avoid using stream_state as we want cursors to manage their own state 186 :return: 187 """ 188 return self.retriever.stream_slices() 189 190 @property 191 def state_checkpoint_interval(self) -> Optional[int]: 192 """ 193 We explicitly disable checkpointing here. There are a couple reasons for that and not all are documented here but: 194 * In the case where records are not ordered, the granularity of what is ordered is the slice. Therefore, we will only update the 195 cursor value once at the end of every slice. 196 * Updating the state once every record would generate issues for data feed stop conditions or semi-incremental syncs where the 197 important state is the one at the beginning of the slice 198 """ 199 return None 200 201 def get_cursor(self) -> Optional[Cursor]: 202 if self.retriever and isinstance(self.retriever, SimpleRetriever): 203 return self.retriever.cursor 204 return None 205 206 def _get_checkpoint_reader( 207 self, 208 logger: logging.Logger, 209 cursor_field: Optional[List[str]], 210 sync_mode: SyncMode, 211 stream_state: MutableMapping[str, Any], 212 ) -> CheckpointReader: 213 """ 214 This method is overridden to prevent issues with stream slice classification for incremental streams that have parent streams. 215 216 The classification logic, when used with `itertools.tee`, creates a copy of the stream slices. When `stream_slices` is called 217 the second time, the parent records generated during the classification phase are lost. This occurs because `itertools.tee` 218 only buffers the results, meaning the logic in `simple_retriever` that observes and updates the cursor isn't executed again. 219 220 By overriding this method, we ensure that the stream slices are processed correctly and parent records are not lost, 221 allowing the cursor to function as expected. 222 """ 223 mappings_or_slices = self.stream_slices( 224 cursor_field=cursor_field, 225 sync_mode=sync_mode, # todo: change this interface to no longer rely on sync_mode for behavior 226 stream_state=stream_state, 227 ) 228 229 cursor = self.get_cursor() 230 checkpoint_mode = self._checkpoint_mode 231 232 if isinstance( 233 cursor, (GlobalSubstreamCursor, PerPartitionCursor, PerPartitionWithGlobalCursor) 234 ): 235 self.has_multiple_slices = True 236 return CursorBasedCheckpointReader( 237 stream_slices=mappings_or_slices, 238 cursor=cursor, 239 read_state_from_cursor=checkpoint_mode == CheckpointMode.RESUMABLE_FULL_REFRESH, 240 ) 241 242 return super()._get_checkpoint_reader(logger, cursor_field, sync_mode, stream_state)
DeclarativeStream is a Stream that delegates most of its logic to its schema_loader and retriever
Attributes:
- name (str): stream name
- primary_key (Optional[Union[str, List[str], List[List[str]]]]): the primary key of the stream
- schema_loader (SchemaLoader): The schema loader
- retriever (Retriever): The retriever
- config (Config): The user-provided configuration as specified by the source's spec
- stream_cursor_field (Optional[Union[InterpolatedString, str]]): The cursor field of the stream
Transformations are applied in the order in which they are defined.
93 @property # type: ignore 94 def name(self) -> str: 95 """ 96 :return: Stream name. By default this is the implementing class name, but it can be overridden as needed. 97 """ 98 return self._name
Returns
Stream name. By default this is the implementing class name, but it can be overridden as needed.
70 @property # type: ignore 71 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 72 return self._primary_key
Returns
string if single primary key, list of strings if composite primary key, list of list of strings if composite primary key consisting of nested fields. If the stream has no primary keys, return None.
79 @property 80 def exit_on_rate_limit(self) -> bool: 81 if isinstance(self.retriever, AsyncRetriever): 82 return self.retriever.exit_on_rate_limit 83 84 return self.retriever.requester.exit_on_rate_limit # type: ignore # abstract Retriever class has not requester attribute
Exit-on-rate-limit getter; returns a bool. False means the stream will retry endlessly when rate limited.
105 @property 106 def state(self) -> MutableMapping[str, Any]: 107 return self.retriever.state # type: ignore
State setter; accepts state serialized by the state getter. Any configured state migrations are applied before the state is handed to the retriever.
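As a hedged sketch of that migration hook: the class below assumes the StateMigration interface (should_migrate/migrate) at airbyte_cdk.sources.declarative.migrations.state_migration, and the legacy last_synced / new updated_at key names are purely illustrative.

```python
# Hypothetical sketch: a state migration applied when state is assigned to the stream.
# The import path and key names are assumptions for illustration.
from typing import Any, Mapping

from airbyte_cdk.sources.declarative.migrations.state_migration import StateMigration


class LegacyToUpdatedAtState(StateMigration):
    def should_migrate(self, stream_state: Mapping[str, Any]) -> bool:
        # Only migrate state blobs that still use the legacy key.
        return "last_synced" in stream_state

    def migrate(self, stream_state: Mapping[str, Any]) -> Mapping[str, Any]:
        # Rewrite the legacy key into the cursor field the stream uses today.
        return {"updated_at": stream_state["last_synced"]}
```

Instances of such migrations are passed in DeclarativeStream's state_migrations list and run in order before the retriever sees the state.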
119 def get_updated_state( 120 self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any] 121 ) -> MutableMapping[str, Any]: 122 return self.state
DEPRECATED. Please use the explicit state property instead; see the IncrementalMixin docs.
Override to extract state from the latest record. Needed to implement incremental sync.
Inspects the latest record extracted from the data source and the current state object and return an updated state object.
For example: if the state object is based on created_at timestamp, and the current state is {'created_at': 10}, and the latest_record is {'name': 'octavia', 'created_at': 20 } then this method would return {'created_at': 20} to indicate state should be updated to this object.
Parameters
- current_stream_state: The stream's current state object
- latest_record: The latest record extracted from the stream
Returns
An updated state object
124 @property 125 def cursor_field(self) -> Union[str, List[str]]: 126 """ 127 Override to return the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field. 128 :return: The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor. 129 """ 130 cursor = self._stream_cursor_field.eval(self.config) # type: ignore # _stream_cursor_field is always cast to interpolated string 131 return cursor if cursor else []
Override to return the default cursor field used by this stream, e.g. an API entity might always use created_at as the cursor field.
Returns
The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor.
133 @property 134 def is_resumable(self) -> bool: 135 # Declarative sources always implement state getter/setter, but whether it supports checkpointing is based on 136 # if the retriever has a cursor defined. 137 return self.retriever.cursor is not None if hasattr(self.retriever, "cursor") else False
Returns
True if this stream allows the checkpointing of sync progress and can resume from it on subsequent attempts. This differs from supports_incremental because certain kinds of streams like those supporting resumable full refresh can checkpoint progress in between attempts for improved fault tolerance. However, they will start from the beginning on the next sync job.
139 def read_records( 140 self, 141 sync_mode: SyncMode, 142 cursor_field: Optional[List[str]] = None, 143 stream_slice: Optional[Mapping[str, Any]] = None, 144 stream_state: Optional[Mapping[str, Any]] = None, 145 ) -> Iterable[Mapping[str, Any]]: 146 """ 147 :param: stream_state We knowingly avoid using stream_state as we want cursors to manage their own state. 148 """ 149 if stream_slice is None or ( 150 not isinstance(stream_slice, StreamSlice) and stream_slice == {} 151 ): 152 # As the parameter is Optional, many would just call `read_records(sync_mode)` during testing without specifying the field 153 # As part of the declarative model without custom components, this should never happen as the CDK would wire up a 154 # SinglePartitionRouter that would create this StreamSlice properly 155 # As part of the declarative model with custom components, a user that would return a `None` slice would now have the default 156 # empty slice which seems to make sense. 157 stream_slice = StreamSlice(partition={}, cursor_slice={}) 158 if not isinstance(stream_slice, StreamSlice): 159 raise ValueError( 160 f"DeclarativeStream does not support stream_slices that are not StreamSlice. Got {stream_slice}" 161 ) 162 yield from self.retriever.read_records(self.get_json_schema(), stream_slice) # type: ignore # records are of the correct type
Parameters
- stream_state: We knowingly avoid using stream_state as we want cursors to manage their own state.
164 def get_json_schema(self) -> Mapping[str, Any]: # type: ignore 165 """ 166 :return: A dict of the JSON schema representing this stream. 167 168 The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. 169 Override as needed. 170 """ 171 return self._schema_loader.get_json_schema()
Returns
A dict of the JSON schema representing this stream.
The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. Override as needed.
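Where the file-based default is not a good fit, a stream can override this method and return the schema inline. A minimal sketch (the schema content is illustrative):

```python
from typing import Any, Mapping


def get_json_schema(self) -> Mapping[str, Any]:
    # Return the schema inline instead of loading a JSON file named after the stream.
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "created_at": {"type": ["null", "string"], "format": "date-time"},
        },
    }
```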
173 def stream_slices( 174 self, 175 *, 176 sync_mode: SyncMode, 177 cursor_field: Optional[List[str]] = None, 178 stream_state: Optional[Mapping[str, Any]] = None, 179 ) -> Iterable[Optional[StreamSlice]]: 180 """ 181 Override to define the slices for this stream. See the stream slicing section of the docs for more information. 182 183 :param sync_mode: 184 :param cursor_field: 185 :param stream_state: we knowingly avoid using stream_state as we want cursors to manage their own state 186 :return: 187 """ 188 return self.retriever.stream_slices()
Override to define the slices for this stream. See the stream slicing section of the docs for more information.
Parameters
- sync_mode:
- cursor_field:
- stream_state: We knowingly avoid using stream_state as we want cursors to manage their own state.
Returns
An iterable of stream slices, as produced by the retriever.
190 @property 191 def state_checkpoint_interval(self) -> Optional[int]: 192 """ 193 We explicitly disable checkpointing here. There are a couple reasons for that and not all are documented here but: 194 * In the case where records are not ordered, the granularity of what is ordered is the slice. Therefore, we will only update the 195 cursor value once at the end of every slice. 196 * Updating the state once every record would generate issues for data feed stop conditions or semi-incremental syncs where the 197 important state is the one at the beginning of the slice 198 """ 199 return None
We explicitly disable checkpointing here. There are a couple of reasons for this, not all of which are documented here:
- In the case where records are not ordered, the granularity of what is ordered is the slice. Therefore, we only update the cursor value once at the end of every slice.
- Updating the state once per record would cause issues for data feed stop conditions or semi-incremental syncs, where the important state is the one at the beginning of the slice.
201 def get_cursor(self) -> Optional[Cursor]: 202 if self.retriever and isinstance(self.retriever, SimpleRetriever): 203 return self.retriever.cursor 204 return None
A Cursor is an interface that a stream can implement to manage how its internal state is read and updated while reading records. Historically, Python connectors had no concept of a cursor to manage state. Python streams need to define a cursor implementation and override this method to manage state through a Cursor.
15@dataclass 16class Decoder: 17 """ 18 Decoder strategy to transform a requests.Response into a Mapping[str, Any] 19 """ 20 21 @abstractmethod 22 def is_stream_response(self) -> bool: 23 """ 24 Set to True if you'd like to use stream=True option in http requester 25 """ 26 27 @abstractmethod 28 def decode(self, response: requests.Response) -> DECODER_OUTPUT_TYPE: 29 """ 30 Decodes a requests.Response into a Mapping[str, Any] or an array 31 :param response: the response to decode 32 :return: Generator of Mapping describing the response 33 """
Decoder strategy to transform a requests.Response into a Mapping[str, Any]
21 @abstractmethod 22 def is_stream_response(self) -> bool: 23 """ 24 Set to True if you'd like to use stream=True option in http requester 25 """
Return True if you'd like to use the stream=True option in the HTTP requester.
27 @abstractmethod 28 def decode(self, response: requests.Response) -> DECODER_OUTPUT_TYPE: 29 """ 30 Decodes a requests.Response into a Mapping[str, Any] or an array 31 :param response: the response to decode 32 :return: Generator of Mapping describing the response 33 """
Decodes a requests.Response into a Mapping[str, Any] or an array
Parameters
- response: the response to decode
Returns
Generator of Mapping describing the response
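As a sketch of what a concrete implementation can look like, the hypothetical NdjsonDecoder below decodes a newline-delimited JSON response; the Decoder base class and requests come from the source shown above, everything else is illustrative:

```python
import json
from dataclasses import dataclass
from typing import Any, Generator, MutableMapping

import requests

from airbyte_cdk.sources.declarative.decoders import Decoder


@dataclass
class NdjsonDecoder(Decoder):
    def is_stream_response(self) -> bool:
        # Ask the HTTP requester to use stream=True so the body can be iterated lazily.
        return True

    def decode(
        self, response: requests.Response
    ) -> Generator[MutableMapping[str, Any], None, None]:
        # Yield one mapping per non-empty NDJSON line.
        for line in response.iter_lines():
            if line:
                yield json.loads(line)
```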
33@dataclass 34class DefaultPaginator(Paginator): 35 """ 36 Default paginator to request pages of results with a fixed size until the pagination strategy no longer returns a next_page_token 37 38 Examples: 39 1. 40 * fetches up to 10 records at a time by setting the "limit" request param to 10 41 * updates the request path with "{{ response._metadata.next }}" 42 ``` 43 paginator: 44 type: "DefaultPaginator" 45 page_size_option: 46 type: RequestOption 47 inject_into: request_parameter 48 field_name: limit 49 page_token_option: 50 type: RequestPath 51 path: "location" 52 pagination_strategy: 53 type: "CursorPagination" 54 cursor_value: "{{ response._metadata.next }}" 55 page_size: 10 56 ``` 57 58 2. 59 * fetches up to 5 records at a time by setting the "page_size" header to 5 60 * increments a record counter and set the request parameter "offset" to the value of the counter 61 ``` 62 paginator: 63 type: "DefaultPaginator" 64 page_size_option: 65 type: RequestOption 66 inject_into: header 67 field_name: page_size 68 pagination_strategy: 69 type: "OffsetIncrement" 70 page_size: 5 71 page_token_option: 72 option_type: "request_parameter" 73 field_name: "offset" 74 ``` 75 76 3. 77 * fetches up to 5 records at a time by setting the "page_size" request param to 5 78 * increments a page counter and set the request parameter "page" to the value of the counter 79 ``` 80 paginator: 81 type: "DefaultPaginator" 82 page_size_option: 83 type: RequestOption 84 inject_into: request_parameter 85 field_name: page_size 86 pagination_strategy: 87 type: "PageIncrement" 88 page_size: 5 89 page_token_option: 90 type: RequestOption 91 option_type: "request_parameter" 92 field_name: "page" 93 ``` 94 Attributes: 95 page_size_option (Optional[RequestOption]): the request option to set the page size. Cannot be injected in the path. 96 page_token_option (Optional[RequestPath, RequestOption]): the request option to set the page token 97 pagination_strategy (PaginationStrategy): Strategy defining how to get the next page token 98 config (Config): connection config 99 url_base (Union[InterpolatedString, str]): endpoint's base url 100 decoder (Decoder): decoder to decode the response 101 """ 102 103 pagination_strategy: PaginationStrategy 104 config: Config 105 url_base: Union[InterpolatedString, str] 106 parameters: InitVar[Mapping[str, Any]] 107 decoder: Decoder = field( 108 default_factory=lambda: PaginationDecoderDecorator(decoder=JsonDecoder(parameters={})) 109 ) 110 page_size_option: Optional[RequestOption] = None 111 page_token_option: Optional[Union[RequestPath, RequestOption]] = None 112 113 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 114 if self.page_size_option and not self.pagination_strategy.get_page_size(): 115 raise ValueError( 116 "page_size_option cannot be set if the pagination strategy does not have a page_size" 117 ) 118 if isinstance(self.url_base, str): 119 self.url_base = InterpolatedString(string=self.url_base, parameters=parameters) 120 121 if self.page_token_option and not isinstance(self.page_token_option, RequestPath): 122 _validate_component_request_option_paths( 123 self.config, 124 self.page_size_option, 125 self.page_token_option, 126 ) 127 128 def get_initial_token(self) -> Optional[Any]: 129 """ 130 Return the page token that should be used for the first request of a stream 131 132 WARNING: get_initial_token() should not be used by streams that use RFR that perform checkpointing 133 of state using page numbers. 
Because paginators are stateless 134 """ 135 return self.pagination_strategy.initial_token 136 137 def next_page_token( 138 self, 139 response: requests.Response, 140 last_page_size: int, 141 last_record: Optional[Record], 142 last_page_token_value: Optional[Any] = None, 143 ) -> Optional[Mapping[str, Any]]: 144 next_page_token = self.pagination_strategy.next_page_token( 145 response=response, 146 last_page_size=last_page_size, 147 last_record=last_record, 148 last_page_token_value=last_page_token_value, 149 ) 150 if next_page_token: 151 return {"next_page_token": next_page_token} 152 else: 153 return None 154 155 def path( 156 self, 157 next_page_token: Optional[Mapping[str, Any]], 158 stream_state: Optional[Mapping[str, Any]] = None, 159 stream_slice: Optional[StreamSlice] = None, 160 ) -> Optional[str]: 161 token = next_page_token.get("next_page_token") if next_page_token else None 162 if token and self.page_token_option and isinstance(self.page_token_option, RequestPath): 163 # make additional interpolation context 164 interpolation_context = get_interpolation_context( 165 stream_state=stream_state, 166 stream_slice=stream_slice, 167 next_page_token=next_page_token, 168 ) 169 # Replace url base to only return the path 170 return str(token).replace(self.url_base.eval(self.config, **interpolation_context), "") # type: ignore # url_base is casted to a InterpolatedString in __post_init__ 171 else: 172 return None 173 174 def get_request_params( 175 self, 176 *, 177 stream_state: Optional[StreamState] = None, 178 stream_slice: Optional[StreamSlice] = None, 179 next_page_token: Optional[Mapping[str, Any]] = None, 180 ) -> MutableMapping[str, Any]: 181 return self._get_request_options(RequestOptionType.request_parameter, next_page_token) 182 183 def get_request_headers( 184 self, 185 *, 186 stream_state: Optional[StreamState] = None, 187 stream_slice: Optional[StreamSlice] = None, 188 next_page_token: Optional[Mapping[str, Any]] = None, 189 ) -> Mapping[str, str]: 190 return self._get_request_options(RequestOptionType.header, next_page_token) 191 192 def get_request_body_data( 193 self, 194 *, 195 stream_state: Optional[StreamState] = None, 196 stream_slice: Optional[StreamSlice] = None, 197 next_page_token: Optional[Mapping[str, Any]] = None, 198 ) -> Mapping[str, Any]: 199 return self._get_request_options(RequestOptionType.body_data, next_page_token) 200 201 def get_request_body_json( 202 self, 203 *, 204 stream_state: Optional[StreamState] = None, 205 stream_slice: Optional[StreamSlice] = None, 206 next_page_token: Optional[Mapping[str, Any]] = None, 207 ) -> Mapping[str, Any]: 208 return self._get_request_options(RequestOptionType.body_json, next_page_token) 209 210 def _get_request_options( 211 self, option_type: RequestOptionType, next_page_token: Optional[Mapping[str, Any]] 212 ) -> MutableMapping[str, Any]: 213 options: MutableMapping[str, Any] = {} 214 215 token = next_page_token.get("next_page_token") if next_page_token else None 216 if ( 217 self.page_token_option 218 and token is not None 219 and isinstance(self.page_token_option, RequestOption) 220 and self.page_token_option.inject_into == option_type 221 ): 222 self.page_token_option.inject_into_request(options, token, self.config) 223 224 if ( 225 self.page_size_option 226 and self.pagination_strategy.get_page_size() 227 and self.page_size_option.inject_into == option_type 228 ): 229 page_size = self.pagination_strategy.get_page_size() 230 self.page_size_option.inject_into_request(options, page_size, self.config) 231 232 return 
options
Default paginator to request pages of results with a fixed size until the pagination strategy no longer returns a next_page_token
Examples:
1.
- fetches up to 10 records at a time by setting the "limit" request param to 10
- updates the request path with "{{ response._metadata.next }}"
paginator: type: "DefaultPaginator" page_size_option: type: RequestOption inject_into: request_parameter field_name: limit page_token_option: type: RequestPath path: "location" pagination_strategy: type: "CursorPagination" cursor_value: "{{ response._metadata.next }}" page_size: 10
2.
- fetches up to 5 records at a time by setting the "page_size" header to 5
- increments a record counter and sets the request parameter "offset" to the value of the counter
paginator: type: "DefaultPaginator" page_size_option: type: RequestOption inject_into: header field_name: page_size pagination_strategy: type: "OffsetIncrement" page_size: 5 page_token_option: option_type: "request_parameter" field_name: "offset"
3.
- fetches up to 5 records at a time by setting the "page_size" request param to 5
- increments a page counter and sets the request parameter "page" to the value of the counter
paginator: type: "DefaultPaginator" page_size_option: type: RequestOption inject_into: request_parameter field_name: page_size pagination_strategy: type: "PageIncrement" page_size: 5 page_token_option: type: RequestOption option_type: "request_parameter" field_name: "page"
Attributes:
- page_size_option (Optional[RequestOption]): the request option to set the page size. Cannot be injected in the path.
- page_token_option (Optional[Union[RequestPath, RequestOption]]): the request option to set the page token
- pagination_strategy (PaginationStrategy): Strategy defining how to get the next page token
- config (Config): connection config
- url_base (Union[InterpolatedString, str]): endpoint's base url
- decoder (Decoder): decoder to decode the response
128 def get_initial_token(self) -> Optional[Any]: 129 """ 130 Return the page token that should be used for the first request of a stream 131 132 WARNING: get_initial_token() should not be used by streams that use RFR that perform checkpointing 133 of state using page numbers. Because paginators are stateless 134 """ 135 return self.pagination_strategy.initial_token
Return the page token that should be used for the first request of a stream
WARNING: get_initial_token() should not be used by streams that use resumable full refresh (RFR) and checkpoint state using page numbers, because paginators are stateless.
137 def next_page_token( 138 self, 139 response: requests.Response, 140 last_page_size: int, 141 last_record: Optional[Record], 142 last_page_token_value: Optional[Any] = None, 143 ) -> Optional[Mapping[str, Any]]: 144 next_page_token = self.pagination_strategy.next_page_token( 145 response=response, 146 last_page_size=last_page_size, 147 last_record=last_record, 148 last_page_token_value=last_page_token_value, 149 ) 150 if next_page_token: 151 return {"next_page_token": next_page_token} 152 else: 153 return None
Returns the next_page_token to use to fetch the next page of records.
Parameters
- response: the response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
A mapping {"next_page_token":
} for the next page from the input response object. Returning None means there are no more pages to read in this response.
155 def path( 156 self, 157 next_page_token: Optional[Mapping[str, Any]], 158 stream_state: Optional[Mapping[str, Any]] = None, 159 stream_slice: Optional[StreamSlice] = None, 160 ) -> Optional[str]: 161 token = next_page_token.get("next_page_token") if next_page_token else None 162 if token and self.page_token_option and isinstance(self.page_token_option, RequestPath): 163 # make additional interpolation context 164 interpolation_context = get_interpolation_context( 165 stream_state=stream_state, 166 stream_slice=stream_slice, 167 next_page_token=next_page_token, 168 ) 169 # Replace url base to only return the path 170 return str(token).replace(self.url_base.eval(self.config, **interpolation_context), "") # type: ignore # url_base is casted to a InterpolatedString in __post_init__ 171 else: 172 return None
Returns the URL path to hit to fetch the next page of records
e.g: if you wanted to hit https://myapi.com/v1/some_entity then this will return "some_entity"
Returns
The path to hit to fetch the next page of records. Returning None means the path is not defined by the next_page_token.
174 def get_request_params( 175 self, 176 *, 177 stream_state: Optional[StreamState] = None, 178 stream_slice: Optional[StreamSlice] = None, 179 next_page_token: Optional[Mapping[str, Any]] = None, 180 ) -> MutableMapping[str, Any]: 181 return self._get_request_options(RequestOptionType.request_parameter, next_page_token)
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g: you might want to define query parameters for paging if next_page_token is not None.
183 def get_request_headers( 184 self, 185 *, 186 stream_state: Optional[StreamState] = None, 187 stream_slice: Optional[StreamSlice] = None, 188 next_page_token: Optional[Mapping[str, Any]] = None, 189 ) -> Mapping[str, str]: 190 return self._get_request_options(RequestOptionType.header, next_page_token)
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
192 def get_request_body_data( 193 self, 194 *, 195 stream_state: Optional[StreamState] = None, 196 stream_slice: Optional[StreamSlice] = None, 197 next_page_token: Optional[Mapping[str, Any]] = None, 198 ) -> Mapping[str, Any]: 199 return self._get_request_options(RequestOptionType.body_data, next_page_token)
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, it will be sent as is. If it returns a dict, it will be converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
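The dict-to-form conversion mentioned above is plain urlencoding; a quick, CDK-independent illustration:

```python
from urllib.parse import urlencode

# {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
print(urlencode({"key1": "value1", "key2": "value2"}))
```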
201 def get_request_body_json( 202 self, 203 *, 204 stream_state: Optional[StreamState] = None, 205 stream_slice: Optional[StreamSlice] = None, 206 next_page_token: Optional[Mapping[str, Any]] = None, 207 ) -> Mapping[str, Any]: 208 return self._get_request_options(RequestOptionType.body_json, next_page_token)
Specifies how to populate the body of the request with a JSON payload.
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
15@dataclass 16class DefaultRequestOptionsProvider(RequestOptionsProvider): 17 """ 18 Request options provider that extracts fields from the stream_slice and injects them into the respective location in the 19 outbound request being made 20 """ 21 22 parameters: InitVar[Mapping[str, Any]] 23 24 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 25 pass 26 27 def get_request_params( 28 self, 29 *, 30 stream_state: Optional[StreamState] = None, 31 stream_slice: Optional[StreamSlice] = None, 32 next_page_token: Optional[Mapping[str, Any]] = None, 33 ) -> Mapping[str, Any]: 34 return {} 35 36 def get_request_headers( 37 self, 38 *, 39 stream_state: Optional[StreamState] = None, 40 stream_slice: Optional[StreamSlice] = None, 41 next_page_token: Optional[Mapping[str, Any]] = None, 42 ) -> Mapping[str, Any]: 43 return {} 44 45 def get_request_body_data( 46 self, 47 *, 48 stream_state: Optional[StreamState] = None, 49 stream_slice: Optional[StreamSlice] = None, 50 next_page_token: Optional[Mapping[str, Any]] = None, 51 ) -> Union[Mapping[str, Any], str]: 52 return {} 53 54 def get_request_body_json( 55 self, 56 *, 57 stream_state: Optional[StreamState] = None, 58 stream_slice: Optional[StreamSlice] = None, 59 next_page_token: Optional[Mapping[str, Any]] = None, 60 ) -> Mapping[str, Any]: 61 return {}
Request options provider that extracts fields from the stream_slice and injects them into the respective location in the outbound request being made
27 def get_request_params( 28 self, 29 *, 30 stream_state: Optional[StreamState] = None, 31 stream_slice: Optional[StreamSlice] = None, 32 next_page_token: Optional[Mapping[str, Any]] = None, 33 ) -> Mapping[str, Any]: 34 return {}
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g: you might want to define query parameters for paging if next_page_token is not None.
36 def get_request_headers( 37 self, 38 *, 39 stream_state: Optional[StreamState] = None, 40 stream_slice: Optional[StreamSlice] = None, 41 next_page_token: Optional[Mapping[str, Any]] = None, 42 ) -> Mapping[str, Any]: 43 return {}
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
45 def get_request_body_data( 46 self, 47 *, 48 stream_state: Optional[StreamState] = None, 49 stream_slice: Optional[StreamSlice] = None, 50 next_page_token: Optional[Mapping[str, Any]] = None, 51 ) -> Union[Mapping[str, Any], str]: 52 return {}
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, it will be sent as is. If it returns a dict, it will be converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
54 def get_request_body_json( 55 self, 56 *, 57 stream_state: Optional[StreamState] = None, 58 stream_slice: Optional[StreamSlice] = None, 59 next_page_token: Optional[Mapping[str, Any]] = None, 60 ) -> Mapping[str, Any]: 61 return {}
Specifies how to populate the body of the request with a JSON payload.
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
18@dataclass 19class DpathExtractor(RecordExtractor): 20 """ 21 Record extractor that searches a decoded response over a path defined as an array of fields. 22 23 If the field path points to an array, that array is returned. 24 If the field path points to an object, that object is returned wrapped as an array. 25 If the field path points to an empty object, an empty array is returned. 26 If the field path points to a non-existing path, an empty array is returned. 27 28 Examples of instantiating this transform: 29 ``` 30 extractor: 31 type: DpathExtractor 32 field_path: 33 - "root" 34 - "data" 35 ``` 36 37 ``` 38 extractor: 39 type: DpathExtractor 40 field_path: 41 - "root" 42 - "{{ parameters['field'] }}" 43 ``` 44 45 ``` 46 extractor: 47 type: DpathExtractor 48 field_path: [] 49 ``` 50 51 Attributes: 52 field_path (Union[InterpolatedString, str]): Path to the field that should be extracted 53 config (Config): The user-provided configuration as specified by the source's spec 54 decoder (Decoder): The decoder responsible to transfom the response in a Mapping 55 """ 56 57 field_path: List[Union[InterpolatedString, str]] 58 config: Config 59 parameters: InitVar[Mapping[str, Any]] 60 decoder: Decoder = field(default_factory=lambda: JsonDecoder(parameters={})) 61 62 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 63 self._field_path = [ 64 InterpolatedString.create(path, parameters=parameters) for path in self.field_path 65 ] 66 for path_index in range(len(self.field_path)): 67 if isinstance(self.field_path[path_index], str): 68 self._field_path[path_index] = InterpolatedString.create( 69 self.field_path[path_index], parameters=parameters 70 ) 71 72 def extract_records(self, response: requests.Response) -> Iterable[MutableMapping[Any, Any]]: 73 for body in self.decoder.decode(response): 74 if len(self._field_path) == 0: 75 extracted = body 76 else: 77 path = [path.eval(self.config) for path in self._field_path] 78 if "*" in path: 79 extracted = dpath.values(body, path) 80 else: 81 extracted = dpath.get(body, path, default=[]) # type: ignore # extracted will be a MutableMapping, given input data structure 82 if isinstance(extracted, list): 83 yield from extracted 84 elif extracted: 85 yield extracted 86 else: 87 yield from []
Record extractor that searches a decoded response over a path defined as an array of fields.
If the field path points to an array, that array is returned. If the field path points to an object, that object is returned wrapped as an array. If the field path points to an empty object, an empty array is returned. If the field path points to a non-existing path, an empty array is returned.
Examples of instantiating this transform:
extractor:
type: DpathExtractor
field_path:
- "root"
- "data"
extractor:
type: DpathExtractor
field_path:
- "root"
- "{{ parameters['field'] }}"
extractor:
type: DpathExtractor
field_path: []
Attributes:
- field_path (List[Union[InterpolatedString, str]]): Path to the field that should be extracted
- config (Config): The user-provided configuration as specified by the source's spec
- decoder (Decoder): The decoder responsible for transforming the response into a Mapping
72 def extract_records(self, response: requests.Response) -> Iterable[MutableMapping[Any, Any]]: 73 for body in self.decoder.decode(response): 74 if len(self._field_path) == 0: 75 extracted = body 76 else: 77 path = [path.eval(self.config) for path in self._field_path] 78 if "*" in path: 79 extracted = dpath.values(body, path) 80 else: 81 extracted = dpath.get(body, path, default=[]) # type: ignore # extracted will be a MutableMapping, given input data structure 82 if isinstance(extracted, list): 83 yield from extracted 84 elif extracted: 85 yield extracted 86 else: 87 yield from []
Selects records from the response
Parameters
- response: The response to extract the records from
Returns
List of Records extracted from the response
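To make the field path semantics concrete, here is a small, CDK-independent sketch that mimics how a path selects records from a decoded body (the real extractor uses the dpath library and interpolated path segments):

```python
from typing import Any, List, Mapping


def extract(body: Mapping[str, Any], field_path: List[str]) -> List[Any]:
    # Walk the path; a missing key behaves like an empty object.
    node: Any = body
    for key in field_path:
        node = node.get(key, {}) if isinstance(node, dict) else {}
    if isinstance(node, list):
        return node  # the path points to an array: return it as-is
    return [node] if node else []  # wrap objects; empty or missing => []


print(extract({"root": {"data": [{"id": 1}, {"id": 2}]}}, ["root", "data"]))
# [{'id': 1}, {'id': 2}]
```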
19class HttpMethod(Enum): 20 """ 21 Http Method to use when submitting an outgoing HTTP request 22 """ 23 24 DELETE = "DELETE" 25 GET = "GET" 26 PATCH = "PATCH" 27 POST = "POST"
HTTP method to use when submitting an outgoing HTTP request
38@dataclass 39class HttpRequester(Requester): 40 """ 41 Default implementation of a Requester 42 43 Attributes: 44 name (str): Name of the stream. Only used for request/response caching 45 url_base (Union[InterpolatedString, str]): Base url to send requests to 46 path (Union[InterpolatedString, str]): Path to send requests to 47 http_method (Union[str, HttpMethod]): HTTP method to use when sending requests 48 request_options_provider (Optional[InterpolatedRequestOptionsProvider]): request option provider defining the options to set on outgoing requests 49 authenticator (DeclarativeAuthenticator): Authenticator defining how to authenticate to the source 50 error_handler (Optional[ErrorHandler]): Error handler defining how to detect and handle errors 51 backoff_strategies (Optional[List[BackoffStrategy]]): List of backoff strategies to use when retrying requests 52 config (Config): The user-provided configuration as specified by the source's spec 53 use_cache (bool): Indicates that data should be cached for this stream 54 """ 55 56 name: str 57 url_base: Union[InterpolatedString, str] 58 config: Config 59 parameters: InitVar[Mapping[str, Any]] 60 61 path: Optional[Union[InterpolatedString, str]] = None 62 authenticator: Optional[DeclarativeAuthenticator] = None 63 http_method: Union[str, HttpMethod] = HttpMethod.GET 64 request_options_provider: Optional[InterpolatedRequestOptionsProvider] = None 65 error_handler: Optional[ErrorHandler] = None 66 api_budget: Optional[APIBudget] = None 67 disable_retries: bool = False 68 message_repository: MessageRepository = NoopMessageRepository() 69 use_cache: bool = False 70 _exit_on_rate_limit: bool = False 71 stream_response: bool = False 72 decoder: Decoder = field(default_factory=lambda: JsonDecoder(parameters={})) 73 74 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 75 self._url_base = InterpolatedString.create(self.url_base, parameters=parameters) 76 self._path = InterpolatedString.create( 77 self.path if self.path else EmptyString, parameters=parameters 78 ) 79 if self.request_options_provider is None: 80 self._request_options_provider = InterpolatedRequestOptionsProvider( 81 config=self.config, parameters=parameters 82 ) 83 elif isinstance(self.request_options_provider, dict): 84 self._request_options_provider = InterpolatedRequestOptionsProvider( 85 config=self.config, **self.request_options_provider 86 ) 87 else: 88 self._request_options_provider = self.request_options_provider 89 self._authenticator = self.authenticator or NoAuth(parameters=parameters) 90 self._http_method = ( 91 HttpMethod[self.http_method] if isinstance(self.http_method, str) else self.http_method 92 ) 93 self.error_handler = self.error_handler 94 self._parameters = parameters 95 96 if self.error_handler is not None and hasattr(self.error_handler, "backoff_strategies"): 97 backoff_strategies = self.error_handler.backoff_strategies # type: ignore 98 else: 99 backoff_strategies = None 100 101 self._http_client = HttpClient( 102 name=self.name, 103 logger=self.logger, 104 error_handler=self.error_handler, 105 api_budget=self.api_budget, 106 authenticator=self._authenticator, 107 use_cache=self.use_cache, 108 backoff_strategy=backoff_strategies, 109 disable_retries=self.disable_retries, 110 message_repository=self.message_repository, 111 ) 112 113 @property 114 def exit_on_rate_limit(self) -> bool: 115 return self._exit_on_rate_limit 116 117 @exit_on_rate_limit.setter 118 def exit_on_rate_limit(self, value: bool) -> None: 119 self._exit_on_rate_limit = value 120 
121 def get_authenticator(self) -> DeclarativeAuthenticator: 122 return self._authenticator 123 124 def get_url_base( 125 self, 126 *, 127 stream_state: Optional[StreamState] = None, 128 stream_slice: Optional[StreamSlice] = None, 129 next_page_token: Optional[Mapping[str, Any]] = None, 130 ) -> str: 131 interpolation_context = get_interpolation_context( 132 stream_state=stream_state, 133 stream_slice=stream_slice, 134 next_page_token=next_page_token, 135 ) 136 return str(self._url_base.eval(self.config, **interpolation_context)) 137 138 def get_path( 139 self, 140 *, 141 stream_state: Optional[StreamState] = None, 142 stream_slice: Optional[StreamSlice] = None, 143 next_page_token: Optional[Mapping[str, Any]] = None, 144 ) -> str: 145 interpolation_context = get_interpolation_context( 146 stream_state=stream_state, 147 stream_slice=stream_slice, 148 next_page_token=next_page_token, 149 ) 150 path = str(self._path.eval(self.config, **interpolation_context)) 151 return path.lstrip("/") 152 153 def get_method(self) -> HttpMethod: 154 return self._http_method 155 156 def get_request_params( 157 self, 158 *, 159 stream_state: Optional[StreamState] = None, 160 stream_slice: Optional[StreamSlice] = None, 161 next_page_token: Optional[Mapping[str, Any]] = None, 162 ) -> MutableMapping[str, Any]: 163 return self._request_options_provider.get_request_params( 164 stream_state=stream_state, 165 stream_slice=stream_slice, 166 next_page_token=next_page_token, 167 ) 168 169 def get_request_headers( 170 self, 171 *, 172 stream_state: Optional[StreamState] = None, 173 stream_slice: Optional[StreamSlice] = None, 174 next_page_token: Optional[Mapping[str, Any]] = None, 175 ) -> Mapping[str, Any]: 176 return self._request_options_provider.get_request_headers( 177 stream_state=stream_state, 178 stream_slice=stream_slice, 179 next_page_token=next_page_token, 180 ) 181 182 # fixing request options provider types has a lot of dependencies 183 def get_request_body_data( # type: ignore 184 self, 185 *, 186 stream_state: Optional[StreamState] = None, 187 stream_slice: Optional[StreamSlice] = None, 188 next_page_token: Optional[Mapping[str, Any]] = None, 189 ) -> Union[Mapping[str, Any], str]: 190 return ( 191 self._request_options_provider.get_request_body_data( 192 stream_state=stream_state, 193 stream_slice=stream_slice, 194 next_page_token=next_page_token, 195 ) 196 or {} 197 ) 198 199 # fixing request options provider types has a lot of dependencies 200 def get_request_body_json( # type: ignore 201 self, 202 *, 203 stream_state: Optional[StreamState] = None, 204 stream_slice: Optional[StreamSlice] = None, 205 next_page_token: Optional[Mapping[str, Any]] = None, 206 ) -> Optional[Mapping[str, Any]]: 207 return self._request_options_provider.get_request_body_json( 208 stream_state=stream_state, 209 stream_slice=stream_slice, 210 next_page_token=next_page_token, 211 ) 212 213 @property 214 def logger(self) -> logging.Logger: 215 return logging.getLogger(f"airbyte.HttpRequester.{self.name}") 216 217 def _get_request_options( 218 self, 219 stream_state: Optional[StreamState], 220 stream_slice: Optional[StreamSlice], 221 next_page_token: Optional[Mapping[str, Any]], 222 requester_method: Callable[..., Optional[Union[Mapping[str, Any], str]]], 223 auth_options_method: Callable[..., Optional[Union[Mapping[str, Any], str]]], 224 extra_options: Optional[Union[Mapping[str, Any], str]] = None, 225 ) -> Union[Mapping[str, Any], str]: 226 """ 227 Get the request_option from the requester, the authenticator and extra_options 
passed in. 228 Raise a ValueError if there's a key collision 229 Returned merged mapping otherwise 230 """ 231 232 is_body_json = requester_method.__name__ == "get_request_body_json" 233 234 return combine_mappings( 235 [ 236 requester_method( 237 stream_state=stream_state, 238 stream_slice=stream_slice, 239 next_page_token=next_page_token, 240 ), 241 auth_options_method(), 242 extra_options, 243 ], 244 allow_same_value_merge=is_body_json, 245 ) 246 247 def _request_headers( 248 self, 249 stream_state: Optional[StreamState] = None, 250 stream_slice: Optional[StreamSlice] = None, 251 next_page_token: Optional[Mapping[str, Any]] = None, 252 extra_headers: Optional[Mapping[str, Any]] = None, 253 ) -> Mapping[str, Any]: 254 """ 255 Specifies request headers. 256 Authentication headers will overwrite any overlapping headers returned from this method. 257 """ 258 headers = self._get_request_options( 259 stream_state, 260 stream_slice, 261 next_page_token, 262 self.get_request_headers, 263 self.get_authenticator().get_auth_header, 264 extra_headers, 265 ) 266 if isinstance(headers, str): 267 raise ValueError("Request headers cannot be a string") 268 return {str(k): str(v) for k, v in headers.items()} 269 270 def _request_params( 271 self, 272 stream_state: Optional[StreamState], 273 stream_slice: Optional[StreamSlice], 274 next_page_token: Optional[Mapping[str, Any]], 275 extra_params: Optional[Mapping[str, Any]] = None, 276 ) -> Mapping[str, Any]: 277 """ 278 Specifies the query parameters that should be set on an outgoing HTTP request given the inputs. 279 280 E.g: you might want to define query parameters for paging if next_page_token is not None. 281 """ 282 options = self._get_request_options( 283 stream_state, 284 stream_slice, 285 next_page_token, 286 self.get_request_params, 287 self.get_authenticator().get_request_params, 288 extra_params, 289 ) 290 if isinstance(options, str): 291 raise ValueError("Request params cannot be a string") 292 293 for k, v in options.items(): 294 if isinstance(v, (dict,)): 295 raise ValueError( 296 f"Invalid value for `{k}` parameter. The values of request params cannot be an object." 297 ) 298 299 return options 300 301 def _request_body_data( 302 self, 303 stream_state: Optional[StreamState], 304 stream_slice: Optional[StreamSlice], 305 next_page_token: Optional[Mapping[str, Any]], 306 extra_body_data: Optional[Union[Mapping[str, Any], str]] = None, 307 ) -> Optional[Union[Mapping[str, Any], str]]: 308 """ 309 Specifies how to populate the body of the request with a non-JSON payload. 310 311 If returns a ready text that it will be sent as is. 312 If returns a dict that it will be converted to a urlencoded form. 313 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 314 315 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 316 """ 317 # Warning: use self.state instead of the stream_state passed as argument! 318 return self._get_request_options( 319 stream_state, 320 stream_slice, 321 next_page_token, 322 self.get_request_body_data, 323 self.get_authenticator().get_request_body_data, 324 extra_body_data, 325 ) 326 327 def _request_body_json( 328 self, 329 stream_state: Optional[StreamState], 330 stream_slice: Optional[StreamSlice], 331 next_page_token: Optional[Mapping[str, Any]], 332 extra_body_json: Optional[Mapping[str, Any]] = None, 333 ) -> Optional[Mapping[str, Any]]: 334 """ 335 Specifies how to populate the body of the request with a JSON payload. 
336 337 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 338 """ 339 # Warning: use self.state instead of the stream_state passed as argument! 340 options = self._get_request_options( 341 stream_state, 342 stream_slice, 343 next_page_token, 344 self.get_request_body_json, 345 self.get_authenticator().get_request_body_json, 346 extra_body_json, 347 ) 348 if isinstance(options, str): 349 raise ValueError("Request body json cannot be a string") 350 return options 351 352 @classmethod 353 def _join_url(cls, url_base: str, path: str) -> str: 354 """ 355 Joins a base URL with a given path and returns the resulting URL with any trailing slash removed. 356 357 This method ensures that there are no duplicate slashes when concatenating the base URL and the path, 358 which is useful when the full URL is provided from an interpolation context. 359 360 Args: 361 url_base (str): The base URL to which the path will be appended. 362 path (str): The path to join with the base URL. 363 364 Returns: 365 str: The resulting joined URL. 366 367 Note: 368 Related issue: https://github.com/airbytehq/airbyte-internal-issues/issues/11869 369 - If the path is an empty string or None, the method returns the base URL with any trailing slash removed. 370 371 Example: 372 1) _join_url("https://example.com/api/", "endpoint") >> 'https://example.com/api/endpoint' 373 2) _join_url("https://example.com/api", "/endpoint") >> 'https://example.com/api/endpoint' 374 3) _join_url("https://example.com/api/", "") >> 'https://example.com/api/' 375 4) _join_url("https://example.com/api", None) >> 'https://example.com/api' 376 """ 377 378 # return a full-url if provided directly from interpolation context 379 if path == EmptyString or path is None: 380 return url_base 381 else: 382 # since we didn't provide a full-url, the url_base might not have a trailing slash 383 # so we join the url_base and path correctly 384 if not url_base.endswith("/"): 385 url_base += "/" 386 387 return urljoin(url_base, path) 388 389 def send_request( 390 self, 391 stream_state: Optional[StreamState] = None, 392 stream_slice: Optional[StreamSlice] = None, 393 next_page_token: Optional[Mapping[str, Any]] = None, 394 path: Optional[str] = None, 395 request_headers: Optional[Mapping[str, Any]] = None, 396 request_params: Optional[Mapping[str, Any]] = None, 397 request_body_data: Optional[Union[Mapping[str, Any], str]] = None, 398 request_body_json: Optional[Mapping[str, Any]] = None, 399 log_formatter: Optional[Callable[[requests.Response], Any]] = None, 400 ) -> Optional[requests.Response]: 401 request, response = self._http_client.send_request( 402 http_method=self.get_method().value, 403 url=self._join_url( 404 self.get_url_base( 405 stream_state=stream_state, 406 stream_slice=stream_slice, 407 next_page_token=next_page_token, 408 ), 409 path 410 or self.get_path( 411 stream_state=stream_state, 412 stream_slice=stream_slice, 413 next_page_token=next_page_token, 414 ), 415 ), 416 request_kwargs={"stream": self.stream_response}, 417 headers=self._request_headers( 418 stream_state, stream_slice, next_page_token, request_headers 419 ), 420 params=self._request_params( 421 stream_state, stream_slice, next_page_token, request_params 422 ), 423 json=self._request_body_json( 424 stream_state, stream_slice, next_page_token, request_body_json 425 ), 426 data=self._request_body_data( 427 stream_state, stream_slice, next_page_token, request_body_data 428 ), 429 dedupe_query_params=True, 430 
log_formatter=log_formatter, 431 exit_on_rate_limit=self._exit_on_rate_limit, 432 ) 433 434 return response
Default implementation of a Requester
Attributes:
- name (str): Name of the stream. Only used for request/response caching
- url_base (Union[InterpolatedString, str]): Base url to send requests to
- path (Union[InterpolatedString, str]): Path to send requests to
- http_method (Union[str, HttpMethod]): HTTP method to use when sending requests
- request_options_provider (Optional[InterpolatedRequestOptionsProvider]): request option provider defining the options to set on outgoing requests
- authenticator (DeclarativeAuthenticator): Authenticator defining how to authenticate to the source
- error_handler (Optional[ErrorHandler]): Error handler defining how to detect and handle errors
- backoff_strategies (Optional[List[BackoffStrategy]]): List of backoff strategies to use when retrying requests
- config (Config): The user-provided configuration as specified by the source's spec
- use_cache (bool): Indicates that data should be cached for this stream
124 def get_url_base( 125 self, 126 *, 127 stream_state: Optional[StreamState] = None, 128 stream_slice: Optional[StreamSlice] = None, 129 next_page_token: Optional[Mapping[str, Any]] = None, 130 ) -> str: 131 interpolation_context = get_interpolation_context( 132 stream_state=stream_state, 133 stream_slice=stream_slice, 134 next_page_token=next_page_token, 135 ) 136 return str(self._url_base.eval(self.config, **interpolation_context))
Returns
URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/"
138 def get_path( 139 self, 140 *, 141 stream_state: Optional[StreamState] = None, 142 stream_slice: Optional[StreamSlice] = None, 143 next_page_token: Optional[Mapping[str, Any]] = None, 144 ) -> str: 145 interpolation_context = get_interpolation_context( 146 stream_state=stream_state, 147 stream_slice=stream_slice, 148 next_page_token=next_page_token, 149 ) 150 path = str(self._path.eval(self.config, **interpolation_context)) 151 return path.lstrip("/")
Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity"
156 def get_request_params( 157 self, 158 *, 159 stream_state: Optional[StreamState] = None, 160 stream_slice: Optional[StreamSlice] = None, 161 next_page_token: Optional[Mapping[str, Any]] = None, 162 ) -> MutableMapping[str, Any]: 163 return self._request_options_provider.get_request_params( 164 stream_state=stream_state, 165 stream_slice=stream_slice, 166 next_page_token=next_page_token, 167 )
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g: you might want to define query parameters for paging if next_page_token is not None.
169 def get_request_headers( 170 self, 171 *, 172 stream_state: Optional[StreamState] = None, 173 stream_slice: Optional[StreamSlice] = None, 174 next_page_token: Optional[Mapping[str, Any]] = None, 175 ) -> Mapping[str, Any]: 176 return self._request_options_provider.get_request_headers( 177 stream_state=stream_state, 178 stream_slice=stream_slice, 179 next_page_token=next_page_token, 180 )
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
183 def get_request_body_data( # type: ignore 184 self, 185 *, 186 stream_state: Optional[StreamState] = None, 187 stream_slice: Optional[StreamSlice] = None, 188 next_page_token: Optional[Mapping[str, Any]] = None, 189 ) -> Union[Mapping[str, Any], str]: 190 return ( 191 self._request_options_provider.get_request_body_data( 192 stream_state=stream_state, 193 stream_slice=stream_slice, 194 next_page_token=next_page_token, 195 ) 196 or {} 197 )
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, it will be sent as is. If it returns a dict, it will be converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
200 def get_request_body_json( # type: ignore 201 self, 202 *, 203 stream_state: Optional[StreamState] = None, 204 stream_slice: Optional[StreamSlice] = None, 205 next_page_token: Optional[Mapping[str, Any]] = None, 206 ) -> Optional[Mapping[str, Any]]: 207 return self._request_options_provider.get_request_body_json( 208 stream_state=stream_state, 209 stream_slice=stream_slice, 210 next_page_token=next_page_token, 211 )
Specifies how to populate the body of the request with a JSON payload.
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
389 def send_request( 390 self, 391 stream_state: Optional[StreamState] = None, 392 stream_slice: Optional[StreamSlice] = None, 393 next_page_token: Optional[Mapping[str, Any]] = None, 394 path: Optional[str] = None, 395 request_headers: Optional[Mapping[str, Any]] = None, 396 request_params: Optional[Mapping[str, Any]] = None, 397 request_body_data: Optional[Union[Mapping[str, Any], str]] = None, 398 request_body_json: Optional[Mapping[str, Any]] = None, 399 log_formatter: Optional[Callable[[requests.Response], Any]] = None, 400 ) -> Optional[requests.Response]: 401 request, response = self._http_client.send_request( 402 http_method=self.get_method().value, 403 url=self._join_url( 404 self.get_url_base( 405 stream_state=stream_state, 406 stream_slice=stream_slice, 407 next_page_token=next_page_token, 408 ), 409 path 410 or self.get_path( 411 stream_state=stream_state, 412 stream_slice=stream_slice, 413 next_page_token=next_page_token, 414 ), 415 ), 416 request_kwargs={"stream": self.stream_response}, 417 headers=self._request_headers( 418 stream_state, stream_slice, next_page_token, request_headers 419 ), 420 params=self._request_params( 421 stream_state, stream_slice, next_page_token, request_params 422 ), 423 json=self._request_body_json( 424 stream_state, stream_slice, next_page_token, request_body_json 425 ), 426 data=self._request_body_data( 427 stream_state, stream_slice, next_page_token, request_body_data 428 ), 429 dedupe_query_params=True, 430 log_formatter=log_formatter, 431 exit_on_rate_limit=self._exit_on_rate_limit, 432 ) 433 434 return response
Sends a request and returns the response. Might return no response if the error handler chooses to ignore the response or throw an exception in case of an error. If path is set, the path configured on the requester itself is ignored. If header, params and body are set, they are merged with the ones configured on the requester itself.
If a log formatter is provided, it's used to log the performed request and response. If it's not provided, no logging is performed.
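A minimal sketch of wiring up a requester and sending a request. It assumes the import path airbyte_cdk.sources.declarative.requesters.http_requester and a placeholder API; the authenticator, error handler and decoder fall back to their defaults:

```python
from airbyte_cdk.sources.declarative.requesters.http_requester import HttpRequester

config = {"api_base": "https://api.example.com/v1/"}  # placeholder config

requester = HttpRequester(
    name="users",                         # only used for request/response caching
    url_base="{{ config['api_base'] }}",  # interpolated against the config
    path="users",
    http_method="GET",
    config=config,
    parameters={},
)

# Headers, params and body passed here are merged with the ones configured on the requester.
response = requester.send_request(request_params={"page_size": 50})
if response is not None:
    print(response.status_code)
```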
29@dataclass 30class InterpolatedBoolean: 31 f""" 32 Wrapper around a string to be evaluated to a boolean value. 33 The string will be evaluated as False if it interpolates to a value in {FALSE_VALUES} 34 35 Attributes: 36 condition (str): The string representing the condition to evaluate to a boolean 37 """ 38 condition: str 39 parameters: InitVar[Mapping[str, Any]] 40 41 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 42 self._default = "False" 43 self._interpolation = JinjaInterpolation() 44 self._parameters = parameters 45 46 def eval(self, config: Config, **additional_parameters: Any) -> bool: 47 """ 48 Interpolates the predicate condition string using the config and other optional arguments passed as parameter. 49 50 :param config: The user-provided configuration as specified by the source's spec 51 :param additional_parameters: Optional parameters used for interpolation 52 :return: The interpolated string 53 """ 54 if isinstance(self.condition, bool): 55 return self.condition 56 else: 57 evaluated = self._interpolation.eval( 58 self.condition, 59 config, 60 self._default, 61 parameters=self._parameters, 62 **additional_parameters, 63 ) 64 if evaluated in FALSE_VALUES: 65 return False 66 # The presence of a value is generally regarded as truthy, so we treat it as such 67 return True
46 def eval(self, config: Config, **additional_parameters: Any) -> bool: 47 """ 48 Interpolates the predicate condition string using the config and other optional arguments passed as parameter. 49 50 :param config: The user-provided configuration as specified by the source's spec 51 :param additional_parameters: Optional parameters used for interpolation 52 :return: The interpolated string 53 """ 54 if isinstance(self.condition, bool): 55 return self.condition 56 else: 57 evaluated = self._interpolation.eval( 58 self.condition, 59 config, 60 self._default, 61 parameters=self._parameters, 62 **additional_parameters, 63 ) 64 if evaluated in FALSE_VALUES: 65 return False 66 # The presence of a value is generally regarded as truthy, so we treat it as such 67 return True
Interpolates the predicate condition string using the config and other optional arguments passed as parameter.
Parameters
- config: The user-provided configuration as specified by the source's spec
- additional_parameters: Optional parameters used for interpolation
Returns
The evaluated boolean value.
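A small usage sketch, assuming the class lives at airbyte_cdk.sources.declarative.interpolation.interpolated_boolean (the config keys are illustrative):

```python
from airbyte_cdk.sources.declarative.interpolation.interpolated_boolean import (
    InterpolatedBoolean,
)

condition = InterpolatedBoolean(
    condition="{{ config['start_date'] is defined }}",
    parameters={},
)

print(condition.eval({"start_date": "2021-01-01"}))  # True
print(condition.eval({}))  # False: the template interpolates to a value in FALSE_VALUES
```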
14@dataclass 15class InterpolatedRequestInputProvider: 16 """ 17 Helper class that generically performs string interpolation on the provided dictionary or string input 18 """ 19 20 parameters: InitVar[Mapping[str, Any]] 21 request_inputs: Optional[Union[str, Mapping[str, str]]] = field(default=None) 22 config: Config = field(default_factory=dict) 23 _interpolator: Optional[Union[InterpolatedString, InterpolatedMapping]] = field( 24 init=False, repr=False, default=None 25 ) 26 _request_inputs: Optional[Union[str, Mapping[str, str]]] = field( 27 init=False, repr=False, default=None 28 ) 29 30 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 31 self._request_inputs = self.request_inputs or {} 32 if isinstance(self._request_inputs, str): 33 self._interpolator = InterpolatedString( 34 self._request_inputs, default="", parameters=parameters 35 ) 36 else: 37 self._interpolator = InterpolatedMapping(self._request_inputs, parameters=parameters) 38 39 def eval_request_inputs( 40 self, 41 stream_slice: Optional[StreamSlice] = None, 42 next_page_token: Optional[Mapping[str, Any]] = None, 43 valid_key_types: Optional[Tuple[Type[Any]]] = None, 44 valid_value_types: Optional[Tuple[Type[Any], ...]] = None, 45 ) -> Mapping[str, Any]: 46 """ 47 Returns the request inputs to set on an outgoing HTTP request 48 49 :param stream_slice: The stream slice 50 :param next_page_token: The pagination token 51 :param valid_key_types: A tuple of types that the interpolator should allow 52 :param valid_value_types: A tuple of types that the interpolator should allow 53 :return: The request inputs to set on an outgoing HTTP request 54 """ 55 kwargs = { 56 "stream_slice": stream_slice, 57 "next_page_token": next_page_token, 58 } 59 interpolated_value = self._interpolator.eval( # type: ignore # self._interpolator is always initialized with a value and will not be None 60 self.config, 61 valid_key_types=valid_key_types, 62 valid_value_types=valid_value_types, 63 **kwargs, 64 ) 65 66 if isinstance(interpolated_value, dict): 67 non_null_tokens = {k: v for k, v in interpolated_value.items() if v is not None} 68 return non_null_tokens 69 return interpolated_value # type: ignore[no-any-return]
Helper class that generically performs string interpolation on the provided dictionary or string input
39 def eval_request_inputs( 40 self, 41 stream_slice: Optional[StreamSlice] = None, 42 next_page_token: Optional[Mapping[str, Any]] = None, 43 valid_key_types: Optional[Tuple[Type[Any]]] = None, 44 valid_value_types: Optional[Tuple[Type[Any], ...]] = None, 45 ) -> Mapping[str, Any]: 46 """ 47 Returns the request inputs to set on an outgoing HTTP request 48 49 :param stream_slice: The stream slice 50 :param next_page_token: The pagination token 51 :param valid_key_types: A tuple of types that the interpolator should allow 52 :param valid_value_types: A tuple of types that the interpolator should allow 53 :return: The request inputs to set on an outgoing HTTP request 54 """ 55 kwargs = { 56 "stream_slice": stream_slice, 57 "next_page_token": next_page_token, 58 } 59 interpolated_value = self._interpolator.eval( # type: ignore # self._interpolator is always initialized with a value and will not be None 60 self.config, 61 valid_key_types=valid_key_types, 62 valid_value_types=valid_value_types, 63 **kwargs, 64 ) 65 66 if isinstance(interpolated_value, dict): 67 non_null_tokens = {k: v for k, v in interpolated_value.items() if v is not None} 68 return non_null_tokens 69 return interpolated_value # type: ignore[no-any-return]
Returns the request inputs to set on an outgoing HTTP request
Parameters
- stream_slice: The stream slice
- next_page_token: The pagination token
- valid_key_types: A tuple of types that the interpolator should allow
- valid_value_types: A tuple of types that the interpolator should allow
Returns
The request inputs to set on an outgoing HTTP request
13@dataclass 14class InterpolatedString: 15 """ 16 Wrapper around a raw string to be interpolated with the Jinja2 templating engine 17 18 Attributes: 19 string (str): The string to evalute 20 default (Optional[str]): The default value to return if the evaluation returns an empty string 21 parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation 22 """ 23 24 string: str 25 parameters: InitVar[Mapping[str, Any]] 26 default: Optional[str] = None 27 28 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 29 self.default = self.default or self.string 30 self._interpolation = JinjaInterpolation() 31 self._parameters = parameters 32 # indicates whether passed string is just a plain string, not Jinja template 33 # This allows for optimization, but we do not know it yet at this stage 34 self._is_plain_string = None 35 36 def eval(self, config: Config, **kwargs: Any) -> Any: 37 """ 38 Interpolates the input string using the config and other optional arguments passed as parameter. 39 40 :param config: The user-provided configuration as specified by the source's spec 41 :param kwargs: Optional parameters used for interpolation 42 :return: The interpolated string 43 """ 44 if self._is_plain_string: 45 return self.string 46 if self._is_plain_string is None: 47 # Let's check whether output from evaluation is the same as input. 48 # This indicates occurrence of a plain string, not a template and we can skip Jinja in subsequent runs. 49 evaluated = self._interpolation.eval( 50 self.string, config, self.default, parameters=self._parameters, **kwargs 51 ) 52 self._is_plain_string = self.string == evaluated 53 return evaluated 54 return self._interpolation.eval( 55 self.string, config, self.default, parameters=self._parameters, **kwargs 56 ) 57 58 def __eq__(self, other: Any) -> bool: 59 if not isinstance(other, InterpolatedString): 60 return False 61 return self.string == other.string and self.default == other.default 62 63 @classmethod 64 def create( 65 cls, 66 string_or_interpolated: Union["InterpolatedString", str], 67 *, 68 parameters: Mapping[str, Any], 69 ) -> "InterpolatedString": 70 """ 71 Helper function to obtain an InterpolatedString from either a raw string or an InterpolatedString. 72 73 :param string_or_interpolated: either a raw string or an InterpolatedString. 74 :param parameters: parameters propagated from parent component 75 :return: InterpolatedString representing the input string. 76 """ 77 if isinstance(string_or_interpolated, str): 78 return InterpolatedString(string=string_or_interpolated, parameters=parameters) 79 else: 80 return string_or_interpolated
Wrapper around a raw string to be interpolated with the Jinja2 templating engine
Attributes:
- string (str): The string to evaluate
- default (Optional[str]): The default value to return if the evaluation returns an empty string
- parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation
36 def eval(self, config: Config, **kwargs: Any) -> Any: 37 """ 38 Interpolates the input string using the config and other optional arguments passed as parameter. 39 40 :param config: The user-provided configuration as specified by the source's spec 41 :param kwargs: Optional parameters used for interpolation 42 :return: The interpolated string 43 """ 44 if self._is_plain_string: 45 return self.string 46 if self._is_plain_string is None: 47 # Let's check whether output from evaluation is the same as input. 48 # This indicates occurrence of a plain string, not a template and we can skip Jinja in subsequent runs. 49 evaluated = self._interpolation.eval( 50 self.string, config, self.default, parameters=self._parameters, **kwargs 51 ) 52 self._is_plain_string = self.string == evaluated 53 return evaluated 54 return self._interpolation.eval( 55 self.string, config, self.default, parameters=self._parameters, **kwargs 56 )
Interpolates the input string using the config and other optional arguments passed as parameter.
Parameters
- config: The user-provided configuration as specified by the source's spec
- kwargs: Optional parameters used for interpolation
Returns
The interpolated string
63 @classmethod 64 def create( 65 cls, 66 string_or_interpolated: Union["InterpolatedString", str], 67 *, 68 parameters: Mapping[str, Any], 69 ) -> "InterpolatedString": 70 """ 71 Helper function to obtain an InterpolatedString from either a raw string or an InterpolatedString. 72 73 :param string_or_interpolated: either a raw string or an InterpolatedString. 74 :param parameters: parameters propagated from parent component 75 :return: InterpolatedString representing the input string. 76 """ 77 if isinstance(string_or_interpolated, str): 78 return InterpolatedString(string=string_or_interpolated, parameters=parameters) 79 else: 80 return string_or_interpolated
Helper function to obtain an InterpolatedString from either a raw string or an InterpolatedString.
Parameters
- string_or_interpolated: either a raw string or an InterpolatedString.
- parameters: parameters propagated from parent component
Returns
InterpolatedString representing the input string.
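Illustrative usage sketch (the config key api_version and the paths are placeholders, not part of the CDK; the import path follows this module):

    from airbyte_cdk.sources.declarative.interpolation.interpolated_string import InterpolatedString

    config = {"api_version": "v2"}  # hypothetical user config

    # A Jinja template referencing the config is rendered at eval() time.
    path = InterpolatedString.create("/api/{{ config['api_version'] }}/users", parameters={})
    print(path.eval(config))  # expected: /api/v2/users

    # A plain string is returned as-is, and the Jinja engine is skipped on later calls.
    plain = InterpolatedString.create("/api/users", parameters={})
    print(plain.eval(config))  # expected: /api/users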
20class JsonDecoder(Decoder): 21 """ 22 Decoder strategy that returns the json-encoded content of a response, if any. 23 24 Usually, we would try to instantiate the equivalent `CompositeRawDecoder(parser=JsonParser(), stream_response=False)` but there were specific historical behaviors related to the JsonDecoder that we didn't know if we could remove like the fallback on {} in case of errors. 25 """ 26 27 def __init__(self, parameters: Mapping[str, Any]): 28 self._decoder = CompositeRawDecoder(parser=JsonParser(), stream_response=False) 29 30 def is_stream_response(self) -> bool: 31 return self._decoder.is_stream_response() 32 33 def decode( 34 self, response: requests.Response 35 ) -> Generator[MutableMapping[str, Any], None, None]: 36 """ 37 Given the response is an empty string or an emtpy list, the function will return a generator with an empty mapping. 38 """ 39 has_yielded = False 40 try: 41 for element in self._decoder.decode(response): 42 yield element 43 has_yielded = True 44 except Exception: 45 yield {} 46 47 if not has_yielded: 48 yield {}
Decoder strategy that returns the json-encoded content of a response, if any.
Usually, we would instantiate the equivalent CompositeRawDecoder(parser=JsonParser(), stream_response=False),
but the JsonDecoder has specific historical behaviors, such as the fallback to {} on errors, that we are not sure can be removed.
Set to True if you'd like to use the stream=True option in the HTTP requester
33 def decode( 34 self, response: requests.Response 35 ) -> Generator[MutableMapping[str, Any], None, None]: 36 """ 37 Given the response is an empty string or an emtpy list, the function will return a generator with an empty mapping. 38 """ 39 has_yielded = False 40 try: 41 for element in self._decoder.decode(response): 42 yield element 43 has_yielded = True 44 except Exception: 45 yield {} 46 47 if not has_yielded: 48 yield {}
If the response is an empty string or an empty list, the function will return a generator with an empty mapping.
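A small illustration of the behavior described above, using a stubbed requests.Response (setting the private _content attribute is a test-style shortcut for illustration only, not part of the CDK API):

    import requests

    from airbyte_cdk.sources.declarative.decoders import JsonDecoder

    def fake_response(body: bytes) -> requests.Response:
        # Test-style stub: attach a raw body to a bare Response object.
        response = requests.Response()
        response.status_code = 200
        response._content = body
        return response

    decoder = JsonDecoder(parameters={})

    # A JSON object body is yielded as a single mapping.
    print(list(decoder.decode(fake_response(b'{"id": 1, "name": "example"}'))))
    # Empty or invalid bodies fall back to an empty mapping, per the docstring above.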
33@dataclass 34class JsonFileSchemaLoader(ResourceSchemaLoader, SchemaLoader): 35 """ 36 Loads the schema from a json file 37 38 Attributes: 39 file_path (Union[InterpolatedString, str]): The path to the json file describing the schema 40 name (str): The stream's name 41 config (Config): The user-provided configuration as specified by the source's spec 42 parameters (Mapping[str, Any]): Additional arguments to pass to the string interpolation if needed 43 """ 44 45 config: Config 46 parameters: InitVar[Mapping[str, Any]] 47 file_path: Union[InterpolatedString, str] = field(default="") 48 49 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 50 if not self.file_path: 51 self.file_path = _default_file_path() 52 self.file_path = InterpolatedString.create(self.file_path, parameters=parameters) 53 54 def get_json_schema(self) -> Mapping[str, Any]: 55 # todo: It is worth revisiting if we can replace file_path with just file_name if every schema is in the /schemas directory 56 # this would require that we find a creative solution to store or retrieve source_name in here since the files are mounted there 57 json_schema_path = self._get_json_filepath() 58 resource, schema_path = self.extract_resource_and_schema_path(json_schema_path) 59 raw_json_file = pkgutil.get_data(resource, schema_path) 60 61 if not raw_json_file: 62 raise IOError(f"Cannot find file {json_schema_path}") 63 try: 64 raw_schema = json.loads(raw_json_file) 65 except ValueError as err: 66 raise RuntimeError(f"Invalid JSON file format for file {json_schema_path}") from err 67 self.package_name = resource 68 return self._resolve_schema_references(raw_schema) 69 70 def _get_json_filepath(self) -> Any: 71 return self.file_path.eval(self.config) # type: ignore # file_path is always cast to an interpolated string 72 73 @staticmethod 74 def extract_resource_and_schema_path(json_schema_path: str) -> Tuple[str, str]: 75 """ 76 When the connector is running on a docker container, package_data is accessible from the resource (source_<name>), so we extract 77 the resource from the first part of the schema path and the remaining path is used to find the schema file. This is a slight 78 hack to identify the source name while we are in the airbyte_cdk module. 79 :param json_schema_path: The path to the schema JSON file 80 :return: Tuple of the resource name and the path to the schema file 81 """ 82 split_path = json_schema_path.split("/") 83 84 if split_path[0] == "" or split_path[0] == ".": 85 split_path = split_path[1:] 86 87 if len(split_path) == 0: 88 return "", "" 89 90 if len(split_path) == 1: 91 return "", split_path[0] 92 93 return split_path[0], "/".join(split_path[1:])
Loads the schema from a json file
Attributes:
- file_path (Union[InterpolatedString, str]): The path to the json file describing the schema
- name (str): The stream's name
- config (Config): The user-provided configuration as specified by the source's spec
- parameters (Mapping[str, Any]): Additional arguments to pass to the string interpolation if needed
54 def get_json_schema(self) -> Mapping[str, Any]: 55 # todo: It is worth revisiting if we can replace file_path with just file_name if every schema is in the /schemas directory 56 # this would require that we find a creative solution to store or retrieve source_name in here since the files are mounted there 57 json_schema_path = self._get_json_filepath() 58 resource, schema_path = self.extract_resource_and_schema_path(json_schema_path) 59 raw_json_file = pkgutil.get_data(resource, schema_path) 60 61 if not raw_json_file: 62 raise IOError(f"Cannot find file {json_schema_path}") 63 try: 64 raw_schema = json.loads(raw_json_file) 65 except ValueError as err: 66 raise RuntimeError(f"Invalid JSON file format for file {json_schema_path}") from err 67 self.package_name = resource 68 return self._resolve_schema_references(raw_schema)
Returns a mapping describing the stream's schema
73 @staticmethod 74 def extract_resource_and_schema_path(json_schema_path: str) -> Tuple[str, str]: 75 """ 76 When the connector is running on a docker container, package_data is accessible from the resource (source_<name>), so we extract 77 the resource from the first part of the schema path and the remaining path is used to find the schema file. This is a slight 78 hack to identify the source name while we are in the airbyte_cdk module. 79 :param json_schema_path: The path to the schema JSON file 80 :return: Tuple of the resource name and the path to the schema file 81 """ 82 split_path = json_schema_path.split("/") 83 84 if split_path[0] == "" or split_path[0] == ".": 85 split_path = split_path[1:] 86 87 if len(split_path) == 0: 88 return "", "" 89 90 if len(split_path) == 1: 91 return "", split_path[0] 92 93 return split_path[0], "/".join(split_path[1:])
When the connector is running in a Docker container, package_data is accessible from the resource (source_<name>), so we extract the resource from the first part of the schema path and use the remaining path to find the schema file. This is a slight hack to identify the source name while we are in the airbyte_cdk module.
Parameters
- json_schema_path: The path to the schema JSON file
Returns
Tuple of the resource name and the path to the schema file
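For illustration, the splitting behavior of the static method shown above (the paths are hypothetical; the class is assumed importable from airbyte_cdk.sources.declarative.schema):

    from airbyte_cdk.sources.declarative.schema import JsonFileSchemaLoader

    # A leading "./" (or "/") segment is dropped; the first remaining segment is the resource name.
    print(JsonFileSchemaLoader.extract_resource_and_schema_path("./source_example/schemas/users.json"))
    # expected: ('source_example', 'schemas/users.json')

    # A bare file name has no resource component.
    print(JsonFileSchemaLoader.extract_resource_and_schema_path("users.json"))
    # expected: ('', 'users.json')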
20class LegacyToPerPartitionStateMigration(StateMigration): 21 """ 22 Transforms the input state for per-partitioned streams from the legacy format to the low-code format. 23 The cursor field and partition ID fields are automatically extracted from the stream's DatetimebasedCursor and SubstreamPartitionRouter. 24 25 Example input state: 26 { 27 "13506132": { 28 "last_changed": "2022-12-27T08:34:39+00:00" 29 } 30 Example output state: 31 { 32 "partition": {"id": "13506132"}, 33 "cursor": {"last_changed": "2022-12-27T08:34:39+00:00"} 34 } 35 """ 36 37 def __init__( 38 self, 39 partition_router: SubstreamPartitionRouter, 40 cursor: CustomIncrementalSync | DatetimeBasedCursor, 41 config: Mapping[str, Any], 42 parameters: Mapping[str, Any], 43 ): 44 self._partition_router = partition_router 45 self._cursor = cursor 46 self._config = config 47 self._parameters = parameters 48 self._partition_key_field = InterpolatedString.create( 49 self._get_partition_field(self._partition_router), parameters=self._parameters 50 ).eval(self._config) 51 self._cursor_field = InterpolatedString.create( 52 self._cursor.cursor_field, parameters=self._parameters 53 ).eval(self._config) 54 55 def _get_partition_field(self, partition_router: SubstreamPartitionRouter) -> str: 56 parent_stream_config = partition_router.parent_stream_configs[0] 57 58 # Retrieve the partition field with a condition, as properties are returned as a dictionary for custom components. 59 partition_field = ( 60 parent_stream_config.partition_field 61 if isinstance(parent_stream_config, ParentStreamConfig) 62 else parent_stream_config.get("partition_field") # type: ignore # See above comment on why parent_stream_config might be a dict 63 ) 64 65 return partition_field 66 67 def should_migrate(self, stream_state: Mapping[str, Any]) -> bool: 68 if _is_already_migrated(stream_state): 69 return False 70 71 # There is exactly one parent stream 72 number_of_parent_streams = len(self._partition_router.parent_stream_configs) # type: ignore # custom partition will introduce this attribute if needed 73 if number_of_parent_streams != 1: 74 # There should be exactly one parent stream 75 return False 76 """ 77 The expected state format is 78 "<parent_key_id>" : { 79 "<cursor_field>" : "<cursor_value>" 80 } 81 """ 82 if stream_state: 83 for key, value in stream_state.items(): 84 if isinstance(value, dict): 85 keys = list(value.keys()) 86 if len(keys) != 1: 87 # The input partitioned state should only have one key 88 return False 89 if keys[0] != self._cursor_field: 90 # Unexpected key. Found {keys[0]}. Expected {self._cursor.cursor_field} 91 return False 92 return True 93 94 def migrate(self, stream_state: Mapping[str, Any]) -> Mapping[str, Any]: 95 states = [ 96 {"partition": {self._partition_key_field: key}, "cursor": value} 97 for key, value in stream_state.items() 98 ] 99 return {"states": states}
Transforms the input state for per-partitioned streams from the legacy format to the low-code format. The cursor field and partition ID fields are automatically extracted from the stream's DatetimeBasedCursor and SubstreamPartitionRouter.
Example input state:
    {
      "13506132": {
        "last_changed": "2022-12-27T08:34:39+00:00"
      }
    }
Example output state:
    {
      "partition": {"id": "13506132"},
      "cursor": {"last_changed": "2022-12-27T08:34:39+00:00"}
    }
37 def __init__( 38 self, 39 partition_router: SubstreamPartitionRouter, 40 cursor: CustomIncrementalSync | DatetimeBasedCursor, 41 config: Mapping[str, Any], 42 parameters: Mapping[str, Any], 43 ): 44 self._partition_router = partition_router 45 self._cursor = cursor 46 self._config = config 47 self._parameters = parameters 48 self._partition_key_field = InterpolatedString.create( 49 self._get_partition_field(self._partition_router), parameters=self._parameters 50 ).eval(self._config) 51 self._cursor_field = InterpolatedString.create( 52 self._cursor.cursor_field, parameters=self._parameters 53 ).eval(self._config)
67 def should_migrate(self, stream_state: Mapping[str, Any]) -> bool: 68 if _is_already_migrated(stream_state): 69 return False 70 71 # There is exactly one parent stream 72 number_of_parent_streams = len(self._partition_router.parent_stream_configs) # type: ignore # custom partition will introduce this attribute if needed 73 if number_of_parent_streams != 1: 74 # There should be exactly one parent stream 75 return False 76 """ 77 The expected state format is 78 "<parent_key_id>" : { 79 "<cursor_field>" : "<cursor_value>" 80 } 81 """ 82 if stream_state: 83 for key, value in stream_state.items(): 84 if isinstance(value, dict): 85 keys = list(value.keys()) 86 if len(keys) != 1: 87 # The input partitioned state should only have one key 88 return False 89 if keys[0] != self._cursor_field: 90 # Unexpected key. Found {keys[0]}. Expected {self._cursor.cursor_field} 91 return False 92 return True
Check if the stream_state should be migrated
Parameters
- stream_state: The stream_state to potentially migrate
Returns
true if the state is of the expected format and should be migrated. False otherwise.
94 def migrate(self, stream_state: Mapping[str, Any]) -> Mapping[str, Any]: 95 states = [ 96 {"partition": {self._partition_key_field: key}, "cursor": value} 97 for key, value in stream_state.items() 98 ] 99 return {"states": states}
Migrate the stream_state. Assumes should_migrate(stream_state) returned True.
Parameters
- stream_state: The stream_state to migrate
Returns
The migrated stream_state
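A sketch of the shape of this transformation, using the docstring's example plus a second, hypothetical partition; the "id" partition key comes from the partition router's partition_field:

    legacy_state = {
        "13506132": {"last_changed": "2022-12-27T08:34:39+00:00"},
        "14351124": {"last_changed": "2022-12-28T10:11:12+00:00"},  # hypothetical second partition
    }

    # migrate() wraps each legacy entry into the low-code per-partition format:
    migrated_state = {
        "states": [
            {"partition": {"id": "13506132"}, "cursor": {"last_changed": "2022-12-27T08:34:39+00:00"}},
            {"partition": {"id": "14351124"}, "cursor": {"last_changed": "2022-12-28T10:11:12+00:00"}},
        ]
    }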
61class ManifestDeclarativeSource(DeclarativeSource): 62 """Declarative source defined by a manifest of low-code components that define source connector behavior""" 63 64 def __init__( 65 self, 66 source_config: ConnectionDefinition, 67 *, 68 config: Mapping[str, Any] | None = None, 69 debug: bool = False, 70 emit_connector_builder_messages: bool = False, 71 component_factory: Optional[ModelToComponentFactory] = None, 72 ): 73 """ 74 Args: 75 config: The provided config dict. 76 source_config: The manifest of low-code components that describe the source connector. 77 debug: True if debug mode is enabled. 78 emit_connector_builder_messages: True if messages should be emitted to the connector builder. 79 component_factory: optional factory if ModelToComponentFactory's default behavior needs to be tweaked. 80 """ 81 self.logger = logging.getLogger(f"airbyte.{self.name}") 82 # For ease of use we don't require the type to be specified at the top level manifest, but it should be included during processing 83 manifest = dict(source_config) 84 if "type" not in manifest: 85 manifest["type"] = "DeclarativeSource" 86 87 # If custom components are needed, locate and/or register them. 88 self.components_module: ModuleType | None = get_registered_components_module(config=config) 89 90 resolved_source_config = ManifestReferenceResolver().preprocess_manifest(manifest) 91 propagated_source_config = ManifestComponentTransformer().propagate_types_and_parameters( 92 "", resolved_source_config, {} 93 ) 94 self._source_config = propagated_source_config 95 self._debug = debug 96 self._emit_connector_builder_messages = emit_connector_builder_messages 97 self._constructor = ( 98 component_factory 99 if component_factory 100 else ModelToComponentFactory( 101 emit_connector_builder_messages, 102 max_concurrent_async_job_count=source_config.get("max_concurrent_async_job_count"), 103 ) 104 ) 105 self._message_repository = self._constructor.get_message_repository() 106 self._slice_logger: SliceLogger = ( 107 AlwaysLogSliceLogger() if emit_connector_builder_messages else DebugSliceLogger() 108 ) 109 110 self._validate_source() 111 112 @property 113 def resolved_manifest(self) -> Mapping[str, Any]: 114 return self._source_config 115 116 @property 117 def message_repository(self) -> MessageRepository: 118 return self._message_repository 119 120 @property 121 def connection_checker(self) -> ConnectionChecker: 122 check = self._source_config["check"] 123 if "type" not in check: 124 check["type"] = "CheckStream" 125 check_stream = self._constructor.create_component( 126 COMPONENTS_CHECKER_TYPE_MAPPING[check["type"]], 127 check, 128 dict(), 129 emit_connector_builder_messages=self._emit_connector_builder_messages, 130 ) 131 if isinstance(check_stream, ConnectionChecker): 132 return check_stream 133 else: 134 raise ValueError( 135 f"Expected to generate a ConnectionChecker component, but received {check_stream.__class__}" 136 ) 137 138 def streams(self, config: Mapping[str, Any]) -> List[Stream]: 139 self._emit_manifest_debug_message( 140 extra_args={"source_name": self.name, "parsed_config": json.dumps(self._source_config)} 141 ) 142 143 stream_configs = self._stream_configs(self._source_config) + self._dynamic_stream_configs( 144 self._source_config, config 145 ) 146 147 api_budget_model = self._source_config.get("api_budget") 148 if api_budget_model: 149 self._constructor.set_api_budget(api_budget_model, config) 150 151 source_streams = [ 152 self._constructor.create_component( 153 StateDelegatingStreamModel 154 if 
stream_config.get("type") == StateDelegatingStreamModel.__name__ 155 else DeclarativeStreamModel, 156 stream_config, 157 config, 158 emit_connector_builder_messages=self._emit_connector_builder_messages, 159 ) 160 for stream_config in self._initialize_cache_for_parent_streams(deepcopy(stream_configs)) 161 ] 162 163 return source_streams 164 165 @staticmethod 166 def _initialize_cache_for_parent_streams( 167 stream_configs: List[Dict[str, Any]], 168 ) -> List[Dict[str, Any]]: 169 parent_streams = set() 170 171 def update_with_cache_parent_configs(parent_configs: list[dict[str, Any]]) -> None: 172 for parent_config in parent_configs: 173 parent_streams.add(parent_config["stream"]["name"]) 174 if parent_config["stream"]["type"] == "StateDelegatingStream": 175 parent_config["stream"]["full_refresh_stream"]["retriever"]["requester"][ 176 "use_cache" 177 ] = True 178 parent_config["stream"]["incremental_stream"]["retriever"]["requester"][ 179 "use_cache" 180 ] = True 181 else: 182 parent_config["stream"]["retriever"]["requester"]["use_cache"] = True 183 184 for stream_config in stream_configs: 185 if stream_config.get("incremental_sync", {}).get("parent_stream"): 186 parent_streams.add(stream_config["incremental_sync"]["parent_stream"]["name"]) 187 stream_config["incremental_sync"]["parent_stream"]["retriever"]["requester"][ 188 "use_cache" 189 ] = True 190 191 elif stream_config.get("retriever", {}).get("partition_router", {}): 192 partition_router = stream_config["retriever"]["partition_router"] 193 194 if isinstance(partition_router, dict) and partition_router.get( 195 "parent_stream_configs" 196 ): 197 update_with_cache_parent_configs(partition_router["parent_stream_configs"]) 198 elif isinstance(partition_router, list): 199 for router in partition_router: 200 if router.get("parent_stream_configs"): 201 update_with_cache_parent_configs(router["parent_stream_configs"]) 202 203 for stream_config in stream_configs: 204 if stream_config["name"] in parent_streams: 205 if stream_config["type"] == "StateDelegatingStream": 206 stream_config["full_refresh_stream"]["retriever"]["requester"]["use_cache"] = ( 207 True 208 ) 209 stream_config["incremental_stream"]["retriever"]["requester"]["use_cache"] = ( 210 True 211 ) 212 else: 213 stream_config["retriever"]["requester"]["use_cache"] = True 214 215 return stream_configs 216 217 def spec(self, logger: logging.Logger) -> ConnectorSpecification: 218 """ 219 Returns the connector specification (spec) as defined in the Airbyte Protocol. The spec is an object describing the possible 220 configurations (e.g: username and password) which can be configured when running this connector. For low-code connectors, this 221 will first attempt to load the spec from the manifest's spec block, otherwise it will load it from "spec.yaml" or "spec.json" 222 in the project root. 
223 """ 224 self._configure_logger_level(logger) 225 self._emit_manifest_debug_message( 226 extra_args={"source_name": self.name, "parsed_config": json.dumps(self._source_config)} 227 ) 228 229 spec = self._source_config.get("spec") 230 if spec: 231 if "type" not in spec: 232 spec["type"] = "Spec" 233 spec_component = self._constructor.create_component(SpecModel, spec, dict()) 234 return spec_component.generate_spec() 235 else: 236 return super().spec(logger) 237 238 def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus: 239 self._configure_logger_level(logger) 240 return super().check(logger, config) 241 242 def read( 243 self, 244 logger: logging.Logger, 245 config: Mapping[str, Any], 246 catalog: ConfiguredAirbyteCatalog, 247 state: Optional[List[AirbyteStateMessage]] = None, 248 ) -> Iterator[AirbyteMessage]: 249 self._configure_logger_level(logger) 250 yield from super().read(logger, config, catalog, state) 251 252 def _configure_logger_level(self, logger: logging.Logger) -> None: 253 """ 254 Set the log level to logging.DEBUG if debug mode is enabled 255 """ 256 if self._debug: 257 logger.setLevel(logging.DEBUG) 258 259 def _validate_source(self) -> None: 260 """ 261 Validates the connector manifest against the declarative component schema 262 """ 263 try: 264 raw_component_schema = pkgutil.get_data( 265 "airbyte_cdk", "sources/declarative/declarative_component_schema.yaml" 266 ) 267 if raw_component_schema is not None: 268 declarative_component_schema = yaml.load( 269 raw_component_schema, Loader=yaml.SafeLoader 270 ) 271 else: 272 raise RuntimeError( 273 "Failed to read manifest component json schema required for validation" 274 ) 275 except FileNotFoundError as e: 276 raise FileNotFoundError( 277 f"Failed to read manifest component json schema required for validation: {e}" 278 ) 279 280 streams = self._source_config.get("streams") 281 dynamic_streams = self._source_config.get("dynamic_streams") 282 if not (streams or dynamic_streams): 283 raise ValidationError( 284 f"A valid manifest should have at least one stream defined. Got {streams}" 285 ) 286 287 try: 288 validate(self._source_config, declarative_component_schema) 289 except ValidationError as e: 290 raise ValidationError( 291 "Validation against json schema defined in declarative_component_schema.yaml schema failed" 292 ) from e 293 294 cdk_version_str = metadata.version("airbyte_cdk") 295 cdk_version = self._parse_version(cdk_version_str, "airbyte-cdk") 296 manifest_version_str = self._source_config.get("version") 297 if manifest_version_str is None: 298 raise RuntimeError( 299 "Manifest version is not defined in the manifest. This is unexpected since it should be a required field. Please contact support." 300 ) 301 manifest_version = self._parse_version(manifest_version_str, "manifest") 302 303 if (cdk_version.major, cdk_version.minor, cdk_version.micro) == (0, 0, 0): 304 # Skipping version compatibility check on unreleased dev branch 305 pass 306 elif (cdk_version.major, cdk_version.minor) < ( 307 manifest_version.major, 308 manifest_version.minor, 309 ): 310 raise ValidationError( 311 f"The manifest version {manifest_version!s} is greater than the airbyte-cdk package version ({cdk_version!s}). Your " 312 f"manifest may contain features that are not in the current CDK version." 
313 ) 314 elif (manifest_version.major, manifest_version.minor) < (0, 29): 315 raise ValidationError( 316 f"The low-code framework was promoted to Beta in airbyte-cdk version 0.29.0 and contains many breaking changes to the " 317 f"language. The manifest version {manifest_version!s} is incompatible with the airbyte-cdk package version " 318 f"{cdk_version!s} which contains these breaking changes." 319 ) 320 321 @staticmethod 322 def _parse_version( 323 version: str, 324 version_type: str, 325 ) -> Version: 326 """Takes a semantic version represented as a string and splits it into a tuple. 327 328 The fourth part (prerelease) is not returned in the tuple. 329 330 Returns: 331 Version: the parsed version object 332 """ 333 try: 334 parsed_version = Version(version) 335 except InvalidVersion as ex: 336 raise ValidationError( 337 f"The {version_type} version '{version}' is not a valid version format." 338 ) from ex 339 else: 340 # No exception 341 return parsed_version 342 343 def _stream_configs(self, manifest: Mapping[str, Any]) -> List[Dict[str, Any]]: 344 # This has a warning flag for static, but after we finish part 4 we'll replace manifest with self._source_config 345 stream_configs: List[Dict[str, Any]] = manifest.get("streams", []) 346 for s in stream_configs: 347 if "type" not in s: 348 s["type"] = "DeclarativeStream" 349 return stream_configs 350 351 def _dynamic_stream_configs( 352 self, manifest: Mapping[str, Any], config: Mapping[str, Any] 353 ) -> List[Dict[str, Any]]: 354 dynamic_stream_definitions: List[Dict[str, Any]] = manifest.get("dynamic_streams", []) 355 dynamic_stream_configs: List[Dict[str, Any]] = [] 356 seen_dynamic_streams: Set[str] = set() 357 358 for dynamic_definition in dynamic_stream_definitions: 359 components_resolver_config = dynamic_definition["components_resolver"] 360 361 if not components_resolver_config: 362 raise ValueError( 363 f"Missing 'components_resolver' in dynamic definition: {dynamic_definition}" 364 ) 365 366 resolver_type = components_resolver_config.get("type") 367 if not resolver_type: 368 raise ValueError( 369 f"Missing 'type' in components resolver configuration: {components_resolver_config}" 370 ) 371 372 if resolver_type not in COMPONENTS_RESOLVER_TYPE_MAPPING: 373 raise ValueError( 374 f"Invalid components resolver type '{resolver_type}'. " 375 f"Expected one of {list(COMPONENTS_RESOLVER_TYPE_MAPPING.keys())}." 376 ) 377 378 if "retriever" in components_resolver_config: 379 components_resolver_config["retriever"]["requester"]["use_cache"] = True 380 381 # Create a resolver for dynamic components based on type 382 components_resolver = self._constructor.create_component( 383 COMPONENTS_RESOLVER_TYPE_MAPPING[resolver_type], components_resolver_config, config 384 ) 385 386 stream_template_config = dynamic_definition["stream_template"] 387 388 for dynamic_stream in components_resolver.resolve_components( 389 stream_template_config=stream_template_config 390 ): 391 if "type" not in dynamic_stream: 392 dynamic_stream["type"] = "DeclarativeStream" 393 394 # Ensure that each stream is created with a unique name 395 name = dynamic_stream.get("name") 396 397 if not isinstance(name, str): 398 raise ValueError( 399 f"Expected stream name {name} to be a string, got {type(name)}." 400 ) 401 402 if name in seen_dynamic_streams: 403 error_message = f"Dynamic streams list contains a duplicate name: {name}. Please contact Airbyte Support." 
404 failure_type = FailureType.system_error 405 406 if resolver_type == "ConfigComponentsResolver": 407 error_message = f"Dynamic streams list contains a duplicate name: {name}. Please check your configuration." 408 failure_type = FailureType.config_error 409 410 raise AirbyteTracedException( 411 message=error_message, 412 internal_message=error_message, 413 failure_type=failure_type, 414 ) 415 416 seen_dynamic_streams.add(name) 417 dynamic_stream_configs.append(dynamic_stream) 418 419 return dynamic_stream_configs 420 421 def _emit_manifest_debug_message(self, extra_args: dict[str, Any]) -> None: 422 self.logger.debug("declarative source created from manifest", extra=extra_args)
Declarative source defined by a manifest of low-code components that define source connector behavior
64 def __init__( 65 self, 66 source_config: ConnectionDefinition, 67 *, 68 config: Mapping[str, Any] | None = None, 69 debug: bool = False, 70 emit_connector_builder_messages: bool = False, 71 component_factory: Optional[ModelToComponentFactory] = None, 72 ): 73 """ 74 Args: 75 config: The provided config dict. 76 source_config: The manifest of low-code components that describe the source connector. 77 debug: True if debug mode is enabled. 78 emit_connector_builder_messages: True if messages should be emitted to the connector builder. 79 component_factory: optional factory if ModelToComponentFactory's default behavior needs to be tweaked. 80 """ 81 self.logger = logging.getLogger(f"airbyte.{self.name}") 82 # For ease of use we don't require the type to be specified at the top level manifest, but it should be included during processing 83 manifest = dict(source_config) 84 if "type" not in manifest: 85 manifest["type"] = "DeclarativeSource" 86 87 # If custom components are needed, locate and/or register them. 88 self.components_module: ModuleType | None = get_registered_components_module(config=config) 89 90 resolved_source_config = ManifestReferenceResolver().preprocess_manifest(manifest) 91 propagated_source_config = ManifestComponentTransformer().propagate_types_and_parameters( 92 "", resolved_source_config, {} 93 ) 94 self._source_config = propagated_source_config 95 self._debug = debug 96 self._emit_connector_builder_messages = emit_connector_builder_messages 97 self._constructor = ( 98 component_factory 99 if component_factory 100 else ModelToComponentFactory( 101 emit_connector_builder_messages, 102 max_concurrent_async_job_count=source_config.get("max_concurrent_async_job_count"), 103 ) 104 ) 105 self._message_repository = self._constructor.get_message_repository() 106 self._slice_logger: SliceLogger = ( 107 AlwaysLogSliceLogger() if emit_connector_builder_messages else DebugSliceLogger() 108 ) 109 110 self._validate_source()
Arguments:
- config: The provided config dict.
- source_config: The manifest of low-code components that describe the source connector.
- debug: True if debug mode is enabled.
- emit_connector_builder_messages: True if messages should be emitted to the connector builder.
- component_factory: optional factory if ModelToComponentFactory's default behavior needs to be tweaked.
120 @property 121 def connection_checker(self) -> ConnectionChecker: 122 check = self._source_config["check"] 123 if "type" not in check: 124 check["type"] = "CheckStream" 125 check_stream = self._constructor.create_component( 126 COMPONENTS_CHECKER_TYPE_MAPPING[check["type"]], 127 check, 128 dict(), 129 emit_connector_builder_messages=self._emit_connector_builder_messages, 130 ) 131 if isinstance(check_stream, ConnectionChecker): 132 return check_stream 133 else: 134 raise ValueError( 135 f"Expected to generate a ConnectionChecker component, but received {check_stream.__class__}" 136 )
Returns the ConnectionChecker to use for the check operation
138 def streams(self, config: Mapping[str, Any]) -> List[Stream]: 139 self._emit_manifest_debug_message( 140 extra_args={"source_name": self.name, "parsed_config": json.dumps(self._source_config)} 141 ) 142 143 stream_configs = self._stream_configs(self._source_config) + self._dynamic_stream_configs( 144 self._source_config, config 145 ) 146 147 api_budget_model = self._source_config.get("api_budget") 148 if api_budget_model: 149 self._constructor.set_api_budget(api_budget_model, config) 150 151 source_streams = [ 152 self._constructor.create_component( 153 StateDelegatingStreamModel 154 if stream_config.get("type") == StateDelegatingStreamModel.__name__ 155 else DeclarativeStreamModel, 156 stream_config, 157 config, 158 emit_connector_builder_messages=self._emit_connector_builder_messages, 159 ) 160 for stream_config in self._initialize_cache_for_parent_streams(deepcopy(stream_configs)) 161 ] 162 163 return source_streams
Parameters
- config: The user-provided configuration as specified by the source's spec. Any stream construction related operation should happen here.
Returns
A list of the streams in this source connector.
217 def spec(self, logger: logging.Logger) -> ConnectorSpecification: 218 """ 219 Returns the connector specification (spec) as defined in the Airbyte Protocol. The spec is an object describing the possible 220 configurations (e.g: username and password) which can be configured when running this connector. For low-code connectors, this 221 will first attempt to load the spec from the manifest's spec block, otherwise it will load it from "spec.yaml" or "spec.json" 222 in the project root. 223 """ 224 self._configure_logger_level(logger) 225 self._emit_manifest_debug_message( 226 extra_args={"source_name": self.name, "parsed_config": json.dumps(self._source_config)} 227 ) 228 229 spec = self._source_config.get("spec") 230 if spec: 231 if "type" not in spec: 232 spec["type"] = "Spec" 233 spec_component = self._constructor.create_component(SpecModel, spec, dict()) 234 return spec_component.generate_spec() 235 else: 236 return super().spec(logger)
Returns the connector specification (spec) as defined in the Airbyte Protocol. The spec is an object describing the possible configurations (e.g: username and password) which can be configured when running this connector. For low-code connectors, this will first attempt to load the spec from the manifest's spec block, otherwise it will load it from "spec.yaml" or "spec.json" in the project root.
238 def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus: 239 self._configure_logger_level(logger) 240 return super().check(logger, config)
Implements the Check Connection operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check.
242 def read( 243 self, 244 logger: logging.Logger, 245 config: Mapping[str, Any], 246 catalog: ConfiguredAirbyteCatalog, 247 state: Optional[List[AirbyteStateMessage]] = None, 248 ) -> Iterator[AirbyteMessage]: 249 self._configure_logger_level(logger) 250 yield from super().read(logger, config, catalog, state)
Implements the Read operation from the Airbyte Specification. See https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/.
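A hedged sketch of wiring a manifest into this class; the manifest.yaml path and the config contents are placeholders, and a real manifest must also satisfy the declarative component schema (version, check, streams, ...):

    import logging

    import yaml

    from airbyte_cdk.sources.declarative.manifest_declarative_source import ManifestDeclarativeSource

    with open("manifest.yaml") as f:  # placeholder path
        manifest = yaml.safe_load(f)

    config = {"api_key": "..."}  # placeholder config

    source = ManifestDeclarativeSource(source_config=manifest, config=config)

    # The spec comes from the manifest's spec block, falling back to spec.yaml/spec.json.
    spec = source.spec(logging.getLogger("airbyte"))

    # Streams are built from the manifest's streams and dynamic_streams sections.
    print([stream.name for stream in source.streams(config)])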
14@dataclass 15class MinMaxDatetime: 16 """ 17 Compares the provided date against optional minimum or maximum times. If date is earlier than 18 min_date, then min_date is returned. If date is greater than max_date, then max_date is returned. 19 If neither, the input date is returned. 20 21 The timestamp format accepts the same format codes as datetime.strfptime, which are 22 all the format codes required by the 1989 C standard. 23 Full list of accepted format codes: https://man7.org/linux/man-pages/man3/strftime.3.html 24 25 Attributes: 26 datetime (Union[InterpolatedString, str]): InterpolatedString or string representing the datetime in the format specified by `datetime_format` 27 datetime_format (str): Format of the datetime passed as argument 28 min_datetime (Union[InterpolatedString, str]): Represents the minimum allowed datetime value. 29 max_datetime (Union[InterpolatedString, str]): Represents the maximum allowed datetime value. 30 """ 31 32 datetime: Union[InterpolatedString, str] 33 parameters: InitVar[Mapping[str, Any]] 34 # datetime_format is a unique case where we inherit it from the parent if it is not specified before using the default value 35 # which is why we need dedicated getter/setter methods and private dataclass field 36 datetime_format: str 37 _datetime_format: str = field(init=False, repr=False, default="") 38 min_datetime: Union[InterpolatedString, str] = "" 39 max_datetime: Union[InterpolatedString, str] = "" 40 41 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 42 self.datetime = InterpolatedString.create(self.datetime, parameters=parameters or {}) 43 self._parser = DatetimeParser() 44 self.min_datetime = ( 45 InterpolatedString.create(self.min_datetime, parameters=parameters) # type: ignore [assignment] # expression has type "InterpolatedString | None", variable has type "InterpolatedString | str" 46 if self.min_datetime 47 else None 48 ) # type: ignore 49 self.max_datetime = ( 50 InterpolatedString.create(self.max_datetime, parameters=parameters) # type: ignore [assignment] # expression has type "InterpolatedString | None", variable has type "InterpolatedString | str" 51 if self.max_datetime 52 else None 53 ) # type: ignore 54 55 def get_datetime( 56 self, config: Mapping[str, Any], **additional_parameters: Mapping[str, Any] 57 ) -> dt.datetime: 58 """ 59 Evaluates and returns the datetime 60 :param config: The user-provided configuration as specified by the source's spec 61 :param additional_parameters: Additional arguments to be passed to the strings for interpolation 62 :return: The evaluated datetime 63 """ 64 # We apply a default datetime format here instead of at instantiation, so it can be set by the parent first 65 datetime_format = self._datetime_format 66 if not datetime_format: 67 datetime_format = "%Y-%m-%dT%H:%M:%S.%f%z" 68 69 time = self._parser.parse( 70 str( 71 self.datetime.eval( # type: ignore[union-attr] # str has no attribute "eval" 72 config, 73 **additional_parameters, 74 ) 75 ), 76 datetime_format, 77 ) # type: ignore # datetime is always cast to an interpolated string 78 79 if self.min_datetime: 80 min_time = str(self.min_datetime.eval(config, **additional_parameters)) # type: ignore # min_datetime is always cast to an interpolated string 81 if min_time: 82 min_datetime = self._parser.parse(min_time, datetime_format) # type: ignore # min_datetime is always cast to an interpolated string 83 time = max(time, min_datetime) 84 if self.max_datetime: 85 max_time = str(self.max_datetime.eval(config, **additional_parameters)) # 
type: ignore # max_datetime is always cast to an interpolated string 86 if max_time: 87 max_datetime = self._parser.parse(max_time, datetime_format) 88 time = min(time, max_datetime) 89 return time 90 91 @property # type: ignore # properties don't play well with dataclasses... 92 def datetime_format(self) -> str: 93 """The format of the string representing the datetime""" 94 return self._datetime_format 95 96 @datetime_format.setter 97 def datetime_format(self, value: str) -> None: 98 """Setter for the datetime format""" 99 # Covers the case where datetime_format is not provided in the constructor, which causes the property object 100 # to be set which we need to avoid doing 101 if not isinstance(value, property): 102 self._datetime_format = value 103 104 @classmethod 105 def create( 106 cls, 107 interpolated_string_or_min_max_datetime: Union[InterpolatedString, str, "MinMaxDatetime"], 108 parameters: Optional[Mapping[str, Any]] = None, 109 ) -> "MinMaxDatetime": 110 if parameters is None: 111 parameters = {} 112 if isinstance(interpolated_string_or_min_max_datetime, InterpolatedString) or isinstance( 113 interpolated_string_or_min_max_datetime, str 114 ): 115 return MinMaxDatetime( # type: ignore [call-arg] 116 datetime=interpolated_string_or_min_max_datetime, parameters=parameters 117 ) 118 else: 119 return interpolated_string_or_min_max_datetime
Compares the provided date against optional minimum or maximum times. If date is earlier than min_date, then min_date is returned. If date is greater than max_date, then max_date is returned. If neither, the input date is returned.
The timestamp format accepts the same format codes as datetime.strftime, which are all the format codes required by the 1989 C standard. Full list of accepted format codes: https://man7.org/linux/man-pages/man3/strftime.3.html
Attributes:
- datetime (Union[InterpolatedString, str]): InterpolatedString or string representing the datetime in the format specified by datetime_format
- datetime_format (str): Format of the datetime passed as argument
- min_datetime (Union[InterpolatedString, str]): Represents the minimum allowed datetime value.
- max_datetime (Union[InterpolatedString, str]): Represents the maximum allowed datetime value.
91 @property # type: ignore # properties don't play well with dataclasses... 92 def datetime_format(self) -> str: 93 """The format of the string representing the datetime""" 94 return self._datetime_format
The format of the string representing the datetime
55 def get_datetime( 56 self, config: Mapping[str, Any], **additional_parameters: Mapping[str, Any] 57 ) -> dt.datetime: 58 """ 59 Evaluates and returns the datetime 60 :param config: The user-provided configuration as specified by the source's spec 61 :param additional_parameters: Additional arguments to be passed to the strings for interpolation 62 :return: The evaluated datetime 63 """ 64 # We apply a default datetime format here instead of at instantiation, so it can be set by the parent first 65 datetime_format = self._datetime_format 66 if not datetime_format: 67 datetime_format = "%Y-%m-%dT%H:%M:%S.%f%z" 68 69 time = self._parser.parse( 70 str( 71 self.datetime.eval( # type: ignore[union-attr] # str has no attribute "eval" 72 config, 73 **additional_parameters, 74 ) 75 ), 76 datetime_format, 77 ) # type: ignore # datetime is always cast to an interpolated string 78 79 if self.min_datetime: 80 min_time = str(self.min_datetime.eval(config, **additional_parameters)) # type: ignore # min_datetime is always cast to an interpolated string 81 if min_time: 82 min_datetime = self._parser.parse(min_time, datetime_format) # type: ignore # min_datetime is always cast to an interpolated string 83 time = max(time, min_datetime) 84 if self.max_datetime: 85 max_time = str(self.max_datetime.eval(config, **additional_parameters)) # type: ignore # max_datetime is always cast to an interpolated string 86 if max_time: 87 max_datetime = self._parser.parse(max_time, datetime_format) 88 time = min(time, max_datetime) 89 return time
Evaluates and returns the datetime
Parameters
- config: The user-provided configuration as specified by the source's spec
- additional_parameters: Additional arguments to be passed to the strings for interpolation
Returns
The evaluated datetime
104 @classmethod 105 def create( 106 cls, 107 interpolated_string_or_min_max_datetime: Union[InterpolatedString, str, "MinMaxDatetime"], 108 parameters: Optional[Mapping[str, Any]] = None, 109 ) -> "MinMaxDatetime": 110 if parameters is None: 111 parameters = {} 112 if isinstance(interpolated_string_or_min_max_datetime, InterpolatedString) or isinstance( 113 interpolated_string_or_min_max_datetime, str 114 ): 115 return MinMaxDatetime( # type: ignore [call-arg] 116 datetime=interpolated_string_or_min_max_datetime, parameters=parameters 117 ) 118 else: 119 return interpolated_string_or_min_max_datetime
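A usage sketch, assuming a hypothetical start_date config key whose value is in the default %Y-%m-%dT%H:%M:%S.%f%z format:

    from airbyte_cdk.sources.declarative.datetime.min_max_datetime import MinMaxDatetime

    config = {"start_date": "2021-06-01T00:00:00.000000+0000"}  # hypothetical config key

    clamped = MinMaxDatetime(
        datetime="{{ config['start_date'] }}",
        parameters={},
        min_datetime="2022-01-01T00:00:00.000000+0000",
    )

    # start_date is earlier than min_datetime, so the minimum is returned instead.
    print(clamped.get_datetime(config))  # expected: 2022-01-01 00:00:00+00:00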
33@dataclass 34class NoAuth(DeclarativeAuthenticator): 35 parameters: InitVar[Mapping[str, Any]] 36 37 @property 38 def auth_header(self) -> str: 39 return "" 40 41 @property 42 def token(self) -> str: 43 return ""
23@dataclass 24class OffsetIncrement(PaginationStrategy): 25 """ 26 Pagination strategy that returns the number of records reads so far and returns it as the next page token 27 Examples: 28 # page_size to be a constant integer value 29 pagination_strategy: 30 type: OffsetIncrement 31 page_size: 2 32 33 # page_size to be a constant string value 34 pagination_strategy: 35 type: OffsetIncrement 36 page_size: "2" 37 38 # page_size to be an interpolated string value 39 pagination_strategy: 40 type: OffsetIncrement 41 page_size: "{{ parameters['items_per_page'] }}" 42 43 Attributes: 44 page_size (InterpolatedString): the number of records to request 45 """ 46 47 config: Config 48 page_size: Optional[Union[str, int]] 49 parameters: InitVar[Mapping[str, Any]] 50 decoder: Decoder = field( 51 default_factory=lambda: PaginationDecoderDecorator(decoder=JsonDecoder(parameters={})) 52 ) 53 inject_on_first_request: bool = False 54 55 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 56 page_size = str(self.page_size) if isinstance(self.page_size, int) else self.page_size 57 if page_size: 58 self._page_size: Optional[InterpolatedString] = InterpolatedString( 59 page_size, parameters=parameters 60 ) 61 else: 62 self._page_size = None 63 64 @property 65 def initial_token(self) -> Optional[Any]: 66 if self.inject_on_first_request: 67 return 0 68 return None 69 70 def next_page_token( 71 self, 72 response: requests.Response, 73 last_page_size: int, 74 last_record: Optional[Record], 75 last_page_token_value: Optional[Any] = None, 76 ) -> Optional[Any]: 77 decoded_response = next(self.decoder.decode(response)) 78 79 # Stop paginating when there are fewer records than the page size or the current page has no records 80 if ( 81 self._page_size 82 and last_page_size < self._page_size.eval(self.config, response=decoded_response) 83 ) or last_page_size == 0: 84 return None 85 elif last_page_token_value is None: 86 # If the OffsetIncrement strategy does not inject on the first request, the incoming last_page_token_value 87 # will be None. For this case, we assume that None was the first page and progress to the next offset 88 return 0 + last_page_size 89 elif not isinstance(last_page_token_value, int): 90 raise ValueError( 91 f"Last page token value {last_page_token_value} for OffsetIncrement pagination strategy was not an integer" 92 ) 93 else: 94 return last_page_token_value + last_page_size 95 96 def get_page_size(self) -> Optional[int]: 97 if self._page_size: 98 page_size = self._page_size.eval(self.config) 99 if not isinstance(page_size, int): 100 raise Exception(f"{page_size} is of type {type(page_size)}. Expected {int}") 101 return page_size 102 else: 103 return None
Pagination strategy that returns the number of records read so far and uses it as the next page token
Examples:
# page_size to be a constant integer value
pagination_strategy:
  type: OffsetIncrement
  page_size: 2

# page_size to be a constant string value
pagination_strategy:
  type: OffsetIncrement
  page_size: "2"

# page_size to be an interpolated string value
pagination_strategy:
  type: OffsetIncrement
  page_size: "{{ parameters['items_per_page'] }}"
Attributes:
- page_size (InterpolatedString): the number of records to request
64 @property 65 def initial_token(self) -> Optional[Any]: 66 if self.inject_on_first_request: 67 return 0 68 return None
Return the initial value of the token
70 def next_page_token( 71 self, 72 response: requests.Response, 73 last_page_size: int, 74 last_record: Optional[Record], 75 last_page_token_value: Optional[Any] = None, 76 ) -> Optional[Any]: 77 decoded_response = next(self.decoder.decode(response)) 78 79 # Stop paginating when there are fewer records than the page size or the current page has no records 80 if ( 81 self._page_size 82 and last_page_size < self._page_size.eval(self.config, response=decoded_response) 83 ) or last_page_size == 0: 84 return None 85 elif last_page_token_value is None: 86 # If the OffsetIncrement strategy does not inject on the first request, the incoming last_page_token_value 87 # will be None. For this case, we assume that None was the first page and progress to the next offset 88 return 0 + last_page_size 89 elif not isinstance(last_page_token_value, int): 90 raise ValueError( 91 f"Last page token value {last_page_token_value} for OffsetIncrement pagination strategy was not an integer" 92 ) 93 else: 94 return last_page_token_value + last_page_size
Parameters
- response: response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
next page token. Returns None if there are no more pages to fetch
96 def get_page_size(self) -> Optional[int]: 97 if self._page_size: 98 page_size = self._page_size.eval(self.config) 99 if not isinstance(page_size, int): 100 raise Exception(f"{page_size} is of type {type(page_size)}. Expected {int}") 101 return page_size 102 else: 103 return None
Returns
page size: The number of records to fetch in a page. Returns None if unspecified
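A small sketch of the offset behavior; the strategy is normally built from the manifest rather than instantiated directly, the page size is a placeholder, and the interpolation layer is assumed to cast numeric page sizes back to int:

    from airbyte_cdk.sources.declarative.requesters.paginators.strategies.offset_increment import (
        OffsetIncrement,
    )

    strategy = OffsetIncrement(
        config={},
        page_size=100,  # placeholder
        parameters={},
        inject_on_first_request=True,
    )

    print(strategy.initial_token)    # 0, because inject_on_first_request is True
    print(strategy.get_page_size())  # expected: 100
    # next_page_token() then advances the offset by last_page_size on each full page
    # and returns None once a page has fewer than page_size records.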
18@dataclass 19class PageIncrement(PaginationStrategy): 20 """ 21 Pagination strategy that returns the number of pages reads so far and returns it as the next page token 22 23 Attributes: 24 page_size (int): the number of records to request 25 start_from_page (int): number of the initial page 26 """ 27 28 config: Config 29 page_size: Optional[Union[str, int]] 30 parameters: InitVar[Mapping[str, Any]] 31 start_from_page: int = 0 32 inject_on_first_request: bool = False 33 34 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 35 if isinstance(self.page_size, int) or (self.page_size is None): 36 self._page_size = self.page_size 37 else: 38 page_size = InterpolatedString(self.page_size, parameters=parameters).eval(self.config) 39 if not isinstance(page_size, int): 40 raise Exception(f"{page_size} is of type {type(page_size)}. Expected {int}") 41 self._page_size = page_size 42 43 @property 44 def initial_token(self) -> Optional[Any]: 45 if self.inject_on_first_request: 46 return self.start_from_page 47 return None 48 49 def next_page_token( 50 self, 51 response: requests.Response, 52 last_page_size: int, 53 last_record: Optional[Record], 54 last_page_token_value: Optional[Any], 55 ) -> Optional[Any]: 56 # Stop paginating when there are fewer records than the page size or the current page has no records 57 if (self._page_size and last_page_size < self._page_size) or last_page_size == 0: 58 return None 59 elif last_page_token_value is None: 60 # If the PageIncrement strategy does not inject on the first request, the incoming last_page_token_value 61 # may be None. When this is the case, we assume we've already requested the first page specified by 62 # start_from_page and must now get the next page 63 return self.start_from_page + 1 64 elif not isinstance(last_page_token_value, int): 65 raise ValueError( 66 f"Last page token value {last_page_token_value} for PageIncrement pagination strategy was not an integer" 67 ) 68 else: 69 return last_page_token_value + 1 70 71 def get_page_size(self) -> Optional[int]: 72 return self._page_size
Pagination strategy that returns the number of pages read so far and uses it as the next page token
Attributes:
- page_size (int): the number of records to request
- start_from_page (int): number of the initial page
43 @property 44 def initial_token(self) -> Optional[Any]: 45 if self.inject_on_first_request: 46 return self.start_from_page 47 return None
Return the initial value of the token
49 def next_page_token( 50 self, 51 response: requests.Response, 52 last_page_size: int, 53 last_record: Optional[Record], 54 last_page_token_value: Optional[Any], 55 ) -> Optional[Any]: 56 # Stop paginating when there are fewer records than the page size or the current page has no records 57 if (self._page_size and last_page_size < self._page_size) or last_page_size == 0: 58 return None 59 elif last_page_token_value is None: 60 # If the PageIncrement strategy does not inject on the first request, the incoming last_page_token_value 61 # may be None. When this is the case, we assume we've already requested the first page specified by 62 # start_from_page and must now get the next page 63 return self.start_from_page + 1 64 elif not isinstance(last_page_token_value, int): 65 raise ValueError( 66 f"Last page token value {last_page_token_value} for PageIncrement pagination strategy was not an integer" 67 ) 68 else: 69 return last_page_token_value + 1
Parameters
- response: response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
next page token. Returns None if there are no more pages to fetch
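A small sketch of the page arithmetic, with placeholder values:

    from airbyte_cdk.sources.declarative.requesters.paginators.strategies.page_increment import (
        PageIncrement,
    )

    strategy = PageIncrement(
        config={},
        page_size=50,  # placeholder
        parameters={},
        start_from_page=1,
        inject_on_first_request=True,
    )

    print(strategy.initial_token)    # 1: the first request asks for start_from_page
    print(strategy.get_page_size())  # 50
    # next_page_token() then returns last_page_token_value + 1 for every full page,
    # and None once a page comes back with fewer than page_size records.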
15@dataclass 16class PaginationStrategy: 17 """ 18 Defines how to get the next page token 19 """ 20 21 @property 22 @abstractmethod 23 def initial_token(self) -> Optional[Any]: 24 """ 25 Return the initial value of the token 26 """ 27 28 @abstractmethod 29 def next_page_token( 30 self, 31 response: requests.Response, 32 last_page_size: int, 33 last_record: Optional[Record], 34 last_page_token_value: Optional[Any], 35 ) -> Optional[Any]: 36 """ 37 :param response: response to process 38 :param last_page_size: the number of records read from the response 39 :param last_record: the last record extracted from the response 40 :param last_page_token_value: The current value of the page token made on the last request 41 :return: next page token. Returns None if there are no more pages to fetch 42 """ 43 pass 44 45 @abstractmethod 46 def get_page_size(self) -> Optional[int]: 47 """ 48 :return: page size: The number of records to fetch in a page. Returns None if unspecified 49 """
Defines how to get the next page token
21 @property 22 @abstractmethod 23 def initial_token(self) -> Optional[Any]: 24 """ 25 Return the initial value of the token 26 """
Return the initial value of the token
28 @abstractmethod 29 def next_page_token( 30 self, 31 response: requests.Response, 32 last_page_size: int, 33 last_record: Optional[Record], 34 last_page_token_value: Optional[Any], 35 ) -> Optional[Any]: 36 """ 37 :param response: response to process 38 :param last_page_size: the number of records read from the response 39 :param last_record: the last record extracted from the response 40 :param last_page_token_value: The current value of the page token made on the last request 41 :return: next page token. Returns None if there are no more pages to fetch 42 """ 43 pass
Parameters
- response: response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
next page token. Returns None if there are no more pages to fetch
45 @abstractmethod 46 def get_page_size(self) -> Optional[int]: 47 """ 48 :return: page size: The number of records to fetch in a page. Returns None if unspecified 49 """
Returns
page size: The number of records to fetch in a page. Returns None if unspecified
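A minimal sketch of a custom strategy implementing this interface; the class, its max_pages field, and the stopping rule are made up for illustration, and the import path is assumed to follow this module:

    from dataclasses import dataclass
    from typing import Any, Optional

    import requests

    from airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy import (
        PaginationStrategy,
    )


    @dataclass
    class FixedPageCount(PaginationStrategy):
        """Hypothetical strategy: page numbers 0, 1, ..., capped at max_pages."""

        max_pages: int = 3

        @property
        def initial_token(self) -> Optional[Any]:
            return 0

        def next_page_token(
            self,
            response: requests.Response,
            last_page_size: int,
            last_record: Optional[Any],  # Optional[Record] in the real signature
            last_page_token_value: Optional[Any],
        ) -> Optional[Any]:
            next_page = (last_page_token_value or 0) + 1
            # Stop when a page comes back empty or the cap is reached.
            if last_page_size == 0 or next_page >= self.max_pages:
                return None
            return next_page

        def get_page_size(self) -> Optional[int]:
            return None  # page size is left unspecified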
31@dataclass 32class ParentStreamConfig: 33 """ 34 Describes how to create a stream slice from a parent stream 35 36 stream: The stream to read records from 37 parent_key: The key of the parent stream's records that will be the stream slice key 38 partition_field: The partition key 39 extra_fields: Additional field paths to include in the stream slice 40 request_option: How to inject the slice value on an outgoing HTTP request 41 incremental_dependency (bool): Indicates if the parent stream should be read incrementally. 42 """ 43 44 stream: "DeclarativeStream" # Parent streams must be DeclarativeStream because we can't know which part of the stream slice is a partition for regular Stream 45 parent_key: Union[InterpolatedString, str] 46 partition_field: Union[InterpolatedString, str] 47 config: Config 48 parameters: InitVar[Mapping[str, Any]] 49 extra_fields: Optional[Union[List[List[str]], List[List[InterpolatedString]]]] = ( 50 None # List of field paths (arrays of strings) 51 ) 52 request_option: Optional[RequestOption] = None 53 incremental_dependency: bool = False 54 lazy_read_pointer: Optional[List[Union[InterpolatedString, str]]] = None 55 56 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 57 self.parent_key = InterpolatedString.create(self.parent_key, parameters=parameters) 58 self.partition_field = InterpolatedString.create( 59 self.partition_field, parameters=parameters 60 ) 61 if self.extra_fields: 62 # Create InterpolatedString for each field path in extra_keys 63 self.extra_fields = [ 64 [InterpolatedString.create(path, parameters=parameters) for path in key_path] 65 for key_path in self.extra_fields 66 ] 67 68 self.lazy_read_pointer = ( 69 [ 70 InterpolatedString.create(path, parameters=parameters) 71 if isinstance(path, str) 72 else path 73 for path in self.lazy_read_pointer 74 ] 75 if self.lazy_read_pointer 76 else None 77 )
Describes how to create a stream slice from a parent stream
- stream: The stream to read records from
- parent_key: The key of the parent stream's records that will be the stream slice key
- partition_field: The partition key
- extra_fields: Additional field paths to include in the stream slice
- request_option: How to inject the slice value on an outgoing HTTP request
- incremental_dependency (bool): Indicates if the parent stream should be read incrementally.
7class ReadException(Exception): 8 """ 9 Raise when there is an error reading data from an API Source 10 """
Raised when there is an error reading data from an API Source
12@dataclass 13class RecordExtractor: 14 """ 15 Responsible for translating an HTTP response into a list of records by extracting records from the response. 16 """ 17 18 @abstractmethod 19 def extract_records( 20 self, 21 response: requests.Response, 22 ) -> Iterable[Mapping[str, Any]]: 23 """ 24 Selects records from the response 25 :param response: The response to extract the records from 26 :return: List of Records extracted from the response 27 """ 28 pass
Responsible for translating an HTTP response into a list of records by extracting records from the response.
18 @abstractmethod 19 def extract_records( 20 self, 21 response: requests.Response, 22 ) -> Iterable[Mapping[str, Any]]: 23 """ 24 Selects records from the response 25 :param response: The response to extract the records from 26 :return: List of Records extracted from the response 27 """ 28 pass
Selects records from the response
Parameters
- response: The response to extract the records from
Returns
List of Records extracted from the response
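Since RecordExtractor is an interface, a concrete implementation only has to yield mappings from the response. Below is a minimal, hedged sketch; the top-level "items" key is an assumption about a particular API, and DpathExtractor (documented elsewhere on this page) is the generic declarative alternative.

from dataclasses import dataclass
from typing import Any, Iterable, Mapping

import requests


@dataclass
class ItemsExtractor(RecordExtractor):
    """Illustrative extractor that reads records from a top-level "items" array."""

    def extract_records(self, response: requests.Response) -> Iterable[Mapping[str, Any]]:
        # Yield each element found under the assumed "items" key of the JSON body.
        yield from response.json().get("items", [])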
17@dataclass 18class RecordFilter: 19 """ 20 Filter applied on a list of Records 21 22 config (Config): The user-provided configuration as specified by the source's spec 23 condition (str): The string representing the predicate to filter a record. Records will be removed if evaluated to False 24 """ 25 26 parameters: InitVar[Mapping[str, Any]] 27 config: Config 28 condition: str = "" 29 30 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 31 self._filter_interpolator = InterpolatedBoolean( 32 condition=self.condition, parameters=parameters 33 ) 34 35 def filter_records( 36 self, 37 records: Iterable[Mapping[str, Any]], 38 stream_state: StreamState, 39 stream_slice: Optional[StreamSlice] = None, 40 next_page_token: Optional[Mapping[str, Any]] = None, 41 ) -> Iterable[Mapping[str, Any]]: 42 kwargs = { 43 "stream_state": stream_state, 44 "stream_slice": stream_slice, 45 "next_page_token": next_page_token, 46 "stream_slice.extra_fields": stream_slice.extra_fields if stream_slice else {}, 47 } 48 for record in records: 49 if self._filter_interpolator.eval(self.config, record=record, **kwargs): 50 yield record
Filter applied on a list of Records
- config (Config): The user-provided configuration as specified by the source's spec
- condition (str): The string predicate used to filter a record; records are removed when it evaluates to False
35 def filter_records( 36 self, 37 records: Iterable[Mapping[str, Any]], 38 stream_state: StreamState, 39 stream_slice: Optional[StreamSlice] = None, 40 next_page_token: Optional[Mapping[str, Any]] = None, 41 ) -> Iterable[Mapping[str, Any]]: 42 kwargs = { 43 "stream_state": stream_state, 44 "stream_slice": stream_slice, 45 "next_page_token": next_page_token, 46 "stream_slice.extra_fields": stream_slice.extra_fields if stream_slice else {}, 47 } 48 for record in records: 49 if self._filter_interpolator.eval(self.config, record=record, **kwargs): 50 yield record
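A hedged usage sketch: the condition below is an interpolated boolean expression evaluated against each record, and the "updated_at" field and sample data are illustrative assumptions.

record_filter = RecordFilter(
    parameters={},
    config={},
    condition="{{ record['updated_at'] > stream_state.get('updated_at', '') }}",
)
kept = list(
    record_filter.filter_records(
        records=[
            {"id": 1, "updated_at": "2024-02-01"},
            {"id": 2, "updated_at": "2023-01-01"},
        ],
        stream_state={"updated_at": "2024-01-01"},
    )
)
# Only the record updated after the stream state's cursor value is kept.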
24@dataclass 25class RecordSelector(HttpSelector): 26 """ 27 Responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering 28 records based on a heuristic. 29 30 Attributes: 31 extractor (RecordExtractor): The record extractor responsible for extracting records from a response 32 schema_normalization (TypeTransformer): The record normalizer responsible for casting record values to stream schema types 33 record_filter (RecordFilter): The record filter responsible for filtering extracted records 34 transformations (List[RecordTransformation]): The transformations to be done on the records 35 """ 36 37 extractor: RecordExtractor 38 config: Config 39 parameters: InitVar[Mapping[str, Any]] 40 schema_normalization: Union[TypeTransformer, DeclarativeTypeTransformer] 41 name: str 42 _name: Union[InterpolatedString, str] = field(init=False, repr=False, default="") 43 record_filter: Optional[RecordFilter] = None 44 transformations: List[RecordTransformation] = field(default_factory=lambda: []) 45 transform_before_filtering: bool = False 46 47 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 48 self._parameters = parameters 49 self._name = ( 50 InterpolatedString(self._name, parameters=parameters) 51 if isinstance(self._name, str) 52 else self._name 53 ) 54 55 @property # type: ignore 56 def name(self) -> str: 57 """ 58 :return: Stream name 59 """ 60 return ( 61 str(self._name.eval(self.config)) 62 if isinstance(self._name, InterpolatedString) 63 else self._name 64 ) 65 66 @name.setter 67 def name(self, value: str) -> None: 68 if not isinstance(value, property): 69 self._name = value 70 71 def select_records( 72 self, 73 response: requests.Response, 74 stream_state: StreamState, 75 records_schema: Mapping[str, Any], 76 stream_slice: Optional[StreamSlice] = None, 77 next_page_token: Optional[Mapping[str, Any]] = None, 78 ) -> Iterable[Record]: 79 """ 80 Selects records from the response 81 :param response: The response to select the records from 82 :param stream_state: The stream state 83 :param records_schema: json schema of records to return 84 :param stream_slice: The stream slice 85 :param next_page_token: The paginator token 86 :return: List of Records selected from the response 87 """ 88 all_data: Iterable[Mapping[str, Any]] = self.extractor.extract_records(response) 89 yield from self.filter_and_transform( 90 all_data, stream_state, records_schema, stream_slice, next_page_token 91 ) 92 93 def filter_and_transform( 94 self, 95 all_data: Iterable[Mapping[str, Any]], 96 stream_state: StreamState, 97 records_schema: Mapping[str, Any], 98 stream_slice: Optional[StreamSlice] = None, 99 next_page_token: Optional[Mapping[str, Any]] = None, 100 ) -> Iterable[Record]: 101 """ 102 There is an issue with the selector as of 2024-08-30: it does technology-agnostic processing like filtering, transformation and 103 normalization with an API that is technology-specific (as requests.Response is only for HTTP communication using the requests 104 library). 105 106 Until we decide to move this logic away from the selector, we made this method public so that users like AsyncJobRetriever could 107 share the logic of doing transformations on a set of records. 
108 """ 109 if self.transform_before_filtering: 110 transformed_data = self._transform(all_data, stream_state, stream_slice) 111 transformed_filtered_data = self._filter( 112 transformed_data, stream_state, stream_slice, next_page_token 113 ) 114 else: 115 filtered_data = self._filter(all_data, stream_state, stream_slice, next_page_token) 116 transformed_filtered_data = self._transform(filtered_data, stream_state, stream_slice) 117 normalized_data = self._normalize_by_schema( 118 transformed_filtered_data, schema=records_schema 119 ) 120 for data in normalized_data: 121 yield Record(data=data, stream_name=self.name, associated_slice=stream_slice) 122 123 def _normalize_by_schema( 124 self, records: Iterable[Mapping[str, Any]], schema: Optional[Mapping[str, Any]] 125 ) -> Iterable[Mapping[str, Any]]: 126 if schema: 127 # record has type Mapping[str, Any], but dict[str, Any] expected 128 for record in records: 129 normalized_record = dict(record) 130 self.schema_normalization.transform(normalized_record, schema) 131 yield normalized_record 132 else: 133 yield from records 134 135 def _filter( 136 self, 137 records: Iterable[Mapping[str, Any]], 138 stream_state: StreamState, 139 stream_slice: Optional[StreamSlice], 140 next_page_token: Optional[Mapping[str, Any]], 141 ) -> Iterable[Mapping[str, Any]]: 142 if self.record_filter: 143 yield from self.record_filter.filter_records( 144 records, 145 stream_state=stream_state, 146 stream_slice=stream_slice, 147 next_page_token=next_page_token, 148 ) 149 else: 150 yield from records 151 152 def _transform( 153 self, 154 records: Iterable[Mapping[str, Any]], 155 stream_state: StreamState, 156 stream_slice: Optional[StreamSlice] = None, 157 ) -> Iterable[Mapping[str, Any]]: 158 for record in records: 159 for transformation in self.transformations: 160 transformation.transform( 161 record, # type: ignore # record has type Mapping[str, Any], but Dict[str, Any] expected 162 config=self.config, 163 stream_state=stream_state, 164 stream_slice=stream_slice, 165 ) 166 yield record
Responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering records based on a heuristic.
Attributes:
- extractor (RecordExtractor): The record extractor responsible for extracting records from a response
- schema_normalization (TypeTransformer): The record normalizer responsible for casting record values to stream schema types
- record_filter (RecordFilter): The record filter responsible for filtering extracted records
- transformations (List[RecordTransformation]): The transformations to be done on the records
55 @property # type: ignore 56 def name(self) -> str: 57 """ 58 :return: Stream name 59 """ 60 return ( 61 str(self._name.eval(self.config)) 62 if isinstance(self._name, InterpolatedString) 63 else self._name 64 )
Returns
Stream name
71 def select_records( 72 self, 73 response: requests.Response, 74 stream_state: StreamState, 75 records_schema: Mapping[str, Any], 76 stream_slice: Optional[StreamSlice] = None, 77 next_page_token: Optional[Mapping[str, Any]] = None, 78 ) -> Iterable[Record]: 79 """ 80 Selects records from the response 81 :param response: The response to select the records from 82 :param stream_state: The stream state 83 :param records_schema: json schema of records to return 84 :param stream_slice: The stream slice 85 :param next_page_token: The paginator token 86 :return: List of Records selected from the response 87 """ 88 all_data: Iterable[Mapping[str, Any]] = self.extractor.extract_records(response) 89 yield from self.filter_and_transform( 90 all_data, stream_state, records_schema, stream_slice, next_page_token 91 )
Selects records from the response
Parameters
- response: The response to select the records from
- stream_state: The stream state
- records_schema: json schema of records to return
- stream_slice: The stream slice
- next_page_token: The paginator token
Returns
List of Records selected from the response
93 def filter_and_transform( 94 self, 95 all_data: Iterable[Mapping[str, Any]], 96 stream_state: StreamState, 97 records_schema: Mapping[str, Any], 98 stream_slice: Optional[StreamSlice] = None, 99 next_page_token: Optional[Mapping[str, Any]] = None, 100 ) -> Iterable[Record]: 101 """ 102 There is an issue with the selector as of 2024-08-30: it does technology-agnostic processing like filtering, transformation and 103 normalization with an API that is technology-specific (as requests.Response is only for HTTP communication using the requests 104 library). 105 106 Until we decide to move this logic away from the selector, we made this method public so that users like AsyncJobRetriever could 107 share the logic of doing transformations on a set of records. 108 """ 109 if self.transform_before_filtering: 110 transformed_data = self._transform(all_data, stream_state, stream_slice) 111 transformed_filtered_data = self._filter( 112 transformed_data, stream_state, stream_slice, next_page_token 113 ) 114 else: 115 filtered_data = self._filter(all_data, stream_state, stream_slice, next_page_token) 116 transformed_filtered_data = self._transform(filtered_data, stream_state, stream_slice) 117 normalized_data = self._normalize_by_schema( 118 transformed_filtered_data, schema=records_schema 119 ) 120 for data in normalized_data: 121 yield Record(data=data, stream_name=self.name, associated_slice=stream_slice)
There is an issue with the selector as of 2024-08-30: it does technology-agnostic processing like filtering, transformation and normalization with an API that is technology-specific (as requests.Response is only for HTTP communication using the requests library).
Until we decide to move this logic away from the selector, we made this method public so that users like AsyncJobRetriever could share the logic of doing transformations on a set of records.
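The sketch below wires a RecordSelector by hand and reuses filter_and_transform on an in-memory list, which is exactly the reuse case described above. The field path, stream name, and sample data are assumptions, and in practice the declarative factory builds the selector from the manifest; TypeTransformer/TransformConfig are the CDK's schema normalization utilities.

from airbyte_cdk.sources.declarative.extractors import DpathExtractor, RecordSelector
from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer

selector = RecordSelector(
    extractor=DpathExtractor(field_path=["results"], config={}, parameters={}),
    config={},
    parameters={},
    schema_normalization=TypeTransformer(TransformConfig.NoTransform),
    name="users",
)

# filter_and_transform accepts already-extracted mappings, so it can be shared
# by callers that do not go through an HTTP response at all.
records = list(
    selector.filter_and_transform(
        all_data=[{"id": 1}, {"id": 2}],
        stream_state={},
        records_schema={},
    )
)
# -> Record objects with stream_name "users"; an empty schema skips normalization.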
13@dataclass 14class RecordTransformation: 15 """ 16 Implementations of this class define transformations that can be applied to records of a stream. 17 """ 18 19 @abstractmethod 20 def transform( 21 self, 22 record: Dict[str, Any], 23 config: Optional[Config] = None, 24 stream_state: Optional[StreamState] = None, 25 stream_slice: Optional[StreamSlice] = None, 26 ) -> None: 27 """ 28 Transform a record by adding, deleting, or mutating fields directly from the record reference passed in argument. 29 30 :param record: The input record to be transformed 31 :param config: The user-provided configuration as specified by the source's spec 32 :param stream_state: The stream state 33 :param stream_slice: The stream slice 34 :return: The transformed record 35 """ 36 37 def __eq__(self, other: object) -> bool: 38 return other.__dict__ == self.__dict__
Implementations of this class define transformations that can be applied to records of a stream.
19 @abstractmethod 20 def transform( 21 self, 22 record: Dict[str, Any], 23 config: Optional[Config] = None, 24 stream_state: Optional[StreamState] = None, 25 stream_slice: Optional[StreamSlice] = None, 26 ) -> None: 27 """ 28 Transform a record by adding, deleting, or mutating fields directly from the record reference passed in argument. 29 30 :param record: The input record to be transformed 31 :param config: The user-provided configuration as specified by the source's spec 32 :param stream_state: The stream state 33 :param stream_slice: The stream slice 34 :return: The transformed record 35 """
Transform a record by adding, deleting, or mutating fields directly from the record reference passed in argument.
Parameters
- record: The input record to be transformed
- config: The user-provided configuration as specified by the source's spec
- stream_state: The stream state
- stream_slice: The stream slice
Returns
The transformed record
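As a hedged sketch, a custom transformation simply mutates the record in place and returns nothing; the "partition" field name is an illustrative choice, and type annotations are omitted for brevity.

from dataclasses import dataclass


@dataclass
class AddPartitionToRecord(RecordTransformation):
    """Illustrative transformation that copies the slice's partition onto each record."""

    def transform(self, record, config=None, stream_state=None, stream_slice=None) -> None:
        # Mutate the record in place; callers keep iterating over the same object.
        record["partition"] = dict(stream_slice.partition) if stream_slice else {}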
25@dataclass 26class RequestOption: 27 """ 28 Describes an option to set on a request 29 30 Attributes: 31 field_name (str): Describes the name of the parameter to inject. Mutually exclusive with field_path. 32 field_path (list(str)): Describes the path to a nested field as a list of field names. 33 Only valid for body_json injection type, and mutually exclusive with field_name. 34 inject_into (RequestOptionType): Describes where in the HTTP request to inject the parameter 35 """ 36 37 inject_into: RequestOptionType 38 parameters: InitVar[Mapping[str, Any]] 39 field_name: Optional[Union[InterpolatedString, str]] = None 40 field_path: Optional[List[Union[InterpolatedString, str]]] = None 41 42 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 43 # Validate inputs. We should expect either field_name or field_path, but not both 44 if self.field_name is None and self.field_path is None: 45 raise ValueError("RequestOption requires either a field_name or field_path") 46 47 if self.field_name is not None and self.field_path is not None: 48 raise ValueError( 49 "Only one of field_name or field_path can be provided to RequestOption" 50 ) 51 52 # Nested field injection is only supported for body JSON injection 53 if self.field_path is not None and self.inject_into != RequestOptionType.body_json: 54 raise ValueError( 55 "Nested field injection is only supported for body JSON injection. Please use a top-level field_name for other injection types." 56 ) 57 58 # Convert field_name and field_path into InterpolatedString objects if they are strings 59 if self.field_name is not None: 60 self.field_name = InterpolatedString.create(self.field_name, parameters=parameters) 61 elif self.field_path is not None: 62 self.field_path = [ 63 InterpolatedString.create(segment, parameters=parameters) 64 for segment in self.field_path 65 ] 66 67 @property 68 def _is_field_path(self) -> bool: 69 """Returns whether this option is a field path (ie, a nested field)""" 70 return self.field_path is not None 71 72 def inject_into_request( 73 self, 74 target: MutableMapping[str, Any], 75 value: Any, 76 config: Config, 77 ) -> None: 78 """ 79 Inject a request option value into a target request structure using either field_name or field_path. 80 For non-body-json injection, only top-level field names are supported. 81 For body-json injection, both field names and nested field paths are supported. 82 83 Args: 84 target: The request structure to inject the value into 85 value: The value to inject 86 config: The config object to use for interpolation 87 """ 88 if self._is_field_path: 89 if self.inject_into != RequestOptionType.body_json: 90 raise ValueError( 91 "Nested field injection is only supported for body JSON injection. Please use a top-level field_name for other injection types." 
92 ) 93 94 assert self.field_path is not None # for type checker 95 current = target 96 # Convert path segments into strings, evaluating any interpolated segments 97 # Example: ["data", "{{ config[user_type] }}", "id"] -> ["data", "admin", "id"] 98 *path_parts, final_key = [ 99 str( 100 segment.eval(config=config) 101 if isinstance(segment, InterpolatedString) 102 else segment 103 ) 104 for segment in self.field_path 105 ] 106 107 # Build a nested dictionary structure and set the final value at the deepest level 108 for part in path_parts: 109 current = current.setdefault(part, {}) 110 current[final_key] = value 111 else: 112 # For non-nested fields, evaluate the field name if it's an interpolated string 113 key = ( 114 self.field_name.eval(config=config) 115 if isinstance(self.field_name, InterpolatedString) 116 else self.field_name 117 ) 118 target[str(key)] = value
Describes an option to set on a request
Attributes:
- field_name (str): Describes the name of the parameter to inject. Mutually exclusive with field_path.
- field_path (list(str)): Describes the path to a nested field as a list of field names. Only valid for body_json injection type, and mutually exclusive with field_name.
- inject_into (RequestOptionType): Describes where in the HTTP request to inject the parameter
72 def inject_into_request( 73 self, 74 target: MutableMapping[str, Any], 75 value: Any, 76 config: Config, 77 ) -> None: 78 """ 79 Inject a request option value into a target request structure using either field_name or field_path. 80 For non-body-json injection, only top-level field names are supported. 81 For body-json injection, both field names and nested field paths are supported. 82 83 Args: 84 target: The request structure to inject the value into 85 value: The value to inject 86 config: The config object to use for interpolation 87 """ 88 if self._is_field_path: 89 if self.inject_into != RequestOptionType.body_json: 90 raise ValueError( 91 "Nested field injection is only supported for body JSON injection. Please use a top-level field_name for other injection types." 92 ) 93 94 assert self.field_path is not None # for type checker 95 current = target 96 # Convert path segments into strings, evaluating any interpolated segments 97 # Example: ["data", "{{ config[user_type] }}", "id"] -> ["data", "admin", "id"] 98 *path_parts, final_key = [ 99 str( 100 segment.eval(config=config) 101 if isinstance(segment, InterpolatedString) 102 else segment 103 ) 104 for segment in self.field_path 105 ] 106 107 # Build a nested dictionary structure and set the final value at the deepest level 108 for part in path_parts: 109 current = current.setdefault(part, {}) 110 current[final_key] = value 111 else: 112 # For non-nested fields, evaluate the field name if it's an interpolated string 113 key = ( 114 self.field_name.eval(config=config) 115 if isinstance(self.field_name, InterpolatedString) 116 else self.field_name 117 ) 118 target[str(key)] = value
Inject a request option value into a target request structure using either field_name or field_path. For non-body-json injection, only top-level field names are supported. For body-json injection, both field names and nested field paths are supported.
Arguments:
- target: The request structure to inject the value into
- value: The value to inject
- config: The config object to use for interpolation
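Two small usage sketches based on the validation rules above; the parameter name and the nested path are illustrative.

# Top-level field injected into the query string.
page_param = RequestOption(
    inject_into=RequestOptionType.request_parameter,
    field_name="page_size",
    parameters={},
)
params: dict = {}
page_param.inject_into_request(params, 100, config={})
# params == {"page_size": 100}

# Nested field path, which is only valid for body_json injection.
nested_option = RequestOption(
    inject_into=RequestOptionType.body_json,
    field_path=["filters", "updated_at", "gte"],
    parameters={},
)
body: dict = {}
nested_option.inject_into_request(body, "2024-01-01", config={})
# body == {"filters": {"updated_at": {"gte": "2024-01-01"}}}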
14class RequestOptionType(Enum): 15 """ 16 Describes where to set a value on a request 17 """ 18 19 request_parameter = "request_parameter" 20 header = "header" 21 body_data = "body_data" 22 body_json = "body_json"
Describes where to set a value on a request
30class Requester(RequestOptionsProvider): 31 @abstractmethod 32 def get_authenticator(self) -> DeclarativeAuthenticator: 33 """ 34 Specifies the authenticator to use when submitting requests 35 """ 36 pass 37 38 @abstractmethod 39 def get_url_base( 40 self, 41 *, 42 stream_state: Optional[StreamState], 43 stream_slice: Optional[StreamSlice], 44 next_page_token: Optional[Mapping[str, Any]], 45 ) -> str: 46 """ 47 :return: URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/" 48 """ 49 50 @abstractmethod 51 def get_path( 52 self, 53 *, 54 stream_state: Optional[StreamState], 55 stream_slice: Optional[StreamSlice], 56 next_page_token: Optional[Mapping[str, Any]], 57 ) -> str: 58 """ 59 Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity" 60 """ 61 62 @abstractmethod 63 def get_method(self) -> HttpMethod: 64 """ 65 Specifies the HTTP method to use 66 """ 67 68 @abstractmethod 69 def get_request_params( 70 self, 71 *, 72 stream_state: Optional[StreamState] = None, 73 stream_slice: Optional[StreamSlice] = None, 74 next_page_token: Optional[Mapping[str, Any]] = None, 75 ) -> MutableMapping[str, Any]: 76 """ 77 Specifies the query parameters that should be set on an outgoing HTTP request given the inputs. 78 79 E.g: you might want to define query parameters for paging if next_page_token is not None. 80 """ 81 82 @abstractmethod 83 def get_request_headers( 84 self, 85 *, 86 stream_state: Optional[StreamState] = None, 87 stream_slice: Optional[StreamSlice] = None, 88 next_page_token: Optional[Mapping[str, Any]] = None, 89 ) -> Mapping[str, Any]: 90 """ 91 Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method. 92 """ 93 94 @abstractmethod 95 def get_request_body_data( 96 self, 97 *, 98 stream_state: Optional[StreamState] = None, 99 stream_slice: Optional[StreamSlice] = None, 100 next_page_token: Optional[Mapping[str, Any]] = None, 101 ) -> Union[Mapping[str, Any], str]: 102 """ 103 Specifies how to populate the body of the request with a non-JSON payload. 104 105 If returns a ready text that it will be sent as is. 106 If returns a dict that it will be converted to a urlencoded form. 107 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 108 109 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 110 """ 111 112 @abstractmethod 113 def get_request_body_json( 114 self, 115 *, 116 stream_state: Optional[StreamState] = None, 117 stream_slice: Optional[StreamSlice] = None, 118 next_page_token: Optional[Mapping[str, Any]] = None, 119 ) -> Mapping[str, Any]: 120 """ 121 Specifies how to populate the body of the request with a JSON payload. 122 123 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 
124 """ 125 126 @abstractmethod 127 def send_request( 128 self, 129 stream_state: Optional[StreamState] = None, 130 stream_slice: Optional[StreamSlice] = None, 131 next_page_token: Optional[Mapping[str, Any]] = None, 132 path: Optional[str] = None, 133 request_headers: Optional[Mapping[str, Any]] = None, 134 request_params: Optional[Mapping[str, Any]] = None, 135 request_body_data: Optional[Union[Mapping[str, Any], str]] = None, 136 request_body_json: Optional[Mapping[str, Any]] = None, 137 log_formatter: Optional[Callable[[requests.Response], Any]] = None, 138 ) -> Optional[requests.Response]: 139 """ 140 Sends a request and returns the response. Might return no response if the error handler chooses to ignore the response or throw an exception in case of an error. 141 If path is set, the path configured on the requester itself is ignored. 142 If header, params and body are set, they are merged with the ones configured on the requester itself. 143 144 If a log formatter is provided, it's used to log the performed request and response. If it's not provided, no logging is performed. 145 """
Defines the request options to set on an outgoing HTTP request
Options can be passed by
- request parameter
- request headers
- body data
- json content
31 @abstractmethod 32 def get_authenticator(self) -> DeclarativeAuthenticator: 33 """ 34 Specifies the authenticator to use when submitting requests 35 """ 36 pass
Specifies the authenticator to use when submitting requests
38 @abstractmethod 39 def get_url_base( 40 self, 41 *, 42 stream_state: Optional[StreamState], 43 stream_slice: Optional[StreamSlice], 44 next_page_token: Optional[Mapping[str, Any]], 45 ) -> str: 46 """ 47 :return: URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/" 48 """
Returns
URL base for the API endpoint, e.g. if you want to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/"
50 @abstractmethod 51 def get_path( 52 self, 53 *, 54 stream_state: Optional[StreamState], 55 stream_slice: Optional[StreamSlice], 56 next_page_token: Optional[Mapping[str, Any]], 57 ) -> str: 58 """ 59 Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity" 60 """
Returns the URL path for the API endpoint, e.g. if you want to hit https://myapi.com/v1/some_entity then this should return "some_entity"
62 @abstractmethod 63 def get_method(self) -> HttpMethod: 64 """ 65 Specifies the HTTP method to use 66 """
Specifies the HTTP method to use
68 @abstractmethod 69 def get_request_params( 70 self, 71 *, 72 stream_state: Optional[StreamState] = None, 73 stream_slice: Optional[StreamSlice] = None, 74 next_page_token: Optional[Mapping[str, Any]] = None, 75 ) -> MutableMapping[str, Any]: 76 """ 77 Specifies the query parameters that should be set on an outgoing HTTP request given the inputs. 78 79 E.g: you might want to define query parameters for paging if next_page_token is not None. 80 """
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
For example, you might want to define query parameters for paging if next_page_token is not None.
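A hedged sketch of what a concrete implementation might do; the "per_page" default and the shape of next_page_token are assumptions about a specific API and paginator, not CDK defaults.

def get_request_params(self, *, stream_state=None, stream_slice=None, next_page_token=None):
    params = {"per_page": 100}
    if next_page_token:
        # Merge whatever paging parameters the paginator produced, e.g. {"page": 3}.
        params.update(next_page_token)
    return params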
82 @abstractmethod 83 def get_request_headers( 84 self, 85 *, 86 stream_state: Optional[StreamState] = None, 87 stream_slice: Optional[StreamSlice] = None, 88 next_page_token: Optional[Mapping[str, Any]] = None, 89 ) -> Mapping[str, Any]: 90 """ 91 Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method. 92 """
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
94 @abstractmethod 95 def get_request_body_data( 96 self, 97 *, 98 stream_state: Optional[StreamState] = None, 99 stream_slice: Optional[StreamSlice] = None, 100 next_page_token: Optional[Mapping[str, Any]] = None, 101 ) -> Union[Mapping[str, Any], str]: 102 """ 103 Specifies how to populate the body of the request with a non-JSON payload. 104 105 If returns a ready text that it will be sent as is. 106 If returns a dict that it will be converted to a urlencoded form. 107 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 108 109 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 110 """
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, the string is sent as-is. If it returns a dict, it is converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden at a time.
112 @abstractmethod 113 def get_request_body_json( 114 self, 115 *, 116 stream_state: Optional[StreamState] = None, 117 stream_slice: Optional[StreamSlice] = None, 118 next_page_token: Optional[Mapping[str, Any]] = None, 119 ) -> Mapping[str, Any]: 120 """ 121 Specifies how to populate the body of the request with a JSON payload. 122 123 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 124 """
Specifies how to populate the body of the request with a JSON payload.
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden at a time.
126 @abstractmethod 127 def send_request( 128 self, 129 stream_state: Optional[StreamState] = None, 130 stream_slice: Optional[StreamSlice] = None, 131 next_page_token: Optional[Mapping[str, Any]] = None, 132 path: Optional[str] = None, 133 request_headers: Optional[Mapping[str, Any]] = None, 134 request_params: Optional[Mapping[str, Any]] = None, 135 request_body_data: Optional[Union[Mapping[str, Any], str]] = None, 136 request_body_json: Optional[Mapping[str, Any]] = None, 137 log_formatter: Optional[Callable[[requests.Response], Any]] = None, 138 ) -> Optional[requests.Response]: 139 """ 140 Sends a request and returns the response. Might return no response if the error handler chooses to ignore the response or throw an exception in case of an error. 141 If path is set, the path configured on the requester itself is ignored. 142 If header, params and body are set, they are merged with the ones configured on the requester itself. 143 144 If a log formatter is provided, it's used to log the performed request and response. If it's not provided, no logging is performed. 145 """
Sends a request and returns the response. Might return no response if the error handler chooses to ignore the response or throw an exception in case of an error. If path is set, the path configured on the requester itself is ignored. If header, params and body are set, they are merged with the ones configured on the requester itself.
If a log formatter is provided, it's used to log the performed request and response. If it's not provided, no logging is performed.
51@dataclass 52class SimpleRetriever(Retriever): 53 """ 54 Retrieves records by synchronously sending requests to fetch records. 55 56 The retriever acts as an orchestrator between the requester, the record selector, the paginator, and the stream slicer. 57 58 For each stream slice, submit requests until there are no more pages of records to fetch. 59 60 This retriever currently inherits from HttpStream to reuse the request submission and pagination machinery. 61 As a result, some of the parameters passed to some methods are unused. 62 The two will be decoupled in a future release. 63 64 Attributes: 65 stream_name (str): The stream's name 66 stream_primary_key (Optional[Union[str, List[str], List[List[str]]]]): The stream's primary key 67 requester (Requester): The HTTP requester 68 record_selector (HttpSelector): The record selector 69 paginator (Optional[Paginator]): The paginator 70 stream_slicer (Optional[StreamSlicer]): The stream slicer 71 cursor (Optional[cursor]): The cursor 72 parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation 73 """ 74 75 requester: Requester 76 record_selector: HttpSelector 77 config: Config 78 parameters: InitVar[Mapping[str, Any]] 79 name: str 80 _name: Union[InterpolatedString, str] = field(init=False, repr=False, default="") 81 primary_key: Optional[Union[str, List[str], List[List[str]]]] 82 _primary_key: str = field(init=False, repr=False, default="") 83 paginator: Optional[Paginator] = None 84 stream_slicer: StreamSlicer = field( 85 default_factory=lambda: SinglePartitionRouter(parameters={}) 86 ) 87 request_option_provider: RequestOptionsProvider = field( 88 default_factory=lambda: DefaultRequestOptionsProvider(parameters={}) 89 ) 90 cursor: Optional[DeclarativeCursor] = None 91 ignore_stream_slicer_parameters_on_paginated_requests: bool = False 92 93 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 94 self._paginator = self.paginator or NoPagination(parameters=parameters) 95 self._parameters = parameters 96 self._name = ( 97 InterpolatedString(self._name, parameters=parameters) 98 if isinstance(self._name, str) 99 else self._name 100 ) 101 102 @property # type: ignore 103 def name(self) -> str: 104 """ 105 :return: Stream name 106 """ 107 return ( 108 str(self._name.eval(self.config)) 109 if isinstance(self._name, InterpolatedString) 110 else self._name 111 ) 112 113 @name.setter 114 def name(self, value: str) -> None: 115 if not isinstance(value, property): 116 self._name = value 117 118 def _get_mapping( 119 self, method: Callable[..., Optional[Union[Mapping[str, Any], str]]], **kwargs: Any 120 ) -> Tuple[Union[Mapping[str, Any], str], Set[str]]: 121 """ 122 Get mapping from the provided method, and get the keys of the mapping. 123 If the method returns a string, it will return the string and an empty set. 124 If the method returns a dict, it will return the dict and its keys. 125 """ 126 mapping = method(**kwargs) or {} 127 keys = set(mapping.keys()) if not isinstance(mapping, str) else set() 128 return mapping, keys 129 130 def _get_request_options( 131 self, 132 stream_state: Optional[StreamData], 133 stream_slice: Optional[StreamSlice], 134 next_page_token: Optional[Mapping[str, Any]], 135 paginator_method: Callable[..., Optional[Union[Mapping[str, Any], str]]], 136 stream_slicer_method: Callable[..., Optional[Union[Mapping[str, Any], str]]], 137 ) -> Union[Mapping[str, Any], str]: 138 """ 139 Get the request_option from the paginator and the stream slicer. 
140 Raise a ValueError if there's a key collision 141 Returned merged mapping otherwise 142 """ 143 # FIXME we should eventually remove the usage of stream_state as part of the interpolation 144 145 is_body_json = paginator_method.__name__ == "get_request_body_json" 146 147 mappings = [ 148 paginator_method( 149 stream_slice=stream_slice, 150 next_page_token=next_page_token, 151 ), 152 ] 153 if not next_page_token or not self.ignore_stream_slicer_parameters_on_paginated_requests: 154 mappings.append( 155 stream_slicer_method( 156 stream_slice=stream_slice, 157 next_page_token=next_page_token, 158 ) 159 ) 160 return combine_mappings(mappings, allow_same_value_merge=is_body_json) 161 162 def _request_headers( 163 self, 164 stream_state: Optional[StreamData] = None, 165 stream_slice: Optional[StreamSlice] = None, 166 next_page_token: Optional[Mapping[str, Any]] = None, 167 ) -> Mapping[str, Any]: 168 """ 169 Specifies request headers. 170 Authentication headers will overwrite any overlapping headers returned from this method. 171 """ 172 headers = self._get_request_options( 173 stream_state, 174 stream_slice, 175 next_page_token, 176 self._paginator.get_request_headers, 177 self.request_option_provider.get_request_headers, 178 ) 179 if isinstance(headers, str): 180 raise ValueError("Request headers cannot be a string") 181 return {str(k): str(v) for k, v in headers.items()} 182 183 def _request_params( 184 self, 185 stream_state: Optional[StreamData] = None, 186 stream_slice: Optional[StreamSlice] = None, 187 next_page_token: Optional[Mapping[str, Any]] = None, 188 ) -> Mapping[str, Any]: 189 """ 190 Specifies the query parameters that should be set on an outgoing HTTP request given the inputs. 191 192 E.g: you might want to define query parameters for paging if next_page_token is not None. 193 """ 194 params = self._get_request_options( 195 stream_state, 196 stream_slice, 197 next_page_token, 198 self._paginator.get_request_params, 199 self.request_option_provider.get_request_params, 200 ) 201 if isinstance(params, str): 202 raise ValueError("Request params cannot be a string") 203 return params 204 205 def _request_body_data( 206 self, 207 stream_state: Optional[StreamData] = None, 208 stream_slice: Optional[StreamSlice] = None, 209 next_page_token: Optional[Mapping[str, Any]] = None, 210 ) -> Union[Mapping[str, Any], str]: 211 """ 212 Specifies how to populate the body of the request with a non-JSON payload. 213 214 If returns a ready text that it will be sent as is. 215 If returns a dict that it will be converted to a urlencoded form. 216 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 217 218 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 219 """ 220 return self._get_request_options( 221 stream_state, 222 stream_slice, 223 next_page_token, 224 self._paginator.get_request_body_data, 225 self.request_option_provider.get_request_body_data, 226 ) 227 228 def _request_body_json( 229 self, 230 stream_state: Optional[StreamData] = None, 231 stream_slice: Optional[StreamSlice] = None, 232 next_page_token: Optional[Mapping[str, Any]] = None, 233 ) -> Optional[Mapping[str, Any]]: 234 """ 235 Specifies how to populate the body of the request with a JSON payload. 236 237 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 
238 """ 239 body_json = self._get_request_options( 240 stream_state, 241 stream_slice, 242 next_page_token, 243 self._paginator.get_request_body_json, 244 self.request_option_provider.get_request_body_json, 245 ) 246 if isinstance(body_json, str): 247 raise ValueError("Request body json cannot be a string") 248 return body_json 249 250 def _paginator_path( 251 self, 252 next_page_token: Optional[Mapping[str, Any]] = None, 253 stream_state: Optional[Mapping[str, Any]] = None, 254 stream_slice: Optional[StreamSlice] = None, 255 ) -> Optional[str]: 256 """ 257 If the paginator points to a path, follow it, else return nothing so the requester is used. 258 :param next_page_token: 259 :return: 260 """ 261 return self._paginator.path( 262 next_page_token=next_page_token, 263 stream_state=stream_state, 264 stream_slice=stream_slice, 265 ) 266 267 def _parse_response( 268 self, 269 response: Optional[requests.Response], 270 stream_state: StreamState, 271 records_schema: Mapping[str, Any], 272 stream_slice: Optional[StreamSlice] = None, 273 next_page_token: Optional[Mapping[str, Any]] = None, 274 ) -> Iterable[Record]: 275 if not response: 276 yield from [] 277 else: 278 yield from self.record_selector.select_records( 279 response=response, 280 stream_state=stream_state, 281 records_schema=records_schema, 282 stream_slice=stream_slice, 283 next_page_token=next_page_token, 284 ) 285 286 @property # type: ignore 287 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 288 """The stream's primary key""" 289 return self._primary_key 290 291 @primary_key.setter 292 def primary_key(self, value: str) -> None: 293 if not isinstance(value, property): 294 self._primary_key = value 295 296 def _next_page_token( 297 self, 298 response: requests.Response, 299 last_page_size: int, 300 last_record: Optional[Record], 301 last_page_token_value: Optional[Any], 302 ) -> Optional[Mapping[str, Any]]: 303 """ 304 Specifies a pagination strategy. 305 306 The value returned from this method is passed to most other methods in this class. Use it to form a request e.g: set headers or query params. 307 308 :return: The token for the next page from the input response object. Returning None means there are no more pages to read in this response. 
309 """ 310 return self._paginator.next_page_token( 311 response=response, 312 last_page_size=last_page_size, 313 last_record=last_record, 314 last_page_token_value=last_page_token_value, 315 ) 316 317 def _fetch_next_page( 318 self, 319 stream_state: Mapping[str, Any], 320 stream_slice: StreamSlice, 321 next_page_token: Optional[Mapping[str, Any]] = None, 322 ) -> Optional[requests.Response]: 323 return self.requester.send_request( 324 path=self._paginator_path( 325 next_page_token=next_page_token, 326 stream_state=stream_state, 327 stream_slice=stream_slice, 328 ), 329 stream_state=stream_state, 330 stream_slice=stream_slice, 331 next_page_token=next_page_token, 332 request_headers=self._request_headers( 333 stream_state=stream_state, 334 stream_slice=stream_slice, 335 next_page_token=next_page_token, 336 ), 337 request_params=self._request_params( 338 stream_state=stream_state, 339 stream_slice=stream_slice, 340 next_page_token=next_page_token, 341 ), 342 request_body_data=self._request_body_data( 343 stream_state=stream_state, 344 stream_slice=stream_slice, 345 next_page_token=next_page_token, 346 ), 347 request_body_json=self._request_body_json( 348 stream_state=stream_state, 349 stream_slice=stream_slice, 350 next_page_token=next_page_token, 351 ), 352 ) 353 354 # This logic is similar to _read_pages in the HttpStream class. When making changes here, consider making changes there as well. 355 def _read_pages( 356 self, 357 records_generator_fn: Callable[[Optional[requests.Response]], Iterable[Record]], 358 stream_state: Mapping[str, Any], 359 stream_slice: StreamSlice, 360 ) -> Iterable[Record]: 361 pagination_complete = False 362 initial_token = self._paginator.get_initial_token() 363 next_page_token: Optional[Mapping[str, Any]] = ( 364 {"next_page_token": initial_token} if initial_token else None 365 ) 366 while not pagination_complete: 367 response = self._fetch_next_page(stream_state, stream_slice, next_page_token) 368 369 last_page_size = 0 370 last_record: Optional[Record] = None 371 for record in records_generator_fn(response): 372 last_page_size += 1 373 last_record = record 374 yield record 375 376 if not response: 377 pagination_complete = True 378 else: 379 last_page_token_value = ( 380 next_page_token.get("next_page_token") if next_page_token else None 381 ) 382 next_page_token = self._next_page_token( 383 response=response, 384 last_page_size=last_page_size, 385 last_record=last_record, 386 last_page_token_value=last_page_token_value, 387 ) 388 if not next_page_token: 389 pagination_complete = True 390 391 # Always return an empty generator just in case no records were ever yielded 392 yield from [] 393 394 def _read_single_page( 395 self, 396 records_generator_fn: Callable[[Optional[requests.Response]], Iterable[Record]], 397 stream_state: Mapping[str, Any], 398 stream_slice: StreamSlice, 399 ) -> Iterable[StreamData]: 400 initial_token = stream_state.get("next_page_token") 401 if initial_token is None: 402 initial_token = self._paginator.get_initial_token() 403 next_page_token: Optional[Mapping[str, Any]] = ( 404 {"next_page_token": initial_token} if initial_token else None 405 ) 406 407 response = self._fetch_next_page(stream_state, stream_slice, next_page_token) 408 409 last_page_size = 0 410 last_record: Optional[Record] = None 411 for record in records_generator_fn(response): 412 last_page_size += 1 413 last_record = record 414 yield record 415 416 if not response: 417 next_page_token = {FULL_REFRESH_SYNC_COMPLETE_KEY: True} 418 else: 419 last_page_token_value = ( 
420 next_page_token.get("next_page_token") if next_page_token else None 421 ) 422 next_page_token = self._next_page_token( 423 response=response, 424 last_page_size=last_page_size, 425 last_record=last_record, 426 last_page_token_value=last_page_token_value, 427 ) or {FULL_REFRESH_SYNC_COMPLETE_KEY: True} 428 429 if self.cursor: 430 self.cursor.close_slice( 431 StreamSlice(cursor_slice=next_page_token, partition=stream_slice.partition) 432 ) 433 434 # Always return an empty generator just in case no records were ever yielded 435 yield from [] 436 437 def read_records( 438 self, 439 records_schema: Mapping[str, Any], 440 stream_slice: Optional[StreamSlice] = None, 441 ) -> Iterable[StreamData]: 442 """ 443 Fetch a stream's records from an HTTP API source 444 445 :param records_schema: json schema to describe record 446 :param stream_slice: The stream slice to read data for 447 :return: The records read from the API source 448 """ 449 _slice = stream_slice or StreamSlice(partition={}, cursor_slice={}) # None-check 450 451 most_recent_record_from_slice = None 452 record_generator = partial( 453 self._parse_records, 454 stream_slice=stream_slice, 455 stream_state=self.state or {}, 456 records_schema=records_schema, 457 ) 458 459 if self.cursor and isinstance(self.cursor, ResumableFullRefreshCursor): 460 stream_state = self.state 461 462 # Before syncing the RFR stream, we check if the job's prior attempt was successful and don't need to 463 # fetch more records. The platform deletes stream state for full refresh streams before starting a 464 # new job, so we don't need to worry about this value existing for the initial attempt 465 if stream_state.get(FULL_REFRESH_SYNC_COMPLETE_KEY): 466 return 467 468 yield from self._read_single_page(record_generator, stream_state, _slice) 469 else: 470 for stream_data in self._read_pages(record_generator, self.state, _slice): 471 current_record = self._extract_record(stream_data, _slice) 472 if self.cursor and current_record: 473 self.cursor.observe(_slice, current_record) 474 475 # Latest record read, not necessarily within slice boundaries. 476 # TODO Remove once all custom components implement `observe` method. 477 # https://github.com/airbytehq/airbyte-internal-issues/issues/6955 478 most_recent_record_from_slice = self._get_most_recent_record( 479 most_recent_record_from_slice, current_record, _slice 480 ) 481 yield stream_data 482 483 if self.cursor: 484 self.cursor.close_slice(_slice, most_recent_record_from_slice) 485 return 486 487 def _get_most_recent_record( 488 self, 489 current_most_recent: Optional[Record], 490 current_record: Optional[Record], 491 stream_slice: StreamSlice, 492 ) -> Optional[Record]: 493 if self.cursor and current_record: 494 if not current_most_recent: 495 return current_record 496 else: 497 return ( 498 current_most_recent 499 if self.cursor.is_greater_than_or_equal(current_most_recent, current_record) 500 else current_record 501 ) 502 else: 503 return None 504 505 def _extract_record( 506 self, stream_data: StreamData, stream_slice: StreamSlice 507 ) -> Optional[Record]: 508 """ 509 As we allow the output of _read_pages to be StreamData, it can be multiple things. Therefore, we need to filter out and normalize 510 to data to streamline the rest of the process. 
511 """ 512 if isinstance(stream_data, Record): 513 # Record is not part of `StreamData` but is the most common implementation of `Mapping[str, Any]` which is part of `StreamData` 514 return stream_data 515 elif isinstance(stream_data, (dict, Mapping)): 516 return Record( 517 data=dict(stream_data), associated_slice=stream_slice, stream_name=self.name 518 ) 519 elif isinstance(stream_data, AirbyteMessage) and stream_data.record: 520 return Record( 521 data=stream_data.record.data, # type:ignore # AirbyteMessage always has record.data 522 associated_slice=stream_slice, 523 stream_name=self.name, 524 ) 525 return None 526 527 # stream_slices is defined with arguments on http stream and fixing this has a long tail of dependencies. Will be resolved by the decoupling of http stream and simple retriever 528 def stream_slices(self) -> Iterable[Optional[StreamSlice]]: # type: ignore 529 """ 530 Specifies the slices for this stream. See the stream slicing section of the docs for more information. 531 532 :param sync_mode: 533 :param cursor_field: 534 :param stream_state: 535 :return: 536 """ 537 return self.stream_slicer.stream_slices() 538 539 @property 540 def state(self) -> Mapping[str, Any]: 541 return self.cursor.get_stream_state() if self.cursor else {} 542 543 @state.setter 544 def state(self, value: StreamState) -> None: 545 """State setter, accept state serialized by state getter.""" 546 if self.cursor: 547 self.cursor.set_initial_state(value) 548 549 def _parse_records( 550 self, 551 response: Optional[requests.Response], 552 stream_state: Mapping[str, Any], 553 records_schema: Mapping[str, Any], 554 stream_slice: Optional[StreamSlice], 555 ) -> Iterable[Record]: 556 yield from self._parse_response( 557 response, 558 stream_slice=stream_slice, 559 stream_state=stream_state, 560 records_schema=records_schema, 561 ) 562 563 def must_deduplicate_query_params(self) -> bool: 564 return True 565 566 @staticmethod 567 def _to_partition_key(to_serialize: Any) -> str: 568 # separators have changed in Python 3.4. To avoid being impacted by further change, we explicitly specify our own value 569 return json.dumps(to_serialize, indent=None, separators=(",", ":"), sort_keys=True)
Retrieves records by synchronously sending requests to fetch records.
The retriever acts as an orchestrator between the requester, the record selector, the paginator, and the stream slicer.
For each stream slice, submit requests until there are no more pages of records to fetch.
This retriever currently inherits from HttpStream to reuse the request submission and pagination machinery. As a result, some of the parameters passed to some methods are unused. The two will be decoupled in a future release.
Attributes:
- stream_name (str): The stream's name
- stream_primary_key (Optional[Union[str, List[str], List[List[str]]]]): The stream's primary key
- requester (Requester): The HTTP requester
- record_selector (HttpSelector): The record selector
- paginator (Optional[Paginator]): The paginator
- stream_slicer (Optional[StreamSlicer]): The stream slicer
- cursor (Optional[cursor]): The cursor
- parameters (Mapping[str, Any]): Additional runtime parameters to be used for string interpolation
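A hedged usage sketch: `retriever` stands for an already-assembled SimpleRetriever (the declarative factory normally wires the requester, record selector, and paginator from the manifest), and the empty records_schema is only there to satisfy the signature.

for stream_slice in retriever.stream_slices():
    # For each slice, the retriever pages through the API until the paginator
    # returns no further token, yielding one record at a time.
    for record in retriever.read_records(records_schema={}, stream_slice=stream_slice):
        print(record)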
102 @property # type: ignore 103 def name(self) -> str: 104 """ 105 :return: Stream name 106 """ 107 return ( 108 str(self._name.eval(self.config)) 109 if isinstance(self._name, InterpolatedString) 110 else self._name 111 )
Returns
Stream name
286 @property # type: ignore 287 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 288 """The stream's primary key""" 289 return self._primary_key
The stream's primary key
437 def read_records( 438 self, 439 records_schema: Mapping[str, Any], 440 stream_slice: Optional[StreamSlice] = None, 441 ) -> Iterable[StreamData]: 442 """ 443 Fetch a stream's records from an HTTP API source 444 445 :param records_schema: json schema to describe record 446 :param stream_slice: The stream slice to read data for 447 :return: The records read from the API source 448 """ 449 _slice = stream_slice or StreamSlice(partition={}, cursor_slice={}) # None-check 450 451 most_recent_record_from_slice = None 452 record_generator = partial( 453 self._parse_records, 454 stream_slice=stream_slice, 455 stream_state=self.state or {}, 456 records_schema=records_schema, 457 ) 458 459 if self.cursor and isinstance(self.cursor, ResumableFullRefreshCursor): 460 stream_state = self.state 461 462 # Before syncing the RFR stream, we check if the job's prior attempt was successful and don't need to 463 # fetch more records. The platform deletes stream state for full refresh streams before starting a 464 # new job, so we don't need to worry about this value existing for the initial attempt 465 if stream_state.get(FULL_REFRESH_SYNC_COMPLETE_KEY): 466 return 467 468 yield from self._read_single_page(record_generator, stream_state, _slice) 469 else: 470 for stream_data in self._read_pages(record_generator, self.state, _slice): 471 current_record = self._extract_record(stream_data, _slice) 472 if self.cursor and current_record: 473 self.cursor.observe(_slice, current_record) 474 475 # Latest record read, not necessarily within slice boundaries. 476 # TODO Remove once all custom components implement `observe` method. 477 # https://github.com/airbytehq/airbyte-internal-issues/issues/6955 478 most_recent_record_from_slice = self._get_most_recent_record( 479 most_recent_record_from_slice, current_record, _slice 480 ) 481 yield stream_data 482 483 if self.cursor: 484 self.cursor.close_slice(_slice, most_recent_record_from_slice) 485 return
Fetch a stream's records from an HTTP API source
Parameters
- records_schema: json schema to describe record
- stream_slice: The stream slice to read data for
Returns
The records read from the API source
528 def stream_slices(self) -> Iterable[Optional[StreamSlice]]: # type: ignore 529 """ 530 Specifies the slices for this stream. See the stream slicing section of the docs for more information. 531 532 :param sync_mode: 533 :param cursor_field: 534 :param stream_state: 535 :return: 536 """ 537 return self.stream_slicer.stream_slices()
Specifies the slices for this stream. See the stream slicing section of the docs for more information.
Returns
An iterable of stream slices
539 @property 540 def state(self) -> Mapping[str, Any]: 541 return self.cursor.get_stream_state() if self.cursor else {}
State getter; it should return the state in a form that can be serialized to a string and sent to the output as a STATE AirbyteMessage.
A good example of a state is a cursor value: { self.cursor_field: "cursor_value" }
State should be as small as possible while remaining descriptive enough to resume the sync from the point where it stopped.
13@dataclass 14class SinglePartitionRouter(PartitionRouter): 15 """Partition router returning only a stream slice""" 16 17 parameters: InitVar[Mapping[str, Any]] 18 19 def get_request_params( 20 self, 21 stream_state: Optional[StreamState] = None, 22 stream_slice: Optional[StreamSlice] = None, 23 next_page_token: Optional[Mapping[str, Any]] = None, 24 ) -> Mapping[str, Any]: 25 return {} 26 27 def get_request_headers( 28 self, 29 stream_state: Optional[StreamState] = None, 30 stream_slice: Optional[StreamSlice] = None, 31 next_page_token: Optional[Mapping[str, Any]] = None, 32 ) -> Mapping[str, Any]: 33 return {} 34 35 def get_request_body_data( 36 self, 37 stream_state: Optional[StreamState] = None, 38 stream_slice: Optional[StreamSlice] = None, 39 next_page_token: Optional[Mapping[str, Any]] = None, 40 ) -> Mapping[str, Any]: 41 return {} 42 43 def get_request_body_json( 44 self, 45 stream_state: Optional[StreamState] = None, 46 stream_slice: Optional[StreamSlice] = None, 47 next_page_token: Optional[Mapping[str, Any]] = None, 48 ) -> Mapping[str, Any]: 49 return {} 50 51 def stream_slices(self) -> Iterable[StreamSlice]: 52 yield StreamSlice(partition={}, cursor_slice={}) 53 54 def set_initial_state(self, stream_state: StreamState) -> None: 55 """ 56 SinglePartitionRouter doesn't have parent streams 57 """ 58 pass 59 60 def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: 61 """ 62 SinglePartitionRouter doesn't have parent streams 63 """ 64 pass
Partition router that returns a single stream slice
19 def get_request_params( 20 self, 21 stream_state: Optional[StreamState] = None, 22 stream_slice: Optional[StreamSlice] = None, 23 next_page_token: Optional[Mapping[str, Any]] = None, 24 ) -> Mapping[str, Any]: 25 return {}
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
For example, you might want to define query parameters for paging if next_page_token is not None.
27 def get_request_headers( 28 self, 29 stream_state: Optional[StreamState] = None, 30 stream_slice: Optional[StreamSlice] = None, 31 next_page_token: Optional[Mapping[str, Any]] = None, 32 ) -> Mapping[str, Any]: 33 return {}
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
35 def get_request_body_data( 36 self, 37 stream_state: Optional[StreamState] = None, 38 stream_slice: Optional[StreamSlice] = None, 39 next_page_token: Optional[Mapping[str, Any]] = None, 40 ) -> Mapping[str, Any]: 41 return {}
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, the string is sent as-is. If it returns a dict, it is converted to a urlencoded form, e.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden at a time.
43 def get_request_body_json( 44 self, 45 stream_state: Optional[StreamState] = None, 46 stream_slice: Optional[StreamSlice] = None, 47 next_page_token: Optional[Mapping[str, Any]] = None, 48 ) -> Mapping[str, Any]: 49 return {}
Specifies how to populate the body of the request with a JSON payload.
Only one of the 'request_body_data' and 'request_body_json' functions may be overridden at a time.
51 def stream_slices(self) -> Iterable[StreamSlice]: 52 yield StreamSlice(partition={}, cursor_slice={})
Defines stream slices
Returns
An iterable of stream slices
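A short sketch confirming the behavior described above: the router always yields exactly one empty slice, which keeps the read loop uniform for unpartitioned streams.

router = SinglePartitionRouter(parameters={})
slices = list(router.stream_slices())
# A single slice with an empty partition and an empty cursor slice.
assert len(slices) == 1 and not slices[0].partition and not slices[0].cursor_slice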
42class StopConditionPaginationStrategyDecorator(PaginationStrategy): 43 def __init__(self, _delegate: PaginationStrategy, stop_condition: PaginationStopCondition): 44 self._delegate = _delegate 45 self._stop_condition = stop_condition 46 47 def next_page_token( 48 self, 49 response: requests.Response, 50 last_page_size: int, 51 last_record: Optional[Record], 52 last_page_token_value: Optional[Any] = None, 53 ) -> Optional[Any]: 54 # We evaluate in reverse order because the assumption is that most of the APIs using data feed structure 55 # will return records in descending order. In terms of performance/memory, we return the records lazily 56 if last_record and self._stop_condition.is_met(last_record): 57 return None 58 return self._delegate.next_page_token( 59 response, last_page_size, last_record, last_page_token_value 60 ) 61 62 def get_page_size(self) -> Optional[int]: 63 return self._delegate.get_page_size() 64 65 @property 66 def initial_token(self) -> Optional[Any]: 67 return self._delegate.initial_token
Defines how to get the next page token
47 def next_page_token( 48 self, 49 response: requests.Response, 50 last_page_size: int, 51 last_record: Optional[Record], 52 last_page_token_value: Optional[Any] = None, 53 ) -> Optional[Any]: 54 # We evaluate in reverse order because the assumption is that most of the APIs using data feed structure 55 # will return records in descending order. In terms of performance/memory, we return the records lazily 56 if last_record and self._stop_condition.is_met(last_record): 57 return None 58 return self._delegate.next_page_token( 59 response, last_page_size, last_record, last_page_token_value 60 )
Parameters
- response: response to process
- last_page_size: the number of records read from the response
- last_record: the last record extracted from the response
- last_page_token_value: The current value of the page token made on the last request
Returns
next page token. Returns None if there are no more pages to fetch
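The following sketch uses hypothetical stand-in classes (not CDK types) to illustrate the decorator's short-circuit behavior: the stop condition is consulted first, and only if it is not met is the delegate strategy asked for the next token.

from typing import Any, Optional


class AlreadySyncedStopCondition:
    # Hypothetical stop condition: stop once a record is not newer than the cursor.
    def __init__(self, cursor_value: str) -> None:
        self._cursor_value = cursor_value

    def is_met(self, record: dict) -> bool:
        return record.get("updated_at", "") <= self._cursor_value


class OffsetStrategy:
    # Hypothetical delegate strategy: plain offset pagination.
    def next_page_token(
        self,
        response: Any,
        last_page_size: int,
        last_record: Optional[dict],
        last_page_token_value: Optional[Any] = None,
    ) -> Optional[Any]:
        return (last_page_token_value or 0) + last_page_size


stop_condition = AlreadySyncedStopCondition(cursor_value="2023-05-27T00:00:00Z")
delegate = OffsetStrategy()
last_record = {"updated_at": "2023-01-01T00:00:00Z"}

# Mirrors the decorator's logic: evaluate the stop condition first, then fall back to the delegate.
token = None if stop_condition.is_met(last_record) else delegate.next_page_token(None, 100, last_record, 0)
print(token)  # None -> pagination stops because the record predates the cursor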
67class StreamSlice(Mapping[str, Any]): 68 def __init__( 69 self, 70 *, 71 partition: Mapping[str, Any], 72 cursor_slice: Mapping[str, Any], 73 extra_fields: Optional[Mapping[str, Any]] = None, 74 ) -> None: 75 """ 76 :param partition: The partition keys representing a unique partition in the stream. 77 :param cursor_slice: The incremental cursor slice keys, such as dates or pagination tokens. 78 :param extra_fields: Additional fields that should not be part of the partition but passed along, such as metadata from the parent stream. 79 """ 80 self._partition = partition 81 self._cursor_slice = cursor_slice 82 self._extra_fields = extra_fields or {} 83 84 # Ensure that partition keys do not overlap with cursor slice keys 85 if partition.keys() & cursor_slice.keys(): 86 raise ValueError("Keys for partition and incremental sync cursor should not overlap") 87 88 self._stream_slice = dict(partition) | dict(cursor_slice) 89 90 @property 91 def partition(self) -> Mapping[str, Any]: 92 """Returns the partition portion of the stream slice.""" 93 p = self._partition 94 while isinstance(p, StreamSlice): 95 p = p.partition 96 return p 97 98 @property 99 def cursor_slice(self) -> Mapping[str, Any]: 100 """Returns the cursor slice portion of the stream slice.""" 101 c = self._cursor_slice 102 while isinstance(c, StreamSlice): 103 c = c.cursor_slice 104 return c 105 106 @property 107 def extra_fields(self) -> Mapping[str, Any]: 108 """Returns the extra fields that are not part of the partition.""" 109 return self._extra_fields 110 111 def __repr__(self) -> str: 112 return repr(self._stream_slice) 113 114 def __setitem__(self, key: str, value: Any) -> None: 115 raise ValueError("StreamSlice is immutable") 116 117 def __getitem__(self, key: str) -> Any: 118 return self._stream_slice[key] 119 120 def __len__(self) -> int: 121 return len(self._stream_slice) 122 123 def __iter__(self) -> Iterator[str]: 124 return iter(self._stream_slice) 125 126 def __contains__(self, item: Any) -> bool: 127 return item in self._stream_slice 128 129 def keys(self) -> KeysView[str]: 130 return self._stream_slice.keys() 131 132 def items(self) -> ItemsView[str, Any]: 133 return self._stream_slice.items() 134 135 def values(self) -> ValuesView[Any]: 136 return self._stream_slice.values() 137 138 def get(self, key: str, default: Any = None) -> Optional[Any]: 139 return self._stream_slice.get(key, default) 140 141 def __eq__(self, other: Any) -> bool: 142 if isinstance(other, dict): 143 return self._stream_slice == other 144 if isinstance(other, StreamSlice): 145 # noinspection PyProtectedMember 146 return self._partition == other._partition and self._cursor_slice == other._cursor_slice 147 return False 148 149 def __ne__(self, other: Any) -> bool: 150 return not self.__eq__(other) 151 152 def __json_serializable__(self) -> Any: 153 return self._stream_slice 154 155 def __hash__(self) -> int: 156 return SliceHasher.hash( 157 stream_slice=self._stream_slice 158 ) # no need to provide stream_name here as this is used for slicing the cursor 159 160 def __bool__(self) -> bool: 161 return bool(self._stream_slice) or bool(self._extra_fields)
A Mapping is a generic container for associating key/value pairs.
This class provides concrete generic implementations of all methods except for __getitem__, __iter__, and __len__.
68 def __init__( 69 self, 70 *, 71 partition: Mapping[str, Any], 72 cursor_slice: Mapping[str, Any], 73 extra_fields: Optional[Mapping[str, Any]] = None, 74 ) -> None: 75 """ 76 :param partition: The partition keys representing a unique partition in the stream. 77 :param cursor_slice: The incremental cursor slice keys, such as dates or pagination tokens. 78 :param extra_fields: Additional fields that should not be part of the partition but passed along, such as metadata from the parent stream. 79 """ 80 self._partition = partition 81 self._cursor_slice = cursor_slice 82 self._extra_fields = extra_fields or {} 83 84 # Ensure that partition keys do not overlap with cursor slice keys 85 if partition.keys() & cursor_slice.keys(): 86 raise ValueError("Keys for partition and incremental sync cursor should not overlap") 87 88 self._stream_slice = dict(partition) | dict(cursor_slice)
Parameters
- partition: The partition keys representing a unique partition in the stream.
- cursor_slice: The incremental cursor slice keys, such as dates or pagination tokens.
- extra_fields: Additional fields that should not be part of the partition but passed along, such as metadata from the parent stream.
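A construction sketch, assuming StreamSlice is importable from airbyte_cdk.sources.types; the keys and values below are placeholders:

from airbyte_cdk.sources.types import StreamSlice  # assumed import path

stream_slice = StreamSlice(
    partition={"account_id": "abc"},
    cursor_slice={"start": "2023-05-01", "end": "2023-05-31"},
    extra_fields={"parent_name": "accounts"},
)
print(stream_slice["account_id"])  # abc -- keys from both mappings are readable
print(stream_slice.cursor_slice)   # {'start': '2023-05-01', 'end': '2023-05-31'}
print(stream_slice.extra_fields)   # {'parent_name': 'accounts'} -- carried along, not part of the mapping
# stream_slice["account_id"] = "x"                        # would raise ValueError: StreamSlice is immutable
# StreamSlice(partition={"a": 1}, cursor_slice={"a": 2})  # would raise ValueError: overlapping keys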
90 @property 91 def partition(self) -> Mapping[str, Any]: 92 """Returns the partition portion of the stream slice.""" 93 p = self._partition 94 while isinstance(p, StreamSlice): 95 p = p.partition 96 return p
Returns the partition portion of the stream slice.
98 @property 99 def cursor_slice(self) -> Mapping[str, Any]: 100 """Returns the cursor slice portion of the stream slice.""" 101 c = self._cursor_slice 102 while isinstance(c, StreamSlice): 103 c = c.cursor_slice 104 return c
Returns the cursor slice portion of the stream slice.
106 @property 107 def extra_fields(self) -> Mapping[str, Any]: 108 """Returns the extra fields that are not part of the partition.""" 109 return self._extra_fields
Returns the extra fields that are not part of the partition.
D.items() -> a set-like object providing a view on D's items
80@dataclass 81class SubstreamPartitionRouter(PartitionRouter): 82 """ 83 Partition router that iterates over the parent's stream records and emits slices 84 Will populate the state with `partition_field` and `parent_slice` so they can be accessed by other components 85 86 Attributes: 87 parent_stream_configs (List[ParentStreamConfig]): parent streams to iterate over and their config 88 """ 89 90 parent_stream_configs: List[ParentStreamConfig] 91 config: Config 92 parameters: InitVar[Mapping[str, Any]] 93 94 def __post_init__(self, parameters: Mapping[str, Any]) -> None: 95 if not self.parent_stream_configs: 96 raise ValueError("SubstreamPartitionRouter needs at least 1 parent stream") 97 self._parameters = parameters 98 99 def get_request_params( 100 self, 101 stream_state: Optional[StreamState] = None, 102 stream_slice: Optional[StreamSlice] = None, 103 next_page_token: Optional[Mapping[str, Any]] = None, 104 ) -> Mapping[str, Any]: 105 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 106 return self._get_request_option(RequestOptionType.request_parameter, stream_slice) 107 108 def get_request_headers( 109 self, 110 stream_state: Optional[StreamState] = None, 111 stream_slice: Optional[StreamSlice] = None, 112 next_page_token: Optional[Mapping[str, Any]] = None, 113 ) -> Mapping[str, Any]: 114 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 115 return self._get_request_option(RequestOptionType.header, stream_slice) 116 117 def get_request_body_data( 118 self, 119 stream_state: Optional[StreamState] = None, 120 stream_slice: Optional[StreamSlice] = None, 121 next_page_token: Optional[Mapping[str, Any]] = None, 122 ) -> Mapping[str, Any]: 123 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 124 return self._get_request_option(RequestOptionType.body_data, stream_slice) 125 126 def get_request_body_json( 127 self, 128 stream_state: Optional[StreamState] = None, 129 stream_slice: Optional[StreamSlice] = None, 130 next_page_token: Optional[Mapping[str, Any]] = None, 131 ) -> Mapping[str, Any]: 132 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 133 return self._get_request_option(RequestOptionType.body_json, stream_slice) 134 135 def _get_request_option( 136 self, option_type: RequestOptionType, stream_slice: Optional[StreamSlice] 137 ) -> Mapping[str, Any]: 138 params: MutableMapping[str, Any] = {} 139 if stream_slice: 140 for parent_config in self.parent_stream_configs: 141 if ( 142 parent_config.request_option 143 and parent_config.request_option.inject_into == option_type 144 ): 145 key = parent_config.partition_field.eval(self.config) # type: ignore # partition_field is always casted to an interpolated string 146 value = stream_slice.get(key) 147 if value: 148 parent_config.request_option.inject_into_request(params, value, self.config) 149 return params 150 151 def stream_slices(self) -> Iterable[StreamSlice]: 152 """ 153 Iterate over each parent stream's record and create a StreamSlice for each record. 154 155 For each stream, iterate over its stream_slices. 156 For each stream slice, iterate over each record. 157 yield a stream slice for each such records. 158 159 If a parent slice contains no record, emit a slice with parent_record=None. 
160 161 The template string can interpolate the following values: 162 - parent_stream_slice: mapping representing the parent's stream slice 163 - parent_record: mapping representing the parent record 164 - parent_stream_name: string representing the parent stream name 165 """ 166 if not self.parent_stream_configs: 167 yield from [] 168 else: 169 for parent_stream_config in self.parent_stream_configs: 170 parent_stream = parent_stream_config.stream 171 parent_field = parent_stream_config.parent_key.eval(self.config) # type: ignore # parent_key is always casted to an interpolated string 172 partition_field = parent_stream_config.partition_field.eval(self.config) # type: ignore # partition_field is always casted to an interpolated string 173 extra_fields = None 174 if parent_stream_config.extra_fields: 175 extra_fields = [ 176 [field_path_part.eval(self.config) for field_path_part in field_path] # type: ignore [union-attr] 177 for field_path in parent_stream_config.extra_fields 178 ] 179 180 # read_stateless() assumes the parent is not concurrent. This is currently okay since the concurrent CDK does 181 # not support either substreams or RFR, but something that needs to be considered once we do 182 for parent_record in parent_stream.read_only_records(): 183 parent_partition = None 184 # Skip non-records (eg AirbyteLogMessage) 185 if isinstance(parent_record, AirbyteMessage): 186 self.logger.warning( 187 f"Parent stream {parent_stream.name} returns records of type AirbyteMessage. This SubstreamPartitionRouter is not able to checkpoint incremental parent state." 188 ) 189 if parent_record.type == MessageType.RECORD: 190 parent_record = parent_record.record.data # type: ignore[union-attr, assignment] # record is always a Record 191 else: 192 continue 193 elif isinstance(parent_record, Record): 194 parent_partition = ( 195 parent_record.associated_slice.partition 196 if parent_record.associated_slice 197 else {} 198 ) 199 parent_record = parent_record.data 200 elif not isinstance(parent_record, Mapping): 201 # The parent_record should only take the form of a Record, AirbyteMessage, or Mapping. 
Anything else is invalid 202 raise AirbyteTracedException( 203 message=f"Parent stream returned records as invalid type {type(parent_record)}" 204 ) 205 try: 206 partition_value = dpath.get( 207 parent_record, # type: ignore [arg-type] 208 parent_field, 209 ) 210 except KeyError: 211 continue 212 213 # Add extra fields 214 extracted_extra_fields = self._extract_extra_fields(parent_record, extra_fields) 215 216 if parent_stream_config.lazy_read_pointer: 217 extracted_extra_fields = { 218 "child_response": self._extract_child_response( 219 parent_record, 220 parent_stream_config.lazy_read_pointer, # type: ignore[arg-type] # lazy_read_pointer type handeled in __post_init__ of parent_stream_config 221 ), 222 **extracted_extra_fields, 223 } 224 225 yield StreamSlice( 226 partition={ 227 partition_field: partition_value, 228 "parent_slice": parent_partition or {}, 229 }, 230 cursor_slice={}, 231 extra_fields=extracted_extra_fields, 232 ) 233 234 def _extract_child_response( 235 self, parent_record: Mapping[str, Any] | AirbyteMessage, pointer: List[InterpolatedString] 236 ) -> requests.Response: 237 """Extract child records from a parent record based on lazy pointers.""" 238 239 def _create_response(data: MutableMapping[str, Any]) -> SafeResponse: 240 """Create a SafeResponse with the given data.""" 241 response = SafeResponse() 242 response.content = json.dumps(data).encode("utf-8") 243 response.status_code = 200 244 return response 245 246 path = [path.eval(self.config) for path in pointer] 247 return _create_response(dpath.get(parent_record, path, default=[])) # type: ignore # argunet will be a MutableMapping, given input data structure 248 249 def _extract_extra_fields( 250 self, 251 parent_record: Mapping[str, Any] | AirbyteMessage, 252 extra_fields: Optional[List[List[str]]] = None, 253 ) -> Mapping[str, Any]: 254 """ 255 Extracts additional fields specified by their paths from the parent record. 256 257 Args: 258 parent_record (Mapping[str, Any]): The record from the parent stream to extract fields from. 259 extra_fields (Optional[List[List[str]]]): A list of field paths (as lists of strings) to extract from the parent record. 260 261 Returns: 262 Mapping[str, Any]: A dictionary containing the extracted fields. 263 The keys are the joined field paths, and the values are the corresponding extracted values. 264 """ 265 extracted_extra_fields = {} 266 if extra_fields: 267 for extra_field_path in extra_fields: 268 try: 269 extra_field_value = dpath.get( 270 parent_record, # type: ignore [arg-type] 271 extra_field_path, 272 ) 273 self.logger.debug( 274 f"Extracted extra_field_path: {extra_field_path} with value: {extra_field_value}" 275 ) 276 except KeyError: 277 self.logger.debug(f"Failed to extract extra_field_path: {extra_field_path}") 278 extra_field_value = None 279 extracted_extra_fields[".".join(extra_field_path)] = extra_field_value 280 return extracted_extra_fields 281 282 def set_initial_state(self, stream_state: StreamState) -> None: 283 """ 284 Set the state of the parent streams. 285 286 If the `parent_state` key is missing from `stream_state`, migrate the child stream state to the parent stream's state format. 287 This migration applies only to parent streams with incremental dependencies. 288 289 Args: 290 stream_state (StreamState): The state of the streams to be set. 
291 292 Example of state format: 293 { 294 "parent_state": { 295 "parent_stream_name1": { 296 "last_updated": "2023-05-27T00:00:00Z" 297 }, 298 "parent_stream_name2": { 299 "last_updated": "2023-05-27T00:00:00Z" 300 } 301 } 302 } 303 304 Example of migrating to parent state format: 305 - Initial state: 306 { 307 "updated_at": "2023-05-27T00:00:00Z" 308 } 309 - After migration: 310 { 311 "updated_at": "2023-05-27T00:00:00Z", 312 "parent_state": { 313 "parent_stream_name": { 314 "parent_stream_cursor": "2023-05-27T00:00:00Z" 315 } 316 } 317 } 318 """ 319 if not stream_state: 320 return 321 322 parent_state = stream_state.get("parent_state", {}) 323 324 # Set state for each parent stream with an incremental dependency 325 for parent_config in self.parent_stream_configs: 326 if ( 327 not parent_state.get(parent_config.stream.name, {}) 328 and parent_config.incremental_dependency 329 ): 330 # Migrate child state to parent state format 331 parent_state = self._migrate_child_state_to_parent_state(stream_state) 332 333 if parent_config.incremental_dependency: 334 parent_config.stream.state = parent_state.get(parent_config.stream.name, {}) 335 336 def _migrate_child_state_to_parent_state(self, stream_state: StreamState) -> StreamState: 337 """ 338 Migrate the child or global stream state into the parent stream's state format. 339 340 This method converts the child stream state—or, if present, the global state—into a format that is 341 compatible with parent streams that use incremental synchronization. The migration occurs only for 342 parent streams with incremental dependencies. It filters out per-partition states and retains only the 343 global state in the form {cursor_field: cursor_value}. 344 345 The method supports multiple input formats: 346 - A simple global state, e.g.: 347 {"updated_at": "2023-05-27T00:00:00Z"} 348 - A state object that contains a "state" key (which is assumed to hold the global state), e.g.: 349 {"state": {"updated_at": "2023-05-27T00:00:00Z"}, ...} 350 In this case, the migration uses the first value from the "state" dictionary. 351 - Any per-partition state formats or other non-simple structures are ignored during migration. 352 353 Args: 354 stream_state (StreamState): The state to migrate. Expected formats include: 355 - {"updated_at": "2023-05-27T00:00:00Z"} 356 - {"state": {"updated_at": "2023-05-27T00:00:00Z"}, ...} 357 (In this format, only the first global state value is used, and per-partition states are ignored.) 358 359 Returns: 360 StreamState: A migrated state for parent streams in the format: 361 { 362 "parent_stream_name": {"parent_stream_cursor": "2023-05-27T00:00:00Z"} 363 } 364 where each parent stream with an incremental dependency is assigned its corresponding cursor value. 365 366 Example: 367 Input: {"updated_at": "2023-05-27T00:00:00Z"} 368 Output: { 369 "parent_stream_name": {"parent_stream_cursor": "2023-05-27T00:00:00Z"} 370 } 371 """ 372 substream_state_values = list(stream_state.values()) 373 substream_state = substream_state_values[0] if substream_state_values else {} 374 375 # Ignore per-partition states or invalid formats. 376 if isinstance(substream_state, (list, dict)) or len(substream_state_values) != 1: 377 # If a global state is present under the key "state", use its first value. 378 if "state" in stream_state and isinstance(stream_state["state"], dict): 379 substream_state = list(stream_state["state"].values())[0] 380 else: 381 return {} 382 383 # Build the parent state for all parent streams with incremental dependencies. 
384 parent_state = {} 385 if substream_state: 386 for parent_config in self.parent_stream_configs: 387 if parent_config.incremental_dependency: 388 parent_state[parent_config.stream.name] = { 389 parent_config.stream.cursor_field: substream_state 390 } 391 392 return parent_state 393 394 def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: 395 """ 396 Get the state of the parent streams. 397 398 Returns: 399 StreamState: The current state of the parent streams. 400 401 Example of state format: 402 { 403 "parent_stream_name1": { 404 "last_updated": "2023-05-27T00:00:00Z" 405 }, 406 "parent_stream_name2": { 407 "last_updated": "2023-05-27T00:00:00Z" 408 } 409 } 410 """ 411 parent_state = {} 412 for parent_config in self.parent_stream_configs: 413 if parent_config.incremental_dependency: 414 parent_state[parent_config.stream.name] = copy.deepcopy(parent_config.stream.state) 415 return parent_state 416 417 @property 418 def logger(self) -> logging.Logger: 419 return logging.getLogger("airbyte.SubstreamPartitionRouter")
Partition router that iterates over the parent's stream records and emits slices
Will populate the state with `partition_field` and `parent_slice` so they can be accessed by other components
Attributes:
- parent_stream_configs (List[ParentStreamConfig]): parent streams to iterate over and their config
99 def get_request_params( 100 self, 101 stream_state: Optional[StreamState] = None, 102 stream_slice: Optional[StreamSlice] = None, 103 next_page_token: Optional[Mapping[str, Any]] = None, 104 ) -> Mapping[str, Any]: 105 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 106 return self._get_request_option(RequestOptionType.request_parameter, stream_slice)
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g.: you might want to define query parameters for paging if next_page_token is not None.
108 def get_request_headers( 109 self, 110 stream_state: Optional[StreamState] = None, 111 stream_slice: Optional[StreamSlice] = None, 112 next_page_token: Optional[Mapping[str, Any]] = None, 113 ) -> Mapping[str, Any]: 114 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 115 return self._get_request_option(RequestOptionType.header, stream_slice)
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
117 def get_request_body_data( 118 self, 119 stream_state: Optional[StreamState] = None, 120 stream_slice: Optional[StreamSlice] = None, 121 next_page_token: Optional[Mapping[str, Any]] = None, 122 ) -> Mapping[str, Any]: 123 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 124 return self._get_request_option(RequestOptionType.body_data, stream_slice)
Specifies how to populate the body of the request with a non-JSON payload.
If it returns a string, the string is sent as is; if it returns a dict, the dict is converted to a urlencoded form. E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Note that only one of the 'request_body_data' and 'request_body_json' functions may be overridden.
126 def get_request_body_json( 127 self, 128 stream_state: Optional[StreamState] = None, 129 stream_slice: Optional[StreamSlice] = None, 130 next_page_token: Optional[Mapping[str, Any]] = None, 131 ) -> Mapping[str, Any]: 132 # Pass the stream_slice from the argument, not the cursor because the cursor is updated after processing the response 133 return self._get_request_option(RequestOptionType.body_json, stream_slice)
Specifies how to populate the body of the request with a JSON payload.
Note that only one of the 'request_body_data' and 'request_body_json' functions may be overridden.
151 def stream_slices(self) -> Iterable[StreamSlice]: 152 """ 153 Iterate over each parent stream's record and create a StreamSlice for each record. 154 155 For each stream, iterate over its stream_slices. 156 For each stream slice, iterate over each record. 157 yield a stream slice for each such records. 158 159 If a parent slice contains no record, emit a slice with parent_record=None. 160 161 The template string can interpolate the following values: 162 - parent_stream_slice: mapping representing the parent's stream slice 163 - parent_record: mapping representing the parent record 164 - parent_stream_name: string representing the parent stream name 165 """ 166 if not self.parent_stream_configs: 167 yield from [] 168 else: 169 for parent_stream_config in self.parent_stream_configs: 170 parent_stream = parent_stream_config.stream 171 parent_field = parent_stream_config.parent_key.eval(self.config) # type: ignore # parent_key is always casted to an interpolated string 172 partition_field = parent_stream_config.partition_field.eval(self.config) # type: ignore # partition_field is always casted to an interpolated string 173 extra_fields = None 174 if parent_stream_config.extra_fields: 175 extra_fields = [ 176 [field_path_part.eval(self.config) for field_path_part in field_path] # type: ignore [union-attr] 177 for field_path in parent_stream_config.extra_fields 178 ] 179 180 # read_stateless() assumes the parent is not concurrent. This is currently okay since the concurrent CDK does 181 # not support either substreams or RFR, but something that needs to be considered once we do 182 for parent_record in parent_stream.read_only_records(): 183 parent_partition = None 184 # Skip non-records (eg AirbyteLogMessage) 185 if isinstance(parent_record, AirbyteMessage): 186 self.logger.warning( 187 f"Parent stream {parent_stream.name} returns records of type AirbyteMessage. This SubstreamPartitionRouter is not able to checkpoint incremental parent state." 188 ) 189 if parent_record.type == MessageType.RECORD: 190 parent_record = parent_record.record.data # type: ignore[union-attr, assignment] # record is always a Record 191 else: 192 continue 193 elif isinstance(parent_record, Record): 194 parent_partition = ( 195 parent_record.associated_slice.partition 196 if parent_record.associated_slice 197 else {} 198 ) 199 parent_record = parent_record.data 200 elif not isinstance(parent_record, Mapping): 201 # The parent_record should only take the form of a Record, AirbyteMessage, or Mapping. Anything else is invalid 202 raise AirbyteTracedException( 203 message=f"Parent stream returned records as invalid type {type(parent_record)}" 204 ) 205 try: 206 partition_value = dpath.get( 207 parent_record, # type: ignore [arg-type] 208 parent_field, 209 ) 210 except KeyError: 211 continue 212 213 # Add extra fields 214 extracted_extra_fields = self._extract_extra_fields(parent_record, extra_fields) 215 216 if parent_stream_config.lazy_read_pointer: 217 extracted_extra_fields = { 218 "child_response": self._extract_child_response( 219 parent_record, 220 parent_stream_config.lazy_read_pointer, # type: ignore[arg-type] # lazy_read_pointer type handeled in __post_init__ of parent_stream_config 221 ), 222 **extracted_extra_fields, 223 } 224 225 yield StreamSlice( 226 partition={ 227 partition_field: partition_value, 228 "parent_slice": parent_partition or {}, 229 }, 230 cursor_slice={}, 231 extra_fields=extracted_extra_fields, 232 )
Iterate over each parent stream's record and create a StreamSlice for each record.
For each stream, iterate over its stream_slices. For each stream slice, iterate over each record. Yield a stream slice for each such record.
If a parent slice contains no record, emit a slice with parent_record=None.
The template string can interpolate the following values:
- parent_stream_slice: mapping representing the parent's stream slice
- parent_record: mapping representing the parent record
- parent_stream_name: string representing the parent stream name
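To make the shape of the emitted slices concrete, here is an illustrative example with hypothetical values, assuming parent_key "id", partition_field "repository_id", no extra fields, and a parent stream that is not itself partitioned (so parent_slice is empty):

from airbyte_cdk.sources.types import StreamSlice  # assumed import path

parent_record = {"id": 42, "name": "airbyte", "updated_at": "2023-05-27T00:00:00Z"}

# For this parent record, stream_slices() would yield a slice equivalent to:
expected = StreamSlice(
    partition={"repository_id": 42, "parent_slice": {}},
    cursor_slice={},
    extra_fields={},
)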
282 def set_initial_state(self, stream_state: StreamState) -> None: 283 """ 284 Set the state of the parent streams. 285 286 If the `parent_state` key is missing from `stream_state`, migrate the child stream state to the parent stream's state format. 287 This migration applies only to parent streams with incremental dependencies. 288 289 Args: 290 stream_state (StreamState): The state of the streams to be set. 291 292 Example of state format: 293 { 294 "parent_state": { 295 "parent_stream_name1": { 296 "last_updated": "2023-05-27T00:00:00Z" 297 }, 298 "parent_stream_name2": { 299 "last_updated": "2023-05-27T00:00:00Z" 300 } 301 } 302 } 303 304 Example of migrating to parent state format: 305 - Initial state: 306 { 307 "updated_at": "2023-05-27T00:00:00Z" 308 } 309 - After migration: 310 { 311 "updated_at": "2023-05-27T00:00:00Z", 312 "parent_state": { 313 "parent_stream_name": { 314 "parent_stream_cursor": "2023-05-27T00:00:00Z" 315 } 316 } 317 } 318 """ 319 if not stream_state: 320 return 321 322 parent_state = stream_state.get("parent_state", {}) 323 324 # Set state for each parent stream with an incremental dependency 325 for parent_config in self.parent_stream_configs: 326 if ( 327 not parent_state.get(parent_config.stream.name, {}) 328 and parent_config.incremental_dependency 329 ): 330 # Migrate child state to parent state format 331 parent_state = self._migrate_child_state_to_parent_state(stream_state) 332 333 if parent_config.incremental_dependency: 334 parent_config.stream.state = parent_state.get(parent_config.stream.name, {})
Set the state of the parent streams.
If the `parent_state` key is missing from `stream_state`, migrate the child stream state to the parent stream's state format.
This migration applies only to parent streams with incremental dependencies.
Arguments:
- stream_state (StreamState): The state of the streams to be set.
Example of state format:
{
    "parent_state": {
        "parent_stream_name1": {
            "last_updated": "2023-05-27T00:00:00Z"
        },
        "parent_stream_name2": {
            "last_updated": "2023-05-27T00:00:00Z"
        }
    }
}
Example of migrating to parent state format:
- Initial state: { "updated_at": "2023-05-27T00:00:00Z" }
- After migration: { "updated_at": "2023-05-27T00:00:00Z", "parent_state": { "parent_stream_name": { "parent_stream_cursor": "2023-05-27T00:00:00Z" } } }
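A pure-Python sketch of the migration described above (not a call into the CDK), assuming a single parent stream named "projects" whose cursor field is "updated_at" and which has incremental_dependency enabled:

# Legacy child state, with no "parent_state" key:
child_state = {"updated_at": "2023-05-27T00:00:00Z"}

# set_initial_state() migrates it into the parent state format ...
migrated_parent_state = {"projects": {"updated_at": "2023-05-27T00:00:00Z"}}

# ... and then assigns migrated_parent_state["projects"] as the "projects"
# parent stream's state, so the parent resumes from the child's cursor value.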
394 def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: 395 """ 396 Get the state of the parent streams. 397 398 Returns: 399 StreamState: The current state of the parent streams. 400 401 Example of state format: 402 { 403 "parent_stream_name1": { 404 "last_updated": "2023-05-27T00:00:00Z" 405 }, 406 "parent_stream_name2": { 407 "last_updated": "2023-05-27T00:00:00Z" 408 } 409 } 410 """ 411 parent_state = {} 412 for parent_config in self.parent_stream_configs: 413 if parent_config.incremental_dependency: 414 parent_state[parent_config.stream.name] = copy.deepcopy(parent_config.stream.state) 415 return parent_state
Get the state of the parent streams.
Returns:
StreamState: The current state of the parent streams.
Example of state format:
{
    "parent_stream_name1": {
        "last_updated": "2023-05-27T00:00:00Z"
    },
    "parent_stream_name2": {
        "last_updated": "2023-05-27T00:00:00Z"
    }
}
18class YamlDeclarativeSource(ConcurrentDeclarativeSource[List[AirbyteStateMessage]]): 19 """Declarative source defined by a yaml file""" 20 21 def __init__( 22 self, 23 path_to_yaml: str, 24 debug: bool = False, 25 catalog: Optional[ConfiguredAirbyteCatalog] = None, 26 config: Optional[Mapping[str, Any]] = None, 27 state: Optional[List[AirbyteStateMessage]] = None, 28 ) -> None: 29 """ 30 :param path_to_yaml: Path to the yaml file describing the source 31 """ 32 self._path_to_yaml = path_to_yaml 33 source_config = self._read_and_parse_yaml_file(path_to_yaml) 34 35 super().__init__( 36 catalog=catalog or ConfiguredAirbyteCatalog(streams=[]), 37 config=config or {}, 38 state=state or [], 39 source_config=source_config, 40 ) 41 42 def _read_and_parse_yaml_file(self, path_to_yaml_file: str) -> ConnectionDefinition: 43 try: 44 # For testing purposes, we want to allow to just pass a file 45 with open(path_to_yaml_file, "r") as f: 46 return yaml.safe_load(f) # type: ignore # we assume the yaml represents a ConnectionDefinition 47 except FileNotFoundError: 48 # Running inside the container, the working directory during an operation is not structured the same as the static files 49 package = self.__class__.__module__.split(".")[0] 50 51 yaml_config = pkgutil.get_data(package, path_to_yaml_file) 52 if yaml_config: 53 decoded_yaml = yaml_config.decode() 54 return self._parse(decoded_yaml) 55 return {} 56 57 def _emit_manifest_debug_message(self, extra_args: dict[str, Any]) -> None: 58 extra_args["path_to_yaml"] = self._path_to_yaml 59 60 @staticmethod 61 def _parse(connection_definition_str: str) -> ConnectionDefinition: 62 """ 63 Parses a yaml file into a manifest. Component references still exist in the manifest which will be 64 resolved during the creating of the DeclarativeSource. 65 :param connection_definition_str: yaml string to parse 66 :return: The ConnectionDefinition parsed from connection_definition_str 67 """ 68 return yaml.safe_load(connection_definition_str) # type: ignore # yaml.safe_load doesn't return a type but know it is a Mapping
Declarative source defined by a yaml file
21 def __init__( 22 self, 23 path_to_yaml: str, 24 debug: bool = False, 25 catalog: Optional[ConfiguredAirbyteCatalog] = None, 26 config: Optional[Mapping[str, Any]] = None, 27 state: Optional[List[AirbyteStateMessage]] = None, 28 ) -> None: 29 """ 30 :param path_to_yaml: Path to the yaml file describing the source 31 """ 32 self._path_to_yaml = path_to_yaml 33 source_config = self._read_and_parse_yaml_file(path_to_yaml) 34 35 super().__init__( 36 catalog=catalog or ConfiguredAirbyteCatalog(streams=[]), 37 config=config or {}, 38 state=state or [], 39 source_config=source_config, 40 )
Parameters
- path_to_yaml: Path to the yaml file describing the source
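A minimal loading sketch, where "manifest.yaml" is a placeholder path to a valid declarative manifest bundled with the connector:

import logging

from airbyte_cdk.sources.declarative.yaml_declarative_source import YamlDeclarativeSource

source = YamlDeclarativeSource(path_to_yaml="manifest.yaml")
print(source.spec(logging.getLogger("airbyte")))  # connector specification resolved from the manifest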
Inherited Members
336def launch(source: Source, args: List[str]) -> None: 337 source_entrypoint = AirbyteEntrypoint(source) 338 parsed_args = source_entrypoint.parse_args(args) 339 # temporarily removes the PrintBuffer because we're seeing weird print behavior for concurrent syncs 340 # Refer to: https://github.com/airbytehq/oncall/issues/6235 341 with PRINT_BUFFER: 342 for message in source_entrypoint.run(parsed_args): 343 # simply printing is creating issues for concurrent CDK as Python uses different two instructions to print: one for the message and 344 # the other for the break line. Adding `\n` to the message ensure that both are printed at the same time 345 print(f"{message}\n", end="")
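This is the function a connector's entrypoint script typically calls. A sketch of a main.py, where source_example and SourceExample are placeholder names for your connector package and Source implementation:

import sys

from airbyte_cdk.entrypoint import launch

from source_example import SourceExample  # hypothetical connector package

if __name__ == "__main__":
    launch(SourceExample(), sys.argv[1:])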
54class AirbyteEntrypoint(object): 55 def __init__(self, source: Source): 56 init_uncaught_exception_handler(logger) 57 58 # Deployment mode is read when instantiating the entrypoint because it is the common path shared by syncs and connector builder test requests 59 if is_cloud_environment(): 60 _init_internal_request_filter() 61 62 self.source = source 63 self.logger = logging.getLogger(f"airbyte.{getattr(source, 'name', '')}") 64 65 @staticmethod 66 def parse_args(args: List[str]) -> argparse.Namespace: 67 # set up parent parsers 68 parent_parser = argparse.ArgumentParser(add_help=False) 69 parent_parser.add_argument( 70 "--debug", action="store_true", help="enables detailed debug logs related to the sync" 71 ) 72 main_parser = argparse.ArgumentParser() 73 subparsers = main_parser.add_subparsers(title="commands", dest="command") 74 75 # spec 76 subparsers.add_parser( 77 "spec", help="outputs the json configuration specification", parents=[parent_parser] 78 ) 79 80 # check 81 check_parser = subparsers.add_parser( 82 "check", help="checks the config can be used to connect", parents=[parent_parser] 83 ) 84 required_check_parser = check_parser.add_argument_group("required named arguments") 85 required_check_parser.add_argument( 86 "--config", type=str, required=True, help="path to the json configuration file" 87 ) 88 89 # discover 90 discover_parser = subparsers.add_parser( 91 "discover", 92 help="outputs a catalog describing the source's schema", 93 parents=[parent_parser], 94 ) 95 required_discover_parser = discover_parser.add_argument_group("required named arguments") 96 required_discover_parser.add_argument( 97 "--config", type=str, required=True, help="path to the json configuration file" 98 ) 99 100 # read 101 read_parser = subparsers.add_parser( 102 "read", help="reads the source and outputs messages to STDOUT", parents=[parent_parser] 103 ) 104 105 read_parser.add_argument( 106 "--state", type=str, required=False, help="path to the json-encoded state file" 107 ) 108 required_read_parser = read_parser.add_argument_group("required named arguments") 109 required_read_parser.add_argument( 110 "--config", type=str, required=True, help="path to the json configuration file" 111 ) 112 required_read_parser.add_argument( 113 "--catalog", 114 type=str, 115 required=True, 116 help="path to the catalog used to determine which data to read", 117 ) 118 119 return main_parser.parse_args(args) 120 121 def run(self, parsed_args: argparse.Namespace) -> Iterable[str]: 122 cmd = parsed_args.command 123 if not cmd: 124 raise Exception("No command passed") 125 126 if hasattr(parsed_args, "debug") and parsed_args.debug: 127 self.logger.setLevel(logging.DEBUG) 128 logger.setLevel(logging.DEBUG) 129 self.logger.debug("Debug logs enabled") 130 else: 131 self.logger.setLevel(logging.INFO) 132 133 source_spec: ConnectorSpecification = self.source.spec(self.logger) 134 try: 135 with tempfile.TemporaryDirectory( 136 # Cleanup can fail on Windows due to file locks. Ignore if so, 137 # rather than failing the whole process. 
138 ignore_cleanup_errors=True, 139 ) as temp_dir: 140 os.environ[ENV_REQUEST_CACHE_PATH] = ( 141 temp_dir # set this as default directory for request_cache to store *.sqlite files 142 ) 143 if cmd == "spec": 144 message = AirbyteMessage(type=Type.SPEC, spec=source_spec) 145 yield from [ 146 self.airbyte_message_to_string(queued_message) 147 for queued_message in self._emit_queued_messages(self.source) 148 ] 149 yield self.airbyte_message_to_string(message) 150 else: 151 raw_config = self.source.read_config(parsed_args.config) 152 config = self.source.configure(raw_config, temp_dir) 153 154 yield from [ 155 self.airbyte_message_to_string(queued_message) 156 for queued_message in self._emit_queued_messages(self.source) 157 ] 158 if cmd == "check": 159 yield from map( 160 AirbyteEntrypoint.airbyte_message_to_string, 161 self.check(source_spec, config), 162 ) 163 elif cmd == "discover": 164 yield from map( 165 AirbyteEntrypoint.airbyte_message_to_string, 166 self.discover(source_spec, config), 167 ) 168 elif cmd == "read": 169 config_catalog = self.source.read_catalog(parsed_args.catalog) 170 state = self.source.read_state(parsed_args.state) 171 172 yield from map( 173 AirbyteEntrypoint.airbyte_message_to_string, 174 self.read(source_spec, config, config_catalog, state), 175 ) 176 else: 177 raise Exception("Unexpected command " + cmd) 178 finally: 179 yield from [ 180 self.airbyte_message_to_string(queued_message) 181 for queued_message in self._emit_queued_messages(self.source) 182 ] 183 184 def check( 185 self, source_spec: ConnectorSpecification, config: TConfig 186 ) -> Iterable[AirbyteMessage]: 187 self.set_up_secret_filter(config, source_spec.connectionSpecification) 188 try: 189 self.validate_connection(source_spec, config) 190 except AirbyteTracedException as traced_exc: 191 connection_status = traced_exc.as_connection_status_message() 192 # The platform uses the exit code to surface unexpected failures so we raise the exception if the failure type not a config error 193 # If the failure is not exceptional, we'll emit a failed connection status message and return 194 if traced_exc.failure_type != FailureType.config_error: 195 raise traced_exc 196 if connection_status: 197 yield from self._emit_queued_messages(self.source) 198 yield connection_status 199 return 200 201 try: 202 check_result = self.source.check(self.logger, config) 203 except AirbyteTracedException as traced_exc: 204 yield traced_exc.as_airbyte_message() 205 # The platform uses the exit code to surface unexpected failures so we raise the exception if the failure type not a config error 206 # If the failure is not exceptional, we'll emit a failed connection status message and return 207 if traced_exc.failure_type != FailureType.config_error: 208 raise traced_exc 209 else: 210 yield AirbyteMessage( 211 type=Type.CONNECTION_STATUS, 212 connectionStatus=AirbyteConnectionStatus( 213 status=Status.FAILED, message=traced_exc.message 214 ), 215 ) 216 return 217 if check_result.status == Status.SUCCEEDED: 218 self.logger.info("Check succeeded") 219 else: 220 self.logger.error("Check failed") 221 222 yield from self._emit_queued_messages(self.source) 223 yield AirbyteMessage(type=Type.CONNECTION_STATUS, connectionStatus=check_result) 224 225 def discover( 226 self, source_spec: ConnectorSpecification, config: TConfig 227 ) -> Iterable[AirbyteMessage]: 228 self.set_up_secret_filter(config, source_spec.connectionSpecification) 229 if self.source.check_config_against_spec: 230 self.validate_connection(source_spec, config) 231 
catalog = self.source.discover(self.logger, config) 232 233 yield from self._emit_queued_messages(self.source) 234 yield AirbyteMessage(type=Type.CATALOG, catalog=catalog) 235 236 def read( 237 self, source_spec: ConnectorSpecification, config: TConfig, catalog: Any, state: list[Any] 238 ) -> Iterable[AirbyteMessage]: 239 self.set_up_secret_filter(config, source_spec.connectionSpecification) 240 if self.source.check_config_against_spec: 241 self.validate_connection(source_spec, config) 242 243 # The Airbyte protocol dictates that counts be expressed as float/double to better protect against integer overflows 244 stream_message_counter: DefaultDict[HashableStreamDescriptor, float] = defaultdict(float) 245 for message in self.source.read(self.logger, config, catalog, state): 246 yield self.handle_record_counts(message, stream_message_counter) 247 for message in self._emit_queued_messages(self.source): 248 yield self.handle_record_counts(message, stream_message_counter) 249 250 @staticmethod 251 def handle_record_counts( 252 message: AirbyteMessage, stream_message_count: DefaultDict[HashableStreamDescriptor, float] 253 ) -> AirbyteMessage: 254 match message.type: 255 case Type.RECORD: 256 if message.record is None: 257 raise ValueError("Record message must have a record attribute") 258 259 stream_message_count[ 260 HashableStreamDescriptor( 261 name=message.record.stream, # type: ignore[union-attr] # record has `stream` 262 namespace=message.record.namespace, # type: ignore[union-attr] # record has `namespace` 263 ) 264 ] += 1.0 265 case Type.STATE: 266 if message.state is None: 267 raise ValueError("State message must have a state attribute") 268 269 stream_descriptor = message_utils.get_stream_descriptor(message) 270 271 # Set record count from the counter onto the state message 272 message.state.sourceStats = message.state.sourceStats or AirbyteStateStats() # type: ignore[union-attr] # state has `sourceStats` 273 message.state.sourceStats.recordCount = stream_message_count.get( # type: ignore[union-attr] # state has `sourceStats` 274 stream_descriptor, 0.0 275 ) 276 277 # Reset the counter 278 stream_message_count[stream_descriptor] = 0.0 279 return message 280 281 @staticmethod 282 def validate_connection(source_spec: ConnectorSpecification, config: TConfig) -> None: 283 # Remove internal flags from config before validating so 284 # jsonschema's additionalProperties flag won't fail the validation 285 connector_config, _ = split_config(config) 286 check_config_against_spec_or_exit(connector_config, source_spec) 287 288 @staticmethod 289 def set_up_secret_filter(config: TConfig, connection_specification: Mapping[str, Any]) -> None: 290 # Now that we have the config, we can use it to get a list of ai airbyte_secrets 291 # that we should filter in logging to avoid leaking secrets 292 config_secrets = get_secrets(connection_specification, config) 293 update_secrets(config_secrets) 294 295 @staticmethod 296 def airbyte_message_to_string(airbyte_message: AirbyteMessage) -> str: 297 global _HAS_LOGGED_FOR_SERIALIZATION_ERROR 298 serialized_message = AirbyteMessageSerializer.dump(airbyte_message) 299 try: 300 return orjson.dumps(serialized_message).decode() 301 except Exception as exception: 302 if not _HAS_LOGGED_FOR_SERIALIZATION_ERROR: 303 logger.warning( 304 f"There was an error during the serialization of an AirbyteMessage: `{exception}`. This might impact the sync performances." 
305 ) 306 _HAS_LOGGED_FOR_SERIALIZATION_ERROR = True 307 return json.dumps(serialized_message) 308 309 @classmethod 310 def extract_state(cls, args: List[str]) -> Optional[Any]: 311 parsed_args = cls.parse_args(args) 312 if hasattr(parsed_args, "state"): 313 return parsed_args.state 314 return None 315 316 @classmethod 317 def extract_catalog(cls, args: List[str]) -> Optional[Any]: 318 parsed_args = cls.parse_args(args) 319 if hasattr(parsed_args, "catalog"): 320 return parsed_args.catalog 321 return None 322 323 @classmethod 324 def extract_config(cls, args: List[str]) -> Optional[Any]: 325 parsed_args = cls.parse_args(args) 326 if hasattr(parsed_args, "config"): 327 return parsed_args.config 328 return None 329 330 def _emit_queued_messages(self, source: Source) -> Iterable[AirbyteMessage]: 331 if hasattr(source, "message_repository") and source.message_repository: 332 yield from source.message_repository.consume_queue() 333 return
55 def __init__(self, source: Source): 56 init_uncaught_exception_handler(logger) 57 58 # Deployment mode is read when instantiating the entrypoint because it is the common path shared by syncs and connector builder test requests 59 if is_cloud_environment(): 60 _init_internal_request_filter() 61 62 self.source = source 63 self.logger = logging.getLogger(f"airbyte.{getattr(source, 'name', '')}")
65 @staticmethod 66 def parse_args(args: List[str]) -> argparse.Namespace: 67 # set up parent parsers 68 parent_parser = argparse.ArgumentParser(add_help=False) 69 parent_parser.add_argument( 70 "--debug", action="store_true", help="enables detailed debug logs related to the sync" 71 ) 72 main_parser = argparse.ArgumentParser() 73 subparsers = main_parser.add_subparsers(title="commands", dest="command") 74 75 # spec 76 subparsers.add_parser( 77 "spec", help="outputs the json configuration specification", parents=[parent_parser] 78 ) 79 80 # check 81 check_parser = subparsers.add_parser( 82 "check", help="checks the config can be used to connect", parents=[parent_parser] 83 ) 84 required_check_parser = check_parser.add_argument_group("required named arguments") 85 required_check_parser.add_argument( 86 "--config", type=str, required=True, help="path to the json configuration file" 87 ) 88 89 # discover 90 discover_parser = subparsers.add_parser( 91 "discover", 92 help="outputs a catalog describing the source's schema", 93 parents=[parent_parser], 94 ) 95 required_discover_parser = discover_parser.add_argument_group("required named arguments") 96 required_discover_parser.add_argument( 97 "--config", type=str, required=True, help="path to the json configuration file" 98 ) 99 100 # read 101 read_parser = subparsers.add_parser( 102 "read", help="reads the source and outputs messages to STDOUT", parents=[parent_parser] 103 ) 104 105 read_parser.add_argument( 106 "--state", type=str, required=False, help="path to the json-encoded state file" 107 ) 108 required_read_parser = read_parser.add_argument_group("required named arguments") 109 required_read_parser.add_argument( 110 "--config", type=str, required=True, help="path to the json configuration file" 111 ) 112 required_read_parser.add_argument( 113 "--catalog", 114 type=str, 115 required=True, 116 help="path to the catalog used to determine which data to read", 117 ) 118 119 return main_parser.parse_args(args)
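Because parse_args is a staticmethod, it can also be invoked directly, for example in tests; the file paths below are placeholders:

from airbyte_cdk.entrypoint import AirbyteEntrypoint

parsed = AirbyteEntrypoint.parse_args(
    ["read", "--config", "secrets/config.json", "--catalog", "integration_tests/configured_catalog.json"]
)
print(parsed.command)  # read
print(parsed.config)   # secrets/config.json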
121 def run(self, parsed_args: argparse.Namespace) -> Iterable[str]: 122 cmd = parsed_args.command 123 if not cmd: 124 raise Exception("No command passed") 125 126 if hasattr(parsed_args, "debug") and parsed_args.debug: 127 self.logger.setLevel(logging.DEBUG) 128 logger.setLevel(logging.DEBUG) 129 self.logger.debug("Debug logs enabled") 130 else: 131 self.logger.setLevel(logging.INFO) 132 133 source_spec: ConnectorSpecification = self.source.spec(self.logger) 134 try: 135 with tempfile.TemporaryDirectory( 136 # Cleanup can fail on Windows due to file locks. Ignore if so, 137 # rather than failing the whole process. 138 ignore_cleanup_errors=True, 139 ) as temp_dir: 140 os.environ[ENV_REQUEST_CACHE_PATH] = ( 141 temp_dir # set this as default directory for request_cache to store *.sqlite files 142 ) 143 if cmd == "spec": 144 message = AirbyteMessage(type=Type.SPEC, spec=source_spec) 145 yield from [ 146 self.airbyte_message_to_string(queued_message) 147 for queued_message in self._emit_queued_messages(self.source) 148 ] 149 yield self.airbyte_message_to_string(message) 150 else: 151 raw_config = self.source.read_config(parsed_args.config) 152 config = self.source.configure(raw_config, temp_dir) 153 154 yield from [ 155 self.airbyte_message_to_string(queued_message) 156 for queued_message in self._emit_queued_messages(self.source) 157 ] 158 if cmd == "check": 159 yield from map( 160 AirbyteEntrypoint.airbyte_message_to_string, 161 self.check(source_spec, config), 162 ) 163 elif cmd == "discover": 164 yield from map( 165 AirbyteEntrypoint.airbyte_message_to_string, 166 self.discover(source_spec, config), 167 ) 168 elif cmd == "read": 169 config_catalog = self.source.read_catalog(parsed_args.catalog) 170 state = self.source.read_state(parsed_args.state) 171 172 yield from map( 173 AirbyteEntrypoint.airbyte_message_to_string, 174 self.read(source_spec, config, config_catalog, state), 175 ) 176 else: 177 raise Exception("Unexpected command " + cmd) 178 finally: 179 yield from [ 180 self.airbyte_message_to_string(queued_message) 181 for queued_message in self._emit_queued_messages(self.source) 182 ]
184 def check( 185 self, source_spec: ConnectorSpecification, config: TConfig 186 ) -> Iterable[AirbyteMessage]: 187 self.set_up_secret_filter(config, source_spec.connectionSpecification) 188 try: 189 self.validate_connection(source_spec, config) 190 except AirbyteTracedException as traced_exc: 191 connection_status = traced_exc.as_connection_status_message() 192 # The platform uses the exit code to surface unexpected failures so we raise the exception if the failure type not a config error 193 # If the failure is not exceptional, we'll emit a failed connection status message and return 194 if traced_exc.failure_type != FailureType.config_error: 195 raise traced_exc 196 if connection_status: 197 yield from self._emit_queued_messages(self.source) 198 yield connection_status 199 return 200 201 try: 202 check_result = self.source.check(self.logger, config) 203 except AirbyteTracedException as traced_exc: 204 yield traced_exc.as_airbyte_message() 205 # The platform uses the exit code to surface unexpected failures so we raise the exception if the failure type not a config error 206 # If the failure is not exceptional, we'll emit a failed connection status message and return 207 if traced_exc.failure_type != FailureType.config_error: 208 raise traced_exc 209 else: 210 yield AirbyteMessage( 211 type=Type.CONNECTION_STATUS, 212 connectionStatus=AirbyteConnectionStatus( 213 status=Status.FAILED, message=traced_exc.message 214 ), 215 ) 216 return 217 if check_result.status == Status.SUCCEEDED: 218 self.logger.info("Check succeeded") 219 else: 220 self.logger.error("Check failed") 221 222 yield from self._emit_queued_messages(self.source) 223 yield AirbyteMessage(type=Type.CONNECTION_STATUS, connectionStatus=check_result)
225 def discover( 226 self, source_spec: ConnectorSpecification, config: TConfig 227 ) -> Iterable[AirbyteMessage]: 228 self.set_up_secret_filter(config, source_spec.connectionSpecification) 229 if self.source.check_config_against_spec: 230 self.validate_connection(source_spec, config) 231 catalog = self.source.discover(self.logger, config) 232 233 yield from self._emit_queued_messages(self.source) 234 yield AirbyteMessage(type=Type.CATALOG, catalog=catalog)
236 def read( 237 self, source_spec: ConnectorSpecification, config: TConfig, catalog: Any, state: list[Any] 238 ) -> Iterable[AirbyteMessage]: 239 self.set_up_secret_filter(config, source_spec.connectionSpecification) 240 if self.source.check_config_against_spec: 241 self.validate_connection(source_spec, config) 242 243 # The Airbyte protocol dictates that counts be expressed as float/double to better protect against integer overflows 244 stream_message_counter: DefaultDict[HashableStreamDescriptor, float] = defaultdict(float) 245 for message in self.source.read(self.logger, config, catalog, state): 246 yield self.handle_record_counts(message, stream_message_counter) 247 for message in self._emit_queued_messages(self.source): 248 yield self.handle_record_counts(message, stream_message_counter)
250 @staticmethod 251 def handle_record_counts( 252 message: AirbyteMessage, stream_message_count: DefaultDict[HashableStreamDescriptor, float] 253 ) -> AirbyteMessage: 254 match message.type: 255 case Type.RECORD: 256 if message.record is None: 257 raise ValueError("Record message must have a record attribute") 258 259 stream_message_count[ 260 HashableStreamDescriptor( 261 name=message.record.stream, # type: ignore[union-attr] # record has `stream` 262 namespace=message.record.namespace, # type: ignore[union-attr] # record has `namespace` 263 ) 264 ] += 1.0 265 case Type.STATE: 266 if message.state is None: 267 raise ValueError("State message must have a state attribute") 268 269 stream_descriptor = message_utils.get_stream_descriptor(message) 270 271 # Set record count from the counter onto the state message 272 message.state.sourceStats = message.state.sourceStats or AirbyteStateStats() # type: ignore[union-attr] # state has `sourceStats` 273 message.state.sourceStats.recordCount = stream_message_count.get( # type: ignore[union-attr] # state has `sourceStats` 274 stream_descriptor, 0.0 275 ) 276 277 # Reset the counter 278 stream_message_count[stream_descriptor] = 0.0 279 return message
281 @staticmethod 282 def validate_connection(source_spec: ConnectorSpecification, config: TConfig) -> None: 283 # Remove internal flags from config before validating so 284 # jsonschema's additionalProperties flag won't fail the validation 285 connector_config, _ = split_config(config) 286 check_config_against_spec_or_exit(connector_config, source_spec)
288 @staticmethod 289 def set_up_secret_filter(config: TConfig, connection_specification: Mapping[str, Any]) -> None: 290 # Now that we have the config, we can use it to get a list of ai airbyte_secrets 291 # that we should filter in logging to avoid leaking secrets 292 config_secrets = get_secrets(connection_specification, config) 293 update_secrets(config_secrets)
295 @staticmethod 296 def airbyte_message_to_string(airbyte_message: AirbyteMessage) -> str: 297 global _HAS_LOGGED_FOR_SERIALIZATION_ERROR 298 serialized_message = AirbyteMessageSerializer.dump(airbyte_message) 299 try: 300 return orjson.dumps(serialized_message).decode() 301 except Exception as exception: 302 if not _HAS_LOGGED_FOR_SERIALIZATION_ERROR: 303 logger.warning( 304 f"There was an error during the serialization of an AirbyteMessage: `{exception}`. This might impact the sync performances." 305 ) 306 _HAS_LOGGED_FOR_SERIALIZATION_ERROR = True 307 return json.dumps(serialized_message)
479class AbstractAPIBudget(abc.ABC): 480 """Interface to some API where a client allowed to have N calls per T interval. 481 482 Important: APIBudget is not doing any API calls, the end user code is responsible to call this interface 483 to respect call rate limitation of the API. 484 485 It supports multiple policies applied to different group of requests. To distinct these groups we use RequestMatchers. 486 Individual policy represented by MovingWindowCallRatePolicy and currently supports only moving window strategy. 487 """ 488 489 @abc.abstractmethod 490 def acquire_call( 491 self, request: Any, block: bool = True, timeout: Optional[float] = None 492 ) -> None: 493 """Try to get a call from budget, will block by default 494 495 :param request: 496 :param block: when true (default) will block the current thread until call credit is available 497 :param timeout: if set will limit maximum time in block, otherwise will wait until credit is available 498 :raises: CallRateLimitHit - when no credits left and if timeout was set the waiting time exceed the timeout 499 """ 500 501 @abc.abstractmethod 502 def get_matching_policy(self, request: Any) -> Optional[AbstractCallRatePolicy]: 503 """Find matching call rate policy for specific request""" 504 505 @abc.abstractmethod 506 def update_from_response(self, request: Any, response: Any) -> None: 507 """Update budget information based on response from API 508 509 :param request: the initial request that triggered this response 510 :param response: response from the API 511 """
Interface to some API where a client is allowed to make N calls per T interval.
Important: APIBudget does not make any API calls itself; the end-user code is responsible for calling this interface to respect the API's call rate limits.
It supports multiple policies applied to different groups of requests. To distinguish these groups we use RequestMatchers. An individual policy is represented by MovingWindowCallRatePolicy, which currently supports only a moving window strategy.
489 @abc.abstractmethod 490 def acquire_call( 491 self, request: Any, block: bool = True, timeout: Optional[float] = None 492 ) -> None: 493 """Try to get a call from budget, will block by default 494 495 :param request: 496 :param block: when true (default) will block the current thread until call credit is available 497 :param timeout: if set will limit maximum time in block, otherwise will wait until credit is available 498 :raises: CallRateLimitHit - when no credits left and if timeout was set the waiting time exceed the timeout 499 """
Try to acquire a call from the budget; blocks by default.
Parameters
- request:
- block: when True (default), blocks the current thread until call credit is available
- timeout: if set, limits the maximum time spent blocking; otherwise waits until credit is available
Raises
- CallRateLimitHit - raised when no credits are left and, if a timeout was set, the waiting time would exceed it
501 @abc.abstractmethod 502 def get_matching_policy(self, request: Any) -> Optional[AbstractCallRatePolicy]: 503 """Find matching call rate policy for specific request"""
Find the matching call rate policy for a specific request
505 @abc.abstractmethod 506 def update_from_response(self, request: Any, response: Any) -> None: 507 """Update budget information based on response from API 508 509 :param request: the initial request that triggered this response 510 :param response: response from the API 511 """
Update budget information based on the response from the API
Parameters
- request: the initial request that triggered this response
- response: response from the API
13class AbstractHeaderAuthenticator(AuthBase): 14 """Abstract class for an header-based authenticators that add a header to outgoing HTTP requests.""" 15 16 def __call__(self, request: requests.PreparedRequest) -> Any: 17 """Attach the HTTP headers required to authenticate on the HTTP request""" 18 request.headers.update(self.get_auth_header()) 19 return request 20 21 def get_auth_header(self) -> Mapping[str, Any]: 22 """The header to set on outgoing HTTP requests""" 23 if self.auth_header: 24 return {self.auth_header: self.token} 25 return {} 26 27 @property 28 @abstractmethod 29 def auth_header(self) -> str: 30 """HTTP header to set on the requests""" 31 32 @property 33 @abstractmethod 34 def token(self) -> str: 35 """The header value to set on outgoing HTTP requests"""
Abstract class for header-based authenticators that add a header to outgoing HTTP requests.
21 def get_auth_header(self) -> Mapping[str, Any]: 22 """The header to set on outgoing HTTP requests""" 23 if self.auth_header: 24 return {self.auth_header: self.token} 25 return {}
The header to set on outgoing HTTP requests
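A minimal sketch of a concrete subclass that attaches a standard bearer token; the import path below is assumed to be where AbstractHeaderAuthenticator lives in this CDK version.

from airbyte_cdk.sources.streams.http.requests_native_auth.abstract_token import (  # import path assumed
    AbstractHeaderAuthenticator,
)

class StaticBearerAuthenticator(AbstractHeaderAuthenticator):
    """Attaches `Authorization: Bearer <token>` to every outgoing request."""

    def __init__(self, token: str):
        self._token = token

    @property
    def auth_header(self) -> str:
        return "Authorization"

    @property
    def token(self) -> str:
        return f"Bearer {self._token}"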
12class BaseBackoffException(requests.exceptions.HTTPError): 13 def __init__( 14 self, 15 request: requests.PreparedRequest, 16 response: Optional[Union[requests.Response, Exception]], 17 error_message: str = "", 18 ): 19 if isinstance(response, requests.Response): 20 error_message = ( 21 error_message 22 or f"Request URL: {request.url}, Response Code: {response.status_code}, Response Text: {response.text}" 23 ) 24 super().__init__(error_message, request=request, response=response) 25 else: 26 error_message = error_message or f"Request URL: {request.url}, Exception: {response}" 27 super().__init__(error_message, request=request, response=None)
An HTTP error occurred.
13 def __init__( 14 self, 15 request: requests.PreparedRequest, 16 response: Optional[Union[requests.Response, Exception]], 17 error_message: str = "", 18 ): 19 if isinstance(response, requests.Response): 20 error_message = ( 21 error_message 22 or f"Request URL: {request.url}, Response Code: {response.status_code}, Response Text: {response.text}" 23 ) 24 super().__init__(error_message, request=request, response=response) 25 else: 26 error_message = error_message or f"Request URL: {request.url}, Exception: {response}" 27 super().__init__(error_message, request=request, response=None)
704class CachedLimiterSession(requests_cache.CacheMixin, LimiterMixin, requests.Session): 705 """Session class with caching and rate-limiting behavior."""
Session class with caching and rate-limiting behavior.
An HTTP error occurred.
Inherited Members
34def default_backoff_handler( 35 max_tries: Optional[int], factor: float, max_time: Optional[int] = None, **kwargs: Any 36) -> Callable[[SendRequestCallableType], SendRequestCallableType]: 37 def log_retry_attempt(details: Mapping[str, Any]) -> None: 38 _, exc, _ = sys.exc_info() 39 if isinstance(exc, RequestException) and exc.response: 40 logger.info( 41 f"Status code: {exc.response.status_code!r}, Response Content: {exc.response.content!r}" 42 ) 43 logger.info( 44 f"Caught retryable error '{str(exc)}' after {details['tries']} tries. Waiting {details['wait']} seconds then retrying..." 45 ) 46 47 def should_give_up(exc: Exception) -> bool: 48 # If a non-rate-limiting related 4XX error makes it this far, it means it was unexpected and probably consistent, so we shouldn't back off 49 if isinstance(exc, RequestException): 50 if exc.response is not None: 51 give_up: bool = ( 52 exc.response is not None 53 and exc.response.status_code != codes.too_many_requests 54 and 400 <= exc.response.status_code < 500 55 ) 56 if give_up: 57 logger.info(f"Giving up for returned HTTP status: {exc.response.status_code!r}") 58 return give_up 59 # Only RequestExceptions are retryable, so if we get here, it's not retryable 60 return False 61 62 return backoff.on_exception( # type: ignore # Decorator function returns a function with a different signature than the input function, so mypy can't infer the type of the returned function 63 backoff.expo, 64 TRANSIENT_EXCEPTIONS, 65 jitter=None, 66 on_backoff=log_retry_attempt, 67 giveup=should_give_up, 68 max_tries=max_tries, 69 max_time=max_time, 70 factor=factor, 71 **kwargs, 72 )
631class HttpAPIBudget(APIBudget): 632 """Implementation of AbstractAPIBudget for HTTP""" 633 634 def __init__( 635 self, 636 ratelimit_reset_header: str = "ratelimit-reset", 637 ratelimit_remaining_header: str = "ratelimit-remaining", 638 status_codes_for_ratelimit_hit: list[int] = [429], 639 **kwargs: Any, 640 ): 641 """Constructor 642 643 :param ratelimit_reset_header: name of the header that has a timestamp of the next reset of call budget 644 :param ratelimit_remaining_header: name of the header that has the number of calls left 645 :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit 646 """ 647 self._ratelimit_reset_header = ratelimit_reset_header 648 self._ratelimit_remaining_header = ratelimit_remaining_header 649 self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit 650 super().__init__(**kwargs) 651 652 def update_from_response(self, request: Any, response: Any) -> None: 653 policy = self.get_matching_policy(request) 654 if not policy: 655 return 656 657 if isinstance(response, requests.Response): 658 available_calls = self.get_calls_left_from_response(response) 659 reset_ts = self.get_reset_ts_from_response(response) 660 policy.update(available_calls=available_calls, call_reset_ts=reset_ts) 661 662 def get_reset_ts_from_response( 663 self, response: requests.Response 664 ) -> Optional[datetime.datetime]: 665 if response.headers.get(self._ratelimit_reset_header): 666 return datetime.datetime.fromtimestamp( 667 int(response.headers[self._ratelimit_reset_header]) 668 ) 669 return None 670 671 def get_calls_left_from_response(self, response: requests.Response) -> Optional[int]: 672 if response.headers.get(self._ratelimit_remaining_header): 673 return int(response.headers[self._ratelimit_remaining_header]) 674 675 if response.status_code in self._status_codes_for_ratelimit_hit: 676 return 0 677 678 return None
Implementation of AbstractAPIBudget for HTTP
634 def __init__( 635 self, 636 ratelimit_reset_header: str = "ratelimit-reset", 637 ratelimit_remaining_header: str = "ratelimit-remaining", 638 status_codes_for_ratelimit_hit: list[int] = [429], 639 **kwargs: Any, 640 ): 641 """Constructor 642 643 :param ratelimit_reset_header: name of the header that has a timestamp of the next reset of call budget 644 :param ratelimit_remaining_header: name of the header that has the number of calls left 645 :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit 646 """ 647 self._ratelimit_reset_header = ratelimit_reset_header 648 self._ratelimit_remaining_header = ratelimit_remaining_header 649 self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit 650 super().__init__(**kwargs)
Constructor
Parameters
- ratelimit_reset_header: name of the header that contains the timestamp of the next call-budget reset
- ratelimit_remaining_header: name of the header that contains the number of calls left
- status_codes_for_ratelimit_hit: list of HTTP status codes that signal that the rate limit has been hit
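A hedged construction sketch: one moving-window policy of 100 calls per minute for a single host, read back through GitHub-style rate-limit headers. The class names come from this module (assumed importable from airbyte_cdk.sources.streams.call_rate); the host, header names, and limits are placeholders.

from datetime import timedelta

from airbyte_cdk.sources.streams.call_rate import (
    HttpAPIBudget,
    HttpRequestRegexMatcher,
    MovingWindowCallRatePolicy,
    Rate,
)

policy = MovingWindowCallRatePolicy(
    rates=[Rate(limit=100, interval=timedelta(minutes=1))],
    matchers=[HttpRequestRegexMatcher(url_base="https://api.example.com")],
)

api_budget = HttpAPIBudget(
    ratelimit_reset_header="X-RateLimit-Reset",
    ratelimit_remaining_header="X-RateLimit-Remaining",
    status_codes_for_ratelimit_hit=[429],
    policies=[policy],  # forwarded to the APIBudget base class
)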
652 def update_from_response(self, request: Any, response: Any) -> None: 653 policy = self.get_matching_policy(request) 654 if not policy: 655 return 656 657 if isinstance(response, requests.Response): 658 available_calls = self.get_calls_left_from_response(response) 659 reset_ts = self.get_reset_ts_from_response(response) 660 policy.update(available_calls=available_calls, call_reset_ts=reset_ts)
Update budget information based on the API response.
Parameters
- request: the initial request that triggered this response
- response: response from the API
671 def get_calls_left_from_response(self, response: requests.Response) -> Optional[int]: 672 if response.headers.get(self._ratelimit_remaining_header): 673 return int(response.headers[self._ratelimit_remaining_header]) 674 675 if response.status_code in self._status_codes_for_ratelimit_hit: 676 return 0 677 678 return None
Inherited Members
103class HttpRequestMatcher(RequestMatcher): 104 """Simple implementation of RequestMatcher for HTTP requests using HttpRequestRegexMatcher under the hood.""" 105 106 def __init__( 107 self, 108 method: Optional[str] = None, 109 url: Optional[str] = None, 110 params: Optional[Mapping[str, Any]] = None, 111 headers: Optional[Mapping[str, Any]] = None, 112 ): 113 """Constructor 114 115 :param method: HTTP method (e.g., "GET", "POST"). 116 :param url: Full URL to match. 117 :param params: Dictionary of query parameters to match. 118 :param headers: Dictionary of headers to match. 119 """ 120 # Parse the URL to extract the base and path 121 if url: 122 parsed_url = parse.urlsplit(url) 123 url_base = f"{parsed_url.scheme}://{parsed_url.netloc}" 124 url_path = parsed_url.path if parsed_url.path != "/" else None 125 else: 126 url_base = None 127 url_path = None 128 129 # Use HttpRequestRegexMatcher under the hood 130 self._regex_matcher = HttpRequestRegexMatcher( 131 method=method, 132 url_base=url_base, 133 url_path_pattern=re.escape(url_path) if url_path else None, 134 params=params, 135 headers=headers, 136 ) 137 138 def __call__(self, request: Any) -> bool: 139 """ 140 :param request: A requests.Request or requests.PreparedRequest instance. 141 :return: True if the request matches all provided criteria; False otherwise. 142 """ 143 return self._regex_matcher(request) 144 145 def __str__(self) -> str: 146 return ( 147 f"HttpRequestMatcher(method={self._regex_matcher._method}, " 148 f"url={self._regex_matcher._url_base}{self._regex_matcher._url_path_pattern.pattern if self._regex_matcher._url_path_pattern else ''}, " 149 f"params={self._regex_matcher._params}, headers={self._regex_matcher._headers})" 150 )
Simple implementation of RequestMatcher for HTTP requests using HttpRequestRegexMatcher under the hood.
106 def __init__( 107 self, 108 method: Optional[str] = None, 109 url: Optional[str] = None, 110 params: Optional[Mapping[str, Any]] = None, 111 headers: Optional[Mapping[str, Any]] = None, 112 ): 113 """Constructor 114 115 :param method: HTTP method (e.g., "GET", "POST"). 116 :param url: Full URL to match. 117 :param params: Dictionary of query parameters to match. 118 :param headers: Dictionary of headers to match. 119 """ 120 # Parse the URL to extract the base and path 121 if url: 122 parsed_url = parse.urlsplit(url) 123 url_base = f"{parsed_url.scheme}://{parsed_url.netloc}" 124 url_path = parsed_url.path if parsed_url.path != "/" else None 125 else: 126 url_base = None 127 url_path = None 128 129 # Use HttpRequestRegexMatcher under the hood 130 self._regex_matcher = HttpRequestRegexMatcher( 131 method=method, 132 url_base=url_base, 133 url_path_pattern=re.escape(url_path) if url_path else None, 134 params=params, 135 headers=headers, 136 )
Constructor
Parameters
- method: HTTP method (e.g., "GET", "POST").
- url: Full URL to match.
- params: Dictionary of query parameters to match.
- headers: Dictionary of headers to match.
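A small usage sketch: the matcher is called with a requests.Request or requests.PreparedRequest and returns True only if every provided criterion matches. The endpoint and query parameters are placeholders for illustration.

import requests

from airbyte_cdk.sources.streams.call_rate import HttpRequestMatcher

matcher = HttpRequestMatcher(
    method="GET",
    url="https://api.example.com/v1/users",
    params={"page": "1"},
)

request = requests.Request(
    "GET", "https://api.example.com/v1/users", params={"page": "1"}
).prepare()

assert matcher(request)  # method, URL, and params all match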
45class HttpStream(Stream, CheckpointMixin, ABC): 46 """ 47 Base abstract class for an Airbyte Stream using the HTTP protocol. Basic building block for users building an Airbyte source for a HTTP API. 48 """ 49 50 source_defined_cursor = True # Most HTTP streams use a source defined cursor (i.e: the user can't configure it like on a SQL table) 51 page_size: Optional[int] = ( 52 None # Use this variable to define page size for API http requests with pagination support 53 ) 54 55 def __init__( 56 self, authenticator: Optional[AuthBase] = None, api_budget: Optional[APIBudget] = None 57 ): 58 self._exit_on_rate_limit: bool = False 59 self._http_client = HttpClient( 60 name=self.name, 61 logger=self.logger, 62 error_handler=self.get_error_handler(), 63 api_budget=api_budget or APIBudget(policies=[]), 64 authenticator=authenticator, 65 use_cache=self.use_cache, 66 backoff_strategy=self.get_backoff_strategy(), 67 message_repository=InMemoryMessageRepository(), 68 ) 69 70 # There are three conditions that dictate if RFR should automatically be applied to a stream 71 # 1. Streams that explicitly initialize their own cursor should defer to it and not automatically apply RFR 72 # 2. Streams with at least one cursor_field are incremental and thus a superior sync to RFR. 73 # 3. Streams overriding read_records() do not guarantee that they will call the parent implementation which can perform 74 # per-page checkpointing so RFR is only supported if a stream use the default `HttpStream.read_records()` method 75 if ( 76 not self.cursor 77 and len(self.cursor_field) == 0 78 and type(self).read_records is HttpStream.read_records 79 ): 80 self.cursor = ResumableFullRefreshCursor() 81 82 @property 83 def exit_on_rate_limit(self) -> bool: 84 """ 85 :return: False if the stream will retry endlessly when rate limited 86 """ 87 return self._exit_on_rate_limit 88 89 @exit_on_rate_limit.setter 90 def exit_on_rate_limit(self, value: bool) -> None: 91 self._exit_on_rate_limit = value 92 93 @property 94 def cache_filename(self) -> str: 95 """ 96 Override if needed. Return the name of cache file 97 Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only. 98 """ 99 return f"{self.name}.sqlite" 100 101 @property 102 def use_cache(self) -> bool: 103 """ 104 Override if needed. If True, all records will be cached. 105 Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only. 106 """ 107 return False 108 109 @property 110 @abstractmethod 111 def url_base(self) -> str: 112 """ 113 :return: URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/" 114 """ 115 116 @property 117 def http_method(self) -> str: 118 """ 119 Override if needed. See get_request_data/get_request_json if using POST/PUT/PATCH. 120 """ 121 return "GET" 122 123 @property 124 @deprecated( 125 "Deprecated as of CDK version 3.0.0. " 126 "You should set error_handler explicitly in HttpStream.get_error_handler() instead." 127 ) 128 def raise_on_http_errors(self) -> bool: 129 """ 130 Override if needed. If set to False, allows opting-out of raising HTTP code exception. 131 """ 132 return True 133 134 @property 135 @deprecated( 136 "Deprecated as of CDK version 3.0.0. " 137 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 138 ) 139 def max_retries(self) -> Union[int, None]: 140 """ 141 Override if needed. 
Specifies maximum amount of retries for backoff policy. Return None for no limit. 142 """ 143 return 5 144 145 @property 146 @deprecated( 147 "Deprecated as of CDK version 3.0.0. " 148 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 149 ) 150 def max_time(self) -> Union[int, None]: 151 """ 152 Override if needed. Specifies maximum total waiting time (in seconds) for backoff policy. Return None for no limit. 153 """ 154 return 60 * 10 155 156 @property 157 @deprecated( 158 "Deprecated as of CDK version 3.0.0. " 159 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 160 ) 161 def retry_factor(self) -> float: 162 """ 163 Override if needed. Specifies factor for backoff policy. 164 """ 165 return 5 166 167 @abstractmethod 168 def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]: 169 """ 170 Override this method to define a pagination strategy. 171 172 The value returned from this method is passed to most other methods in this class. Use it to form a request e.g: set headers or query params. 173 174 :return: The token for the next page from the input response object. Returning None means there are no more pages to read in this response. 175 """ 176 177 @abstractmethod 178 def path( 179 self, 180 *, 181 stream_state: Optional[Mapping[str, Any]] = None, 182 stream_slice: Optional[Mapping[str, Any]] = None, 183 next_page_token: Optional[Mapping[str, Any]] = None, 184 ) -> str: 185 """ 186 Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity" 187 """ 188 189 def request_params( 190 self, 191 stream_state: Optional[Mapping[str, Any]], 192 stream_slice: Optional[Mapping[str, Any]] = None, 193 next_page_token: Optional[Mapping[str, Any]] = None, 194 ) -> MutableMapping[str, Any]: 195 """ 196 Override this method to define the query parameters that should be set on an outgoing HTTP request given the inputs. 197 198 E.g: you might want to define query parameters for paging if next_page_token is not None. 199 """ 200 return {} 201 202 def request_headers( 203 self, 204 stream_state: Optional[Mapping[str, Any]], 205 stream_slice: Optional[Mapping[str, Any]] = None, 206 next_page_token: Optional[Mapping[str, Any]] = None, 207 ) -> Mapping[str, Any]: 208 """ 209 Override to return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method. 210 """ 211 return {} 212 213 def request_body_data( 214 self, 215 stream_state: Optional[Mapping[str, Any]], 216 stream_slice: Optional[Mapping[str, Any]] = None, 217 next_page_token: Optional[Mapping[str, Any]] = None, 218 ) -> Optional[Union[Mapping[str, Any], str]]: 219 """ 220 Override when creating POST/PUT/PATCH requests to populate the body of the request with a non-JSON payload. 221 222 If returns a ready text that it will be sent as is. 223 If returns a dict that it will be converted to a urlencoded form. 224 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 225 226 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 
227 """ 228 return None 229 230 def request_body_json( 231 self, 232 stream_state: Optional[Mapping[str, Any]], 233 stream_slice: Optional[Mapping[str, Any]] = None, 234 next_page_token: Optional[Mapping[str, Any]] = None, 235 ) -> Optional[Mapping[str, Any]]: 236 """ 237 Override when creating POST/PUT/PATCH requests to populate the body of the request with a JSON payload. 238 239 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 240 """ 241 return None 242 243 def request_kwargs( 244 self, 245 stream_state: Optional[Mapping[str, Any]], 246 stream_slice: Optional[Mapping[str, Any]] = None, 247 next_page_token: Optional[Mapping[str, Any]] = None, 248 ) -> Mapping[str, Any]: 249 """ 250 Override to return a mapping of keyword arguments to be used when creating the HTTP request. 251 Any option listed in https://docs.python-requests.org/en/latest/api/#requests.adapters.BaseAdapter.send for can be returned from 252 this method. Note that these options do not conflict with request-level options such as headers, request params, etc.. 253 """ 254 return {} 255 256 @abstractmethod 257 def parse_response( 258 self, 259 response: requests.Response, 260 *, 261 stream_state: Mapping[str, Any], 262 stream_slice: Optional[Mapping[str, Any]] = None, 263 next_page_token: Optional[Mapping[str, Any]] = None, 264 ) -> Iterable[Mapping[str, Any]]: 265 """ 266 Parses the raw response object into a list of records. 267 By default, this returns an iterable containing the input. Override to parse differently. 268 :param response: 269 :param stream_state: 270 :param stream_slice: 271 :param next_page_token: 272 :return: An iterable containing the parsed response 273 """ 274 275 def get_backoff_strategy(self) -> Optional[Union[BackoffStrategy, List[BackoffStrategy]]]: 276 """ 277 Used to initialize Adapter to avoid breaking changes. 278 If Stream has a `backoff_time` method implementation, we know this stream uses old (pre-HTTPClient) backoff handlers and thus an adapter is needed. 279 280 Override to provide custom BackoffStrategy 281 :return Optional[BackoffStrategy]: 282 """ 283 if hasattr(self, "backoff_time"): 284 return HttpStreamAdapterBackoffStrategy(self) 285 else: 286 return None 287 288 def get_error_handler(self) -> Optional[ErrorHandler]: 289 """ 290 Used to initialize Adapter to avoid breaking changes. 291 If Stream has a `should_retry` method implementation, we know this stream uses old (pre-HTTPClient) error handlers and thus an adapter is needed. 292 293 Override to provide custom ErrorHandler 294 :return Optional[ErrorHandler]: 295 """ 296 if hasattr(self, "should_retry"): 297 error_handler = HttpStreamAdapterHttpStatusErrorHandler( 298 stream=self, 299 logger=logging.getLogger(), 300 max_retries=self.max_retries, 301 max_time=timedelta(seconds=self.max_time or 0), 302 ) 303 return error_handler 304 else: 305 return None 306 307 @classmethod 308 def _join_url(cls, url_base: str, path: str) -> str: 309 return urljoin(url_base, path) 310 311 @classmethod 312 def parse_response_error_message(cls, response: requests.Response) -> Optional[str]: 313 """ 314 Parses the raw response object from a failed request into a user-friendly error message. 315 By default, this method tries to grab the error message from JSON responses by following common API patterns. Override to parse differently. 
316 317 :param response: 318 :return: A user-friendly message that indicates the cause of the error 319 """ 320 321 # default logic to grab error from common fields 322 def _try_get_error(value: Optional[JsonType]) -> Optional[str]: 323 if isinstance(value, str): 324 return value 325 elif isinstance(value, list): 326 errors_in_value = [_try_get_error(v) for v in value] 327 return ", ".join(v for v in errors_in_value if v is not None) 328 elif isinstance(value, dict): 329 new_value = ( 330 value.get("message") 331 or value.get("messages") 332 or value.get("error") 333 or value.get("errors") 334 or value.get("failures") 335 or value.get("failure") 336 or value.get("detail") 337 ) 338 return _try_get_error(new_value) 339 return None 340 341 try: 342 body = response.json() 343 return _try_get_error(body) 344 except requests.exceptions.JSONDecodeError: 345 return None 346 347 def get_error_display_message(self, exception: BaseException) -> Optional[str]: 348 """ 349 Retrieves the user-friendly display message that corresponds to an exception. 350 This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage. 351 352 The default implementation of this method only handles HTTPErrors by passing the response to self.parse_response_error_message(). 353 The method should be overriden as needed to handle any additional exception types. 354 355 :param exception: The exception that was raised 356 :return: A user-friendly message that indicates the cause of the error 357 """ 358 if isinstance(exception, requests.HTTPError) and exception.response is not None: 359 return self.parse_response_error_message(exception.response) 360 return None 361 362 def read_records( 363 self, 364 sync_mode: SyncMode, 365 cursor_field: Optional[List[str]] = None, 366 stream_slice: Optional[Mapping[str, Any]] = None, 367 stream_state: Optional[Mapping[str, Any]] = None, 368 ) -> Iterable[StreamData]: 369 # A cursor_field indicates this is an incremental stream which offers better checkpointing than RFR enabled via the cursor 370 if self.cursor_field or not isinstance(self.get_cursor(), ResumableFullRefreshCursor): 371 yield from self._read_pages( 372 lambda req, res, state, _slice: self.parse_response( 373 res, stream_slice=_slice, stream_state=state 374 ), 375 stream_slice, 376 stream_state, 377 ) 378 else: 379 yield from self._read_single_page( 380 lambda req, res, state, _slice: self.parse_response( 381 res, stream_slice=_slice, stream_state=state 382 ), 383 stream_slice, 384 stream_state, 385 ) 386 387 @property 388 def state(self) -> MutableMapping[str, Any]: 389 cursor = self.get_cursor() 390 if cursor: 391 return cursor.get_stream_state() # type: ignore 392 return self._state 393 394 @state.setter 395 def state(self, value: MutableMapping[str, Any]) -> None: 396 cursor = self.get_cursor() 397 if cursor: 398 cursor.set_initial_state(value) 399 self._state = value 400 401 def get_cursor(self) -> Optional[Cursor]: 402 # I don't love that this is semi-stateful but not sure what else to do. We don't know exactly what type of cursor to 403 # instantiate when creating the class. We can make a few assumptions like if there is a cursor_field which implies 404 # incremental, but we don't know until runtime if this is a substream. 
Ideally, a stream should explicitly define 405 # its cursor, but because we're trying to automatically apply RFR we're stuck with this logic where we replace the 406 # cursor at runtime once we detect this is a substream based on self.has_multiple_slices being reassigned 407 if self.has_multiple_slices and isinstance(self.cursor, ResumableFullRefreshCursor): 408 self.cursor = SubstreamResumableFullRefreshCursor() 409 return self.cursor 410 else: 411 return self.cursor 412 413 def _read_pages( 414 self, 415 records_generator_fn: Callable[ 416 [ 417 requests.PreparedRequest, 418 requests.Response, 419 Mapping[str, Any], 420 Optional[Mapping[str, Any]], 421 ], 422 Iterable[StreamData], 423 ], 424 stream_slice: Optional[Mapping[str, Any]] = None, 425 stream_state: Optional[Mapping[str, Any]] = None, 426 ) -> Iterable[StreamData]: 427 stream_state = stream_state or {} 428 pagination_complete = False 429 next_page_token = None 430 while not pagination_complete: 431 request, response = self._fetch_next_page(stream_slice, stream_state, next_page_token) 432 yield from records_generator_fn(request, response, stream_state, stream_slice) 433 434 next_page_token = self.next_page_token(response) 435 if not next_page_token: 436 pagination_complete = True 437 438 cursor = self.get_cursor() 439 if cursor and isinstance(cursor, SubstreamResumableFullRefreshCursor): 440 partition, _, _ = self._extract_slice_fields(stream_slice=stream_slice) 441 # Substreams checkpoint state by marking an entire parent partition as completed so that on the subsequent attempt 442 # after a failure, completed parents are skipped and the sync can make progress 443 cursor.close_slice(StreamSlice(cursor_slice={}, partition=partition)) 444 445 # Always return an empty generator just in case no records were ever yielded 446 yield from [] 447 448 def _read_single_page( 449 self, 450 records_generator_fn: Callable[ 451 [ 452 requests.PreparedRequest, 453 requests.Response, 454 Mapping[str, Any], 455 Optional[Mapping[str, Any]], 456 ], 457 Iterable[StreamData], 458 ], 459 stream_slice: Optional[Mapping[str, Any]] = None, 460 stream_state: Optional[Mapping[str, Any]] = None, 461 ) -> Iterable[StreamData]: 462 partition, cursor_slice, remaining_slice = self._extract_slice_fields( 463 stream_slice=stream_slice 464 ) 465 stream_state = stream_state or {} 466 next_page_token = cursor_slice or None 467 468 request, response = self._fetch_next_page(remaining_slice, stream_state, next_page_token) 469 yield from records_generator_fn(request, response, stream_state, remaining_slice) 470 471 next_page_token = self.next_page_token(response) or { 472 "__ab_full_refresh_sync_complete": True 473 } 474 475 cursor = self.get_cursor() 476 if cursor: 477 cursor.close_slice(StreamSlice(cursor_slice=next_page_token, partition=partition)) 478 479 # Always return an empty generator just in case no records were ever yielded 480 yield from [] 481 482 @staticmethod 483 def _extract_slice_fields( 484 stream_slice: Optional[Mapping[str, Any]], 485 ) -> tuple[Mapping[str, Any], Mapping[str, Any], Mapping[str, Any]]: 486 if not stream_slice: 487 return {}, {}, {} 488 489 if isinstance(stream_slice, StreamSlice): 490 partition = stream_slice.partition 491 cursor_slice = stream_slice.cursor_slice 492 remaining = {k: v for k, v in stream_slice.items()} 493 else: 494 # RFR streams that implement stream_slices() to generate stream slices in the legacy mapping format are converted into a 495 # structured stream slice mapping by the LegacyCursorBasedCheckpointReader. 
The structured mapping object has separate 496 # fields for the partition and cursor_slice value 497 partition = stream_slice.get("partition", {}) 498 cursor_slice = stream_slice.get("cursor_slice", {}) 499 remaining = { 500 key: val 501 for key, val in stream_slice.items() 502 if key != "partition" and key != "cursor_slice" 503 } 504 return partition, cursor_slice, remaining 505 506 def _fetch_next_page( 507 self, 508 stream_slice: Optional[Mapping[str, Any]] = None, 509 stream_state: Optional[Mapping[str, Any]] = None, 510 next_page_token: Optional[Mapping[str, Any]] = None, 511 ) -> Tuple[requests.PreparedRequest, requests.Response]: 512 request, response = self._http_client.send_request( 513 http_method=self.http_method, 514 url=self._join_url( 515 self.url_base, 516 self.path( 517 stream_state=stream_state, 518 stream_slice=stream_slice, 519 next_page_token=next_page_token, 520 ), 521 ), 522 request_kwargs=self.request_kwargs( 523 stream_state=stream_state, 524 stream_slice=stream_slice, 525 next_page_token=next_page_token, 526 ), 527 headers=self.request_headers( 528 stream_state=stream_state, 529 stream_slice=stream_slice, 530 next_page_token=next_page_token, 531 ), 532 params=self.request_params( 533 stream_state=stream_state, 534 stream_slice=stream_slice, 535 next_page_token=next_page_token, 536 ), 537 json=self.request_body_json( 538 stream_state=stream_state, 539 stream_slice=stream_slice, 540 next_page_token=next_page_token, 541 ), 542 data=self.request_body_data( 543 stream_state=stream_state, 544 stream_slice=stream_slice, 545 next_page_token=next_page_token, 546 ), 547 dedupe_query_params=True, 548 log_formatter=self.get_log_formatter(), 549 exit_on_rate_limit=self.exit_on_rate_limit, 550 ) 551 552 return request, response 553 554 def get_log_formatter(self) -> Optional[Callable[[requests.Response], Any]]: 555 """ 556 557 :return Optional[Callable[[requests.Response], Any]]: Function that will be used in logging inside HttpClient 558 """ 559 return None
Base abstract class for an Airbyte Stream using the HTTP protocol. Basic building block for users building an Airbyte source for an HTTP API.
82 @property 83 def exit_on_rate_limit(self) -> bool: 84 """ 85 :return: False if the stream will retry endlessly when rate limited 86 """ 87 return self._exit_on_rate_limit
Returns
False if the stream will retry endlessly when rate limited
93 @property 94 def cache_filename(self) -> str: 95 """ 96 Override if needed. Return the name of cache file 97 Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only. 98 """ 99 return f"{self.name}.sqlite"
Override if needed. Return the name of the cache file. Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only.
101 @property 102 def use_cache(self) -> bool: 103 """ 104 Override if needed. If True, all records will be cached. 105 Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only. 106 """ 107 return False
Override if needed. If True, all records will be cached. Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only.
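Opting into caching is just a property override, as sketched below (shown as it would appear inside an HttpStream subclass); whether the cache is persisted to disk depends on the REQUEST_CACHE_PATH environment variable.

@property
def use_cache(self) -> bool:
    return True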
109 @property 110 @abstractmethod 111 def url_base(self) -> str: 112 """ 113 :return: URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/" 114 """
Returns
URL base for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "https://myapi.com/v1/"
116 @property 117 def http_method(self) -> str: 118 """ 119 Override if needed. See get_request_data/get_request_json if using POST/PUT/PATCH. 120 """ 121 return "GET"
Override if needed. See get_request_data/get_request_json if using POST/PUT/PATCH.
123 @property 124 @deprecated( 125 "Deprecated as of CDK version 3.0.0. " 126 "You should set error_handler explicitly in HttpStream.get_error_handler() instead." 127 ) 128 def raise_on_http_errors(self) -> bool: 129 """ 130 Override if needed. If set to False, allows opting-out of raising HTTP code exception. 131 """ 132 return True
Override if needed. If set to False, allows opting out of raising exceptions on HTTP error status codes.
134 @property 135 @deprecated( 136 "Deprecated as of CDK version 3.0.0. " 137 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 138 ) 139 def max_retries(self) -> Union[int, None]: 140 """ 141 Override if needed. Specifies maximum amount of retries for backoff policy. Return None for no limit. 142 """ 143 return 5
Override if needed. Specifies the maximum number of retries for the backoff policy. Return None for no limit.
145 @property 146 @deprecated( 147 "Deprecated as of CDK version 3.0.0. " 148 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 149 ) 150 def max_time(self) -> Union[int, None]: 151 """ 152 Override if needed. Specifies maximum total waiting time (in seconds) for backoff policy. Return None for no limit. 153 """ 154 return 60 * 10
Override if needed. Specifies maximum total waiting time (in seconds) for backoff policy. Return None for no limit.
156 @property 157 @deprecated( 158 "Deprecated as of CDK version 3.0.0. " 159 "You should set backoff_strategies explicitly in HttpStream.get_backoff_strategy() instead." 160 ) 161 def retry_factor(self) -> float: 162 """ 163 Override if needed. Specifies factor for backoff policy. 164 """ 165 return 5
Override if needed. Specifies factor for backoff policy.
167 @abstractmethod 168 def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]: 169 """ 170 Override this method to define a pagination strategy. 171 172 The value returned from this method is passed to most other methods in this class. Use it to form a request e.g: set headers or query params. 173 174 :return: The token for the next page from the input response object. Returning None means there are no more pages to read in this response. 175 """
Override this method to define a pagination strategy.
The value returned from this method is passed to most other methods in this class. Use it to form a request e.g: set headers or query params.
Returns
The token for the next page from the input response object. Returning None means there are no more pages to read in this response.
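A hedged sketch of a cursor-based pagination strategy, assuming the API returns a JSON body with an optional next_cursor field (a made-up field name). The method would live on your HttpStream subclass.

from typing import Any, Mapping, Optional

import requests

def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
    next_cursor = response.json().get("next_cursor")
    # Returning None signals that there are no more pages to read.
    return {"cursor": next_cursor} if next_cursor else None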
177 @abstractmethod 178 def path( 179 self, 180 *, 181 stream_state: Optional[Mapping[str, Any]] = None, 182 stream_slice: Optional[Mapping[str, Any]] = None, 183 next_page_token: Optional[Mapping[str, Any]] = None, 184 ) -> str: 185 """ 186 Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity" 187 """
Returns the URL path for the API endpoint e.g: if you wanted to hit https://myapi.com/v1/some_entity then this should return "some_entity"
189 def request_params( 190 self, 191 stream_state: Optional[Mapping[str, Any]], 192 stream_slice: Optional[Mapping[str, Any]] = None, 193 next_page_token: Optional[Mapping[str, Any]] = None, 194 ) -> MutableMapping[str, Any]: 195 """ 196 Override this method to define the query parameters that should be set on an outgoing HTTP request given the inputs. 197 198 E.g: you might want to define query parameters for paging if next_page_token is not None. 199 """ 200 return {}
Override this method to define the query parameters that should be set on an outgoing HTTP request given the inputs.
E.g: you might want to define query parameters for paging if next_page_token is not None.
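For example, a sketch that feeds the pagination token back as query parameters; the limit parameter name is an assumption, and the method would live on your HttpStream subclass.

from typing import Any, Mapping, MutableMapping, Optional

def request_params(
    self,
    stream_state: Optional[Mapping[str, Any]],
    stream_slice: Optional[Mapping[str, Any]] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> MutableMapping[str, Any]:
    params: MutableMapping[str, Any] = {"limit": 100}
    if next_page_token:
        # Pass through whatever next_page_token() returned, e.g. {"cursor": "..."}.
        params.update(next_page_token)
    return params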
202 def request_headers( 203 self, 204 stream_state: Optional[Mapping[str, Any]], 205 stream_slice: Optional[Mapping[str, Any]] = None, 206 next_page_token: Optional[Mapping[str, Any]] = None, 207 ) -> Mapping[str, Any]: 208 """ 209 Override to return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method. 210 """ 211 return {}
Override to return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
213 def request_body_data( 214 self, 215 stream_state: Optional[Mapping[str, Any]], 216 stream_slice: Optional[Mapping[str, Any]] = None, 217 next_page_token: Optional[Mapping[str, Any]] = None, 218 ) -> Optional[Union[Mapping[str, Any], str]]: 219 """ 220 Override when creating POST/PUT/PATCH requests to populate the body of the request with a non-JSON payload. 221 222 If returns a ready text that it will be sent as is. 223 If returns a dict that it will be converted to a urlencoded form. 224 E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2" 225 226 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 227 """ 228 return None
Override when creating POST/PUT/PATCH requests to populate the body of the request with a non-JSON payload.
If a string is returned, it will be sent as-is. If a dict is returned, it will be converted to a urlencoded form. E.g. {"key1": "value1", "key2": "value2"} => "key1=value1&key2=value2"
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
230 def request_body_json( 231 self, 232 stream_state: Optional[Mapping[str, Any]], 233 stream_slice: Optional[Mapping[str, Any]] = None, 234 next_page_token: Optional[Mapping[str, Any]] = None, 235 ) -> Optional[Mapping[str, Any]]: 236 """ 237 Override when creating POST/PUT/PATCH requests to populate the body of the request with a JSON payload. 238 239 At the same time only one of the 'request_body_data' and 'request_body_json' functions can be overridden. 240 """ 241 return None
Override when creating POST/PUT/PATCH requests to populate the body of the request with a JSON payload.
Note that only one of the 'request_body_data' and 'request_body_json' functions can be overridden.
243 def request_kwargs( 244 self, 245 stream_state: Optional[Mapping[str, Any]], 246 stream_slice: Optional[Mapping[str, Any]] = None, 247 next_page_token: Optional[Mapping[str, Any]] = None, 248 ) -> Mapping[str, Any]: 249 """ 250 Override to return a mapping of keyword arguments to be used when creating the HTTP request. 251 Any option listed in https://docs.python-requests.org/en/latest/api/#requests.adapters.BaseAdapter.send for can be returned from 252 this method. Note that these options do not conflict with request-level options such as headers, request params, etc.. 253 """ 254 return {}
Override to return a mapping of keyword arguments to be used when creating the HTTP request. Any option listed in https://docs.python-requests.org/en/latest/api/#requests.adapters.BaseAdapter.send can be returned from this method. Note that these options do not conflict with request-level options such as headers, request params, etc.
256 @abstractmethod 257 def parse_response( 258 self, 259 response: requests.Response, 260 *, 261 stream_state: Mapping[str, Any], 262 stream_slice: Optional[Mapping[str, Any]] = None, 263 next_page_token: Optional[Mapping[str, Any]] = None, 264 ) -> Iterable[Mapping[str, Any]]: 265 """ 266 Parses the raw response object into a list of records. 267 By default, this returns an iterable containing the input. Override to parse differently. 268 :param response: 269 :param stream_state: 270 :param stream_slice: 271 :param next_page_token: 272 :return: An iterable containing the parsed response 273 """
Parses the raw response object into a list of records. By default, this returns an iterable containing the input. Override to parse differently.
Parameters
- response:
- stream_state:
- stream_slice:
- next_page_token:
Returns
An iterable containing the parsed response
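Putting the abstract members together, here is a minimal single-page stream sketch; the base URL, endpoint, and the "data" field are assumptions for illustration, not a real API.

from typing import Any, Iterable, Mapping, Optional

import requests

from airbyte_cdk.sources.streams.http import HttpStream

class Users(HttpStream):
    url_base = "https://api.example.com/v1/"
    primary_key = "id"

    def path(self, **kwargs: Any) -> str:
        return "users"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None  # single page: no pagination

    def parse_response(
        self, response: requests.Response, **kwargs: Any
    ) -> Iterable[Mapping[str, Any]]:
        # One record per element of the top-level "data" array.
        yield from response.json().get("data", [])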
275 def get_backoff_strategy(self) -> Optional[Union[BackoffStrategy, List[BackoffStrategy]]]: 276 """ 277 Used to initialize Adapter to avoid breaking changes. 278 If Stream has a `backoff_time` method implementation, we know this stream uses old (pre-HTTPClient) backoff handlers and thus an adapter is needed. 279 280 Override to provide custom BackoffStrategy 281 :return Optional[BackoffStrategy]: 282 """ 283 if hasattr(self, "backoff_time"): 284 return HttpStreamAdapterBackoffStrategy(self) 285 else: 286 return None
Used to initialize Adapter to avoid breaking changes.
If the Stream has a backoff_time method implementation, we know this stream uses old (pre-HTTPClient) backoff handlers and thus an adapter is needed.
Override to provide custom BackoffStrategy
Returns
288 def get_error_handler(self) -> Optional[ErrorHandler]: 289 """ 290 Used to initialize Adapter to avoid breaking changes. 291 If Stream has a `should_retry` method implementation, we know this stream uses old (pre-HTTPClient) error handlers and thus an adapter is needed. 292 293 Override to provide custom ErrorHandler 294 :return Optional[ErrorHandler]: 295 """ 296 if hasattr(self, "should_retry"): 297 error_handler = HttpStreamAdapterHttpStatusErrorHandler( 298 stream=self, 299 logger=logging.getLogger(), 300 max_retries=self.max_retries, 301 max_time=timedelta(seconds=self.max_time or 0), 302 ) 303 return error_handler 304 else: 305 return None
Used to initialize Adapter to avoid breaking changes.
If the Stream has a should_retry method implementation, we know this stream uses old (pre-HTTPClient) error handlers and thus an adapter is needed.
Override to provide custom ErrorHandler
Returns
311 @classmethod 312 def parse_response_error_message(cls, response: requests.Response) -> Optional[str]: 313 """ 314 Parses the raw response object from a failed request into a user-friendly error message. 315 By default, this method tries to grab the error message from JSON responses by following common API patterns. Override to parse differently. 316 317 :param response: 318 :return: A user-friendly message that indicates the cause of the error 319 """ 320 321 # default logic to grab error from common fields 322 def _try_get_error(value: Optional[JsonType]) -> Optional[str]: 323 if isinstance(value, str): 324 return value 325 elif isinstance(value, list): 326 errors_in_value = [_try_get_error(v) for v in value] 327 return ", ".join(v for v in errors_in_value if v is not None) 328 elif isinstance(value, dict): 329 new_value = ( 330 value.get("message") 331 or value.get("messages") 332 or value.get("error") 333 or value.get("errors") 334 or value.get("failures") 335 or value.get("failure") 336 or value.get("detail") 337 ) 338 return _try_get_error(new_value) 339 return None 340 341 try: 342 body = response.json() 343 return _try_get_error(body) 344 except requests.exceptions.JSONDecodeError: 345 return None
Parses the raw response object from a failed request into a user-friendly error message. By default, this method tries to grab the error message from JSON responses by following common API patterns. Override to parse differently.
Parameters
- response:
Returns
A user-friendly message that indicates the cause of the error
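A sketch of an override for an API that nests its error text under an assumed error.description field, shown as it would appear inside an HttpStream subclass.

from typing import Optional

import requests

@classmethod
def parse_response_error_message(cls, response: requests.Response) -> Optional[str]:
    try:
        error = response.json().get("error")
        return error.get("description") if isinstance(error, dict) else None
    except requests.exceptions.JSONDecodeError:
        return None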
347 def get_error_display_message(self, exception: BaseException) -> Optional[str]: 348 """ 349 Retrieves the user-friendly display message that corresponds to an exception. 350 This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage. 351 352 The default implementation of this method only handles HTTPErrors by passing the response to self.parse_response_error_message(). 353 The method should be overriden as needed to handle any additional exception types. 354 355 :param exception: The exception that was raised 356 :return: A user-friendly message that indicates the cause of the error 357 """ 358 if isinstance(exception, requests.HTTPError) and exception.response is not None: 359 return self.parse_response_error_message(exception.response) 360 return None
Retrieves the user-friendly display message that corresponds to an exception. This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage.
The default implementation of this method only handles HTTPErrors by passing the response to self.parse_response_error_message(). The method should be overridden as needed to handle any additional exception types.
Parameters
- exception: The exception that was raised
Returns
A user-friendly message that indicates the cause of the error
362 def read_records( 363 self, 364 sync_mode: SyncMode, 365 cursor_field: Optional[List[str]] = None, 366 stream_slice: Optional[Mapping[str, Any]] = None, 367 stream_state: Optional[Mapping[str, Any]] = None, 368 ) -> Iterable[StreamData]: 369 # A cursor_field indicates this is an incremental stream which offers better checkpointing than RFR enabled via the cursor 370 if self.cursor_field or not isinstance(self.get_cursor(), ResumableFullRefreshCursor): 371 yield from self._read_pages( 372 lambda req, res, state, _slice: self.parse_response( 373 res, stream_slice=_slice, stream_state=state 374 ), 375 stream_slice, 376 stream_state, 377 ) 378 else: 379 yield from self._read_single_page( 380 lambda req, res, state, _slice: self.parse_response( 381 res, stream_slice=_slice, stream_state=state 382 ), 383 stream_slice, 384 stream_state, 385 )
This method should be overridden by subclasses to read records based on the inputs.
387 @property 388 def state(self) -> MutableMapping[str, Any]: 389 cursor = self.get_cursor() 390 if cursor: 391 return cursor.get_stream_state() # type: ignore 392 return self._state
State getter; should return the state in a form that can be serialized to a string and sent to the output as a STATE AirbyteMessage.
A good example of a state is a cursor_value: { self.cursor_field: "cursor_value" }
State should be as small as possible but at the same time descriptive enough to restore the syncing process from the point where it stopped.
401 def get_cursor(self) -> Optional[Cursor]: 402 # I don't love that this is semi-stateful but not sure what else to do. We don't know exactly what type of cursor to 403 # instantiate when creating the class. We can make a few assumptions like if there is a cursor_field which implies 404 # incremental, but we don't know until runtime if this is a substream. Ideally, a stream should explicitly define 405 # its cursor, but because we're trying to automatically apply RFR we're stuck with this logic where we replace the 406 # cursor at runtime once we detect this is a substream based on self.has_multiple_slices being reassigned 407 if self.has_multiple_slices and isinstance(self.cursor, ResumableFullRefreshCursor): 408 self.cursor = SubstreamResumableFullRefreshCursor() 409 return self.cursor 410 else: 411 return self.cursor
A Cursor is an interface that a stream can implement to manage how its internal state is read and updated while reading records. Historically, Python connectors had no concept of a cursor to manage state. Python streams need to define a cursor implementation and override this method to manage state through a Cursor.
554 def get_log_formatter(self) -> Optional[Callable[[requests.Response], Any]]: 555 """ 556 557 :return Optional[Callable[[requests.Response], Any]]: Function that will be used in logging inside HttpClient 558 """ 559 return None
Returns
Function that will be used in logging inside HttpClient
Inherited Members
562class HttpSubStream(HttpStream, ABC): 563 def __init__(self, parent: HttpStream, **kwargs: Any): 564 """ 565 :param parent: should be the instance of HttpStream class 566 """ 567 super().__init__(**kwargs) 568 self.parent = parent 569 self.has_multiple_slices = ( 570 True # Substreams are based on parent records which implies there are multiple slices 571 ) 572 573 # There are three conditions that dictate if RFR should automatically be applied to a stream 574 # 1. Streams that explicitly initialize their own cursor should defer to it and not automatically apply RFR 575 # 2. Streams with at least one cursor_field are incremental and thus a superior sync to RFR. 576 # 3. Streams overriding read_records() do not guarantee that they will call the parent implementation which can perform 577 # per-page checkpointing so RFR is only supported if a stream use the default `HttpStream.read_records()` method 578 if ( 579 not self.cursor 580 and len(self.cursor_field) == 0 581 and type(self).read_records is HttpStream.read_records 582 ): 583 self.cursor = SubstreamResumableFullRefreshCursor() 584 585 def stream_slices( 586 self, 587 sync_mode: SyncMode, 588 cursor_field: Optional[List[str]] = None, 589 stream_state: Optional[Mapping[str, Any]] = None, 590 ) -> Iterable[Optional[Mapping[str, Any]]]: 591 # read_stateless() assumes the parent is not concurrent. This is currently okay since the concurrent CDK does 592 # not support either substreams or RFR, but something that needs to be considered once we do 593 for parent_record in self.parent.read_only_records(stream_state): 594 # Skip non-records (eg AirbyteLogMessage) 595 if isinstance(parent_record, AirbyteMessage): 596 if parent_record.type == MessageType.RECORD: 597 parent_record = parent_record.record.data # type: ignore [assignment, union-attr] # Incorrect type for assignment 598 else: 599 continue 600 elif isinstance(parent_record, Record): 601 parent_record = parent_record.data 602 yield {"parent": parent_record}
Base abstract class for an Airbyte Stream using the HTTP protocol. Basic building block for users building an Airbyte source for an HTTP API.
563 def __init__(self, parent: HttpStream, **kwargs: Any): 564 """ 565 :param parent: should be the instance of HttpStream class 566 """ 567 super().__init__(**kwargs) 568 self.parent = parent 569 self.has_multiple_slices = ( 570 True # Substreams are based on parent records which implies there are multiple slices 571 ) 572 573 # There are three conditions that dictate if RFR should automatically be applied to a stream 574 # 1. Streams that explicitly initialize their own cursor should defer to it and not automatically apply RFR 575 # 2. Streams with at least one cursor_field are incremental and thus a superior sync to RFR. 576 # 3. Streams overriding read_records() do not guarantee that they will call the parent implementation which can perform 577 # per-page checkpointing so RFR is only supported if a stream use the default `HttpStream.read_records()` method 578 if ( 579 not self.cursor 580 and len(self.cursor_field) == 0 581 and type(self).read_records is HttpStream.read_records 582 ): 583 self.cursor = SubstreamResumableFullRefreshCursor()
Parameters
- parent: should be an instance of the HttpStream class
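A hedged sketch of a substream whose path depends on the parent record; the users/{id}/posts endpoint and field names are assumptions, and the parent is expected to be an already-constructed HttpStream instance (e.g. UserPosts(parent=Users())).

from typing import Any, Iterable, Mapping, Optional

import requests

from airbyte_cdk.sources.streams.http import HttpSubStream

class UserPosts(HttpSubStream):
    url_base = "https://api.example.com/v1/"
    primary_key = "id"

    def path(self, stream_slice: Optional[Mapping[str, Any]] = None, **kwargs: Any) -> str:
        # stream_slices() yields {"parent": <parent record>} for each parent record.
        return f"users/{stream_slice['parent']['id']}/posts"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None

    def parse_response(
        self, response: requests.Response, **kwargs: Any
    ) -> Iterable[Mapping[str, Any]]:
        yield from response.json().get("data", [])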
585 def stream_slices( 586 self, 587 sync_mode: SyncMode, 588 cursor_field: Optional[List[str]] = None, 589 stream_state: Optional[Mapping[str, Any]] = None, 590 ) -> Iterable[Optional[Mapping[str, Any]]]: 591 # read_stateless() assumes the parent is not concurrent. This is currently okay since the concurrent CDK does 592 # not support either substreams or RFR, but something that needs to be considered once we do 593 for parent_record in self.parent.read_only_records(stream_state): 594 # Skip non-records (eg AirbyteLogMessage) 595 if isinstance(parent_record, AirbyteMessage): 596 if parent_record.type == MessageType.RECORD: 597 parent_record = parent_record.record.data # type: ignore [assignment, union-attr] # Incorrect type for assignment 598 else: 599 continue 600 elif isinstance(parent_record, Record): 601 parent_record = parent_record.data 602 yield {"parent": parent_record}
Override to define the slices for this stream. See the stream slicing section of the docs for more information.
Parameters
- sync_mode:
- cursor_field:
- stream_state:
Returns
Inherited Members
- HttpStream
- source_defined_cursor
- page_size
- exit_on_rate_limit
- cache_filename
- use_cache
- url_base
- http_method
- raise_on_http_errors
- max_retries
- max_time
- retry_factor
- next_page_token
- path
- request_params
- request_headers
- request_body_data
- request_body_json
- request_kwargs
- parse_response
- get_backoff_strategy
- get_error_handler
- parse_response_error_message
- get_error_display_message
- read_records
- state
- get_cursor
- get_log_formatter
700class LimiterSession(LimiterMixin, requests.Session): 701 """Session that adds rate-limiting behavior to requests."""
Session that adds rate-limiting behavior to requests.
Inherited Members
396class MovingWindowCallRatePolicy(BaseCallRatePolicy): 397 """ 398 Policy to control requests rate implemented on top of PyRateLimiter lib. 399 The main difference between this policy and FixedWindowCallRatePolicy is that the rate-limiting window 400 is moving along requests that we made, and there is no moment when we reset an available number of calls. 401 This strategy requires saving of timestamps of all requests within a window. 402 """ 403 404 def __init__(self, rates: list[Rate], matchers: list[RequestMatcher]): 405 """Constructor 406 407 :param rates: list of rates, the order is important and must be ascending 408 :param matchers: 409 """ 410 if not rates: 411 raise ValueError("The list of rates can not be empty") 412 pyrate_rates = [ 413 PyRateRate(limit=rate.limit, interval=int(rate.interval.total_seconds() * 1000)) 414 for rate in rates 415 ] 416 self._bucket = InMemoryBucket(pyrate_rates) 417 # Limiter will create the background task that clears old requests in the bucket 418 self._limiter = Limiter(self._bucket) 419 super().__init__(matchers=matchers) 420 421 def try_acquire(self, request: Any, weight: int) -> None: 422 if not self.matches(request): 423 raise ValueError("Request does not match the policy") 424 425 try: 426 self._limiter.try_acquire(request, weight=weight) 427 except BucketFullException as exc: 428 item = self._limiter.bucket_factory.wrap_item(request, weight) 429 assert isinstance(item, RateItem) 430 431 with self._limiter.lock: 432 time_to_wait = self._bucket.waiting(item) 433 assert isinstance(time_to_wait, int) 434 435 raise CallRateLimitHit( 436 error=str(exc.meta_info["error"]), 437 item=request, 438 weight=int(exc.meta_info["weight"]), 439 rate=str(exc.meta_info["rate"]), 440 time_to_wait=timedelta(milliseconds=time_to_wait), 441 ) 442 443 def update( 444 self, available_calls: Optional[int], call_reset_ts: Optional[datetime.datetime] 445 ) -> None: 446 """Adjust call bucket to reflect the state of the API server 447 448 :param available_calls: 449 :param call_reset_ts: 450 :return: 451 """ 452 if ( 453 available_calls is not None and call_reset_ts is None 454 ): # we do our best to sync buckets with API 455 if available_calls == 0: 456 with self._limiter.lock: 457 items_to_add = self._bucket.count() < self._bucket.rates[0].limit 458 if items_to_add > 0: 459 now: int = TimeClock().now() # type: ignore[no-untyped-call] 460 self._bucket.put(RateItem(name="dummy", timestamp=now, weight=items_to_add)) 461 # TODO: add support if needed, it might be that it is not possible to make a good solution for this case 462 # if available_calls is not None and call_reset_ts is not None: 463 # ts = call_reset_ts.timestamp() 464 465 def __str__(self) -> str: 466 """Return a human-friendly description of the moving window rate policy for logging purposes.""" 467 rates_info = ", ".join( 468 f"{rate.limit} per {timedelta(milliseconds=rate.interval)}" 469 for rate in self._bucket.rates 470 ) 471 current_bucket_count = self._bucket.count() 472 matcher_str = ", ".join(f"{matcher}" for matcher in self._matchers) 473 return ( 474 f"MovingWindowCallRatePolicy(rates=[{rates_info}], current_bucket_count={current_bucket_count}, " 475 f"matchers=[{matcher_str}])" 476 )
Policy to control the request rate, implemented on top of the PyRateLimiter library. The main difference between this policy and FixedWindowCallRatePolicy is that the rate-limiting window moves along with the requests that were made, and there is no moment at which the available number of calls is reset. This strategy requires saving the timestamps of all requests within a window.
404 def __init__(self, rates: list[Rate], matchers: list[RequestMatcher]): 405 """Constructor 406 407 :param rates: list of rates, the order is important and must be ascending 408 :param matchers: 409 """ 410 if not rates: 411 raise ValueError("The list of rates can not be empty") 412 pyrate_rates = [ 413 PyRateRate(limit=rate.limit, interval=int(rate.interval.total_seconds() * 1000)) 414 for rate in rates 415 ] 416 self._bucket = InMemoryBucket(pyrate_rates) 417 # Limiter will create the background task that clears old requests in the bucket 418 self._limiter = Limiter(self._bucket) 419 super().__init__(matchers=matchers)
Constructor
Parameters
- rates: list of rates, the order is important and must be ascending
- matchers: list of request matchers that determine which requests this policy applies to
421 def try_acquire(self, request: Any, weight: int) -> None: 422 if not self.matches(request): 423 raise ValueError("Request does not match the policy") 424 425 try: 426 self._limiter.try_acquire(request, weight=weight) 427 except BucketFullException as exc: 428 item = self._limiter.bucket_factory.wrap_item(request, weight) 429 assert isinstance(item, RateItem) 430 431 with self._limiter.lock: 432 time_to_wait = self._bucket.waiting(item) 433 assert isinstance(time_to_wait, int) 434 435 raise CallRateLimitHit( 436 error=str(exc.meta_info["error"]), 437 item=request, 438 weight=int(exc.meta_info["weight"]), 439 rate=str(exc.meta_info["rate"]), 440 time_to_wait=timedelta(milliseconds=time_to_wait), 441 )
Try to acquire capacity for a request; raises CallRateLimitHit if the call rate limit has been reached.
Parameters
- request: a request object representing a single call to API
- weight: number of requests to deduct from credit
Returns
443 def update( 444 self, available_calls: Optional[int], call_reset_ts: Optional[datetime.datetime] 445 ) -> None: 446 """Adjust call bucket to reflect the state of the API server 447 448 :param available_calls: 449 :param call_reset_ts: 450 :return: 451 """ 452 if ( 453 available_calls is not None and call_reset_ts is None 454 ): # we do our best to sync buckets with API 455 if available_calls == 0: 456 with self._limiter.lock: 457 items_to_add = self._bucket.count() < self._bucket.rates[0].limit 458 if items_to_add > 0: 459 now: int = TimeClock().now() # type: ignore[no-untyped-call] 460 self._bucket.put(RateItem(name="dummy", timestamp=now, weight=items_to_add)) 461 # TODO: add support if needed, it might be that it is not possible to make a good solution for this case 462 # if available_calls is not None and call_reset_ts is not None: 463 # ts = call_reset_ts.timestamp()
Adjust call bucket to reflect the state of the API server
Parameters
- available_calls: number of calls still allowed by the API server, if reported
- call_reset_ts: timestamp at which the API server resets its call allowance, if reported
Returns
26class Oauth2Authenticator(AbstractOauth2Authenticator): 27 """ 28 Generates OAuth2.0 access tokens from an OAuth2.0 refresh token and client credentials. 29 The generated access token is attached to each request via the Authorization header. 30 If a connector_config is provided any mutation of it's value in the scope of this class will emit AirbyteControlConnectorConfigMessage. 31 """ 32 33 def __init__( 34 self, 35 token_refresh_endpoint: str, 36 client_id: str, 37 client_secret: str, 38 refresh_token: str, 39 client_id_name: str = "client_id", 40 client_secret_name: str = "client_secret", 41 refresh_token_name: str = "refresh_token", 42 scopes: List[str] | None = None, 43 token_expiry_date: AirbyteDateTime | None = None, 44 token_expiry_date_format: str | None = None, 45 access_token_name: str = "access_token", 46 expires_in_name: str = "expires_in", 47 refresh_request_body: Mapping[str, Any] | None = None, 48 refresh_request_headers: Mapping[str, Any] | None = None, 49 grant_type_name: str = "grant_type", 50 grant_type: str = "refresh_token", 51 token_expiry_is_time_of_expiration: bool = False, 52 refresh_token_error_status_codes: Tuple[int, ...] = (), 53 refresh_token_error_key: str = "", 54 refresh_token_error_values: Tuple[str, ...] = (), 55 ) -> None: 56 self._token_refresh_endpoint = token_refresh_endpoint 57 self._client_secret_name = client_secret_name 58 self._client_secret = client_secret 59 self._client_id_name = client_id_name 60 self._client_id = client_id 61 self._refresh_token_name = refresh_token_name 62 self._refresh_token = refresh_token 63 self._scopes = scopes 64 self._access_token_name = access_token_name 65 self._expires_in_name = expires_in_name 66 self._refresh_request_body = refresh_request_body 67 self._refresh_request_headers = refresh_request_headers 68 self._grant_type_name = grant_type_name 69 self._grant_type = grant_type 70 71 self._token_expiry_date = token_expiry_date or (ab_datetime_now() - timedelta(days=1)) 72 self._token_expiry_date_format = token_expiry_date_format 73 self._token_expiry_is_time_of_expiration = token_expiry_is_time_of_expiration 74 self._access_token = None 75 super().__init__( 76 refresh_token_error_status_codes, refresh_token_error_key, refresh_token_error_values 77 ) 78 79 def get_token_refresh_endpoint(self) -> str: 80 return self._token_refresh_endpoint 81 82 def get_client_id_name(self) -> str: 83 return self._client_id_name 84 85 def get_client_id(self) -> str: 86 return self._client_id 87 88 def get_client_secret_name(self) -> str: 89 return self._client_secret_name 90 91 def get_client_secret(self) -> str: 92 return self._client_secret 93 94 def get_refresh_token_name(self) -> str: 95 return self._refresh_token_name 96 97 def get_refresh_token(self) -> str: 98 return self._refresh_token 99 100 def get_access_token_name(self) -> str: 101 return self._access_token_name 102 103 def get_scopes(self) -> list[str]: 104 return self._scopes # type: ignore[return-value] 105 106 def get_expires_in_name(self) -> str: 107 return self._expires_in_name 108 109 def get_refresh_request_body(self) -> Mapping[str, Any]: 110 return self._refresh_request_body # type: ignore[return-value] 111 112 def get_refresh_request_headers(self) -> Mapping[str, Any]: 113 return self._refresh_request_headers # type: ignore[return-value] 114 115 def get_grant_type_name(self) -> str: 116 return self._grant_type_name 117 118 def get_grant_type(self) -> str: 119 return self._grant_type 120 121 def get_token_expiry_date(self) -> AirbyteDateTime: 122 return 
self._token_expiry_date 123 124 def set_token_expiry_date(self, value: Union[str, int]) -> None: 125 self._token_expiry_date = self._parse_token_expiration_date(value) 126 127 @property 128 def token_expiry_is_time_of_expiration(self) -> bool: 129 return self._token_expiry_is_time_of_expiration 130 131 @property 132 def token_expiry_date_format(self) -> Optional[str]: 133 return self._token_expiry_date_format 134 135 @property 136 def access_token(self) -> str: 137 return self._access_token # type: ignore[return-value] 138 139 @access_token.setter 140 def access_token(self, value: str) -> None: 141 self._access_token = value # type: ignore[assignment] # Incorrect type for assignment
Generates OAuth2.0 access tokens from an OAuth2.0 refresh token and client credentials. The generated access token is attached to each request via the Authorization header. If a connector_config is provided, any mutation of its value within the scope of this class will emit an AirbyteControlConnectorConfigMessage.
33 def __init__( 34 self, 35 token_refresh_endpoint: str, 36 client_id: str, 37 client_secret: str, 38 refresh_token: str, 39 client_id_name: str = "client_id", 40 client_secret_name: str = "client_secret", 41 refresh_token_name: str = "refresh_token", 42 scopes: List[str] | None = None, 43 token_expiry_date: AirbyteDateTime | None = None, 44 token_expiry_date_format: str | None = None, 45 access_token_name: str = "access_token", 46 expires_in_name: str = "expires_in", 47 refresh_request_body: Mapping[str, Any] | None = None, 48 refresh_request_headers: Mapping[str, Any] | None = None, 49 grant_type_name: str = "grant_type", 50 grant_type: str = "refresh_token", 51 token_expiry_is_time_of_expiration: bool = False, 52 refresh_token_error_status_codes: Tuple[int, ...] = (), 53 refresh_token_error_key: str = "", 54 refresh_token_error_values: Tuple[str, ...] = (), 55 ) -> None: 56 self._token_refresh_endpoint = token_refresh_endpoint 57 self._client_secret_name = client_secret_name 58 self._client_secret = client_secret 59 self._client_id_name = client_id_name 60 self._client_id = client_id 61 self._refresh_token_name = refresh_token_name 62 self._refresh_token = refresh_token 63 self._scopes = scopes 64 self._access_token_name = access_token_name 65 self._expires_in_name = expires_in_name 66 self._refresh_request_body = refresh_request_body 67 self._refresh_request_headers = refresh_request_headers 68 self._grant_type_name = grant_type_name 69 self._grant_type = grant_type 70 71 self._token_expiry_date = token_expiry_date or (ab_datetime_now() - timedelta(days=1)) 72 self._token_expiry_date_format = token_expiry_date_format 73 self._token_expiry_is_time_of_expiration = token_expiry_is_time_of_expiration 74 self._access_token = None 75 super().__init__( 76 refresh_token_error_status_codes, refresh_token_error_key, refresh_token_error_values 77 )
If refresh_token_error_status_codes, refresh_token_error_key, and refresh_token_error_values are all set, then HTTP errors matching those parameters will be wrapped in an AirbyteTracedException.
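As a rough illustration, the sketch below builds an Oauth2Authenticator from the constructor shown above and attaches it to a plain requests session. The import path, the placeholder endpoint and credentials, and the use of the authenticator as a requests auth object are assumptions.

```python
import requests

# Import path assumed; Oauth2Authenticator is defined as shown above.
from airbyte_cdk.sources.streams.http.requests_native_auth import Oauth2Authenticator

authenticator = Oauth2Authenticator(
    token_refresh_endpoint="https://example.com/oauth/token",  # hypothetical endpoint
    client_id="<client_id>",
    client_secret="<client_secret>",
    refresh_token="<refresh_token>",
    scopes=["read"],
)

# Per the docstring above, the access token is attached to each request via the
# Authorization header; here the authenticator is passed as a requests auth object.
session = requests.Session()
response = session.get("https://example.com/api/items", auth=authenticator)
```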
109 def get_refresh_request_body(self) -> Mapping[str, Any]: 110 return self._refresh_request_body # type: ignore[return-value]
Returns the request body to set on the refresh request
112 def get_refresh_request_headers(self) -> Mapping[str, Any]: 113 return self._refresh_request_headers # type: ignore[return-value]
Returns the request headers to set on the refresh request
124 def set_token_expiry_date(self, value: Union[str, int]) -> None: 125 self._token_expiry_date = self._parse_token_expiration_date(value)
Setter for access token expiration date
127 @property 128 def token_expiry_is_time_of_expiration(self) -> bool: 129 return self._token_expiry_is_time_of_expiration
Indicates that the token expiry value is the date until which the token will be valid, rather than the amount of time the token will be valid for.
131 @property 132 def token_expiry_date_format(self) -> Optional[str]: 133 return self._token_expiry_date_format
Format of the expiry datetime; used if expires_in is returned as the expiration datetime instead of the number of seconds until expiration.
33@dataclasses.dataclass 34class Rate: 35 """Call rate limit""" 36 37 limit: int 38 interval: timedelta
Call rate limit
144class SingleUseRefreshTokenOauth2Authenticator(Oauth2Authenticator): 145 """ 146 Authenticator that should be used for API implementing single use refresh tokens: 147 when refreshing access token some API returns a new refresh token that needs to used in the next refresh flow. 148 This authenticator updates the configuration with new refresh token by emitting Airbyte control message from an observed mutation. 149 By default, this authenticator expects a connector config with a "credentials" field with the following nested fields: client_id, 150 client_secret, refresh_token. This behavior can be changed by defining custom config path (using dpath paths) in client_id_config_path, 151 client_secret_config_path, refresh_token_config_path constructor arguments. 152 """ 153 154 def __init__( 155 self, 156 connector_config: Mapping[str, Any], 157 token_refresh_endpoint: str, 158 scopes: List[str] | None = None, 159 access_token_name: str = "access_token", 160 expires_in_name: str = "expires_in", 161 refresh_token_name: str = "refresh_token", 162 refresh_request_body: Mapping[str, Any] | None = None, 163 refresh_request_headers: Mapping[str, Any] | None = None, 164 grant_type_name: str = "grant_type", 165 grant_type: str = "refresh_token", 166 client_id_name: str = "client_id", 167 client_id: Optional[str] = None, 168 client_secret_name: str = "client_secret", 169 client_secret: Optional[str] = None, 170 access_token_config_path: Sequence[str] = ("credentials", "access_token"), 171 refresh_token_config_path: Sequence[str] = ("credentials", "refresh_token"), 172 token_expiry_date_config_path: Sequence[str] = ("credentials", "token_expiry_date"), 173 token_expiry_date_format: Optional[str] = None, 174 message_repository: MessageRepository = NoopMessageRepository(), 175 token_expiry_is_time_of_expiration: bool = False, 176 refresh_token_error_status_codes: Tuple[int, ...] = (), 177 refresh_token_error_key: str = "", 178 refresh_token_error_values: Tuple[str, ...] = (), 179 ) -> None: 180 """ 181 Args: 182 connector_config (Mapping[str, Any]): The full connector configuration 183 token_refresh_endpoint (str): Full URL to the token refresh endpoint 184 scopes (List[str], optional): List of OAuth scopes to pass in the refresh token request body. Defaults to None. 185 access_token_name (str, optional): Name of the access token field, used to parse the refresh token response. Defaults to "access_token". 186 expires_in_name (str, optional): Name of the name of the field that characterizes when the current access token will expire, used to parse the refresh token response. Defaults to "expires_in". 187 refresh_token_name (str, optional): Name of the name of the refresh token field, used to parse the refresh token response. Defaults to "refresh_token". 188 refresh_request_body (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request body. Defaults to None. 189 refresh_request_headers (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request headers. Defaults to None. 190 grant_type (str, optional): OAuth grant type. Defaults to "refresh_token". 191 client_id (Optional[str]): The client id to authenticate. If not specified, defaults to credentials.client_id in the config object. 192 client_secret (Optional[str]): The client secret to authenticate. If not specified, defaults to credentials.client_secret in the config object. 
193 access_token_config_path (Sequence[str]): Dpath to the access_token field in the connector configuration. Defaults to ("credentials", "access_token"). 194 refresh_token_config_path (Sequence[str]): Dpath to the refresh_token field in the connector configuration. Defaults to ("credentials", "refresh_token"). 195 token_expiry_date_config_path (Sequence[str]): Dpath to the token_expiry_date field in the connector configuration. Defaults to ("credentials", "token_expiry_date"). 196 token_expiry_date_format (Optional[str]): Date format of the token expiry date field (set by expires_in_name). If not specified the token expiry date is interpreted as number of seconds until expiration. 197 token_expiry_is_time_of_expiration bool: set True it if expires_in is returned as time of expiration instead of the number seconds until expiration 198 message_repository (MessageRepository): the message repository used to emit logs on HTTP requests and control message on config update 199 """ 200 self._connector_config = connector_config 201 self._client_id: str = self._get_config_value_by_path( 202 ("credentials", "client_id"), client_id 203 ) 204 self._client_secret: str = self._get_config_value_by_path( 205 ("credentials", "client_secret"), client_secret 206 ) 207 self._client_id_name = client_id_name 208 self._client_secret_name = client_secret_name 209 self._access_token_config_path = access_token_config_path 210 self._refresh_token_config_path = refresh_token_config_path 211 self._token_expiry_date_config_path = token_expiry_date_config_path 212 self._token_expiry_date_format = token_expiry_date_format 213 self._refresh_token_name = refresh_token_name 214 self._grant_type_name = grant_type_name 215 self._connector_config = connector_config 216 self.__message_repository = message_repository 217 super().__init__( 218 token_refresh_endpoint=token_refresh_endpoint, 219 client_id_name=self._client_id_name, 220 client_id=self._client_id, 221 client_secret_name=self._client_secret_name, 222 client_secret=self._client_secret, 223 refresh_token=self.get_refresh_token(), 224 refresh_token_name=self._refresh_token_name, 225 scopes=scopes, 226 token_expiry_date=self.get_token_expiry_date(), 227 access_token_name=access_token_name, 228 expires_in_name=expires_in_name, 229 refresh_request_body=refresh_request_body, 230 refresh_request_headers=refresh_request_headers, 231 grant_type_name=self._grant_type_name, 232 grant_type=grant_type, 233 token_expiry_date_format=token_expiry_date_format, 234 token_expiry_is_time_of_expiration=token_expiry_is_time_of_expiration, 235 refresh_token_error_status_codes=refresh_token_error_status_codes, 236 refresh_token_error_key=refresh_token_error_key, 237 refresh_token_error_values=refresh_token_error_values, 238 ) 239 240 @property 241 def access_token(self) -> str: 242 """ 243 Retrieve the access token from the configuration. 244 245 Returns: 246 str: The access token. 247 """ 248 return self._get_config_value_by_path(self._access_token_config_path) # type: ignore[return-value] 249 250 @access_token.setter 251 def access_token(self, new_access_token: str) -> None: 252 """ 253 Sets a new access token. 254 255 Args: 256 new_access_token (str): The new access token to be set. 257 """ 258 self._set_config_value_by_path(self._access_token_config_path, new_access_token) 259 260 def get_refresh_token(self) -> str: 261 """ 262 Retrieve the refresh token from the configuration. 
263 264 This method fetches the refresh token using the configuration path specified 265 by `_refresh_token_config_path`. 266 267 Returns: 268 str: The refresh token as a string. 269 """ 270 return self._get_config_value_by_path(self._refresh_token_config_path) # type: ignore[return-value] 271 272 def set_refresh_token(self, new_refresh_token: str) -> None: 273 """ 274 Updates the refresh token in the configuration. 275 276 Args: 277 new_refresh_token (str): The new refresh token to be set. 278 """ 279 self._set_config_value_by_path(self._refresh_token_config_path, new_refresh_token) 280 281 def get_token_expiry_date(self) -> AirbyteDateTime: 282 """ 283 Retrieves the token expiry date from the configuration. 284 285 This method fetches the token expiry date from the configuration using the specified path. 286 If the expiry date is an empty string, it returns the current date and time minus one day. 287 Otherwise, it parses the expiry date string into an AirbyteDateTime object. 288 289 Returns: 290 AirbyteDateTime: The parsed or calculated token expiry date. 291 292 Raises: 293 TypeError: If the result is not an instance of AirbyteDateTime. 294 """ 295 expiry_date = self._get_config_value_by_path(self._token_expiry_date_config_path) 296 result = ( 297 ab_datetime_now() - timedelta(days=1) 298 if expiry_date == "" 299 else ab_datetime_parse(str(expiry_date)) 300 ) 301 if isinstance(result, AirbyteDateTime): 302 return result 303 raise TypeError("Invalid datetime conversion") 304 305 def set_token_expiry_date(self, new_token_expiry_date: AirbyteDateTime) -> None: # type: ignore[override] 306 """ 307 Sets the token expiry date in the configuration. 308 309 Args: 310 new_token_expiry_date (AirbyteDateTime): The new expiry date for the token. 311 """ 312 self._set_config_value_by_path( 313 self._token_expiry_date_config_path, str(new_token_expiry_date) 314 ) 315 316 def token_has_expired(self) -> bool: 317 """Returns True if the token is expired""" 318 return ab_datetime_now() > self.get_token_expiry_date() 319 320 @staticmethod 321 def get_new_token_expiry_date( 322 access_token_expires_in: str, 323 token_expiry_date_format: str | None = None, 324 ) -> AirbyteDateTime: 325 """ 326 Calculate the new token expiry date based on the provided expiration duration or format. 327 328 Args: 329 access_token_expires_in (str): The duration (in seconds) until the access token expires, or the expiry date in a specific format. 330 token_expiry_date_format (str | None, optional): The format of the expiry date if provided. Defaults to None. 331 332 Returns: 333 AirbyteDateTime: The calculated expiry date of the access token. 334 """ 335 if token_expiry_date_format: 336 return ab_datetime_parse(access_token_expires_in) 337 else: 338 return ab_datetime_now() + timedelta(seconds=int(access_token_expires_in)) 339 340 def get_access_token(self) -> str: 341 """Retrieve new access and refresh token if the access token has expired. 342 The new refresh token is persisted with the set_refresh_token function 343 Returns: 344 str: The current access_token, updated if it was previously expired. 
345 """ 346 if self.token_has_expired(): 347 new_access_token, access_token_expires_in, new_refresh_token = ( 348 self.refresh_access_token() 349 ) 350 new_token_expiry_date: AirbyteDateTime = self.get_new_token_expiry_date( 351 access_token_expires_in, self._token_expiry_date_format 352 ) 353 self.access_token = new_access_token 354 self.set_refresh_token(new_refresh_token) 355 self.set_token_expiry_date(new_token_expiry_date) 356 self._emit_control_message() 357 return self.access_token 358 359 def refresh_access_token(self) -> Tuple[str, str, str]: # type: ignore[override] 360 """ 361 Refreshes the access token by making a handled request and extracting the necessary token information. 362 363 Returns: 364 Tuple[str, str, str]: A tuple containing the new access token, token expiry date, and refresh token. 365 """ 366 response_json = self._make_handled_request() 367 return ( 368 self._extract_access_token(response_json), 369 self._extract_token_expiry_date(response_json), 370 self._extract_refresh_token(response_json), 371 ) 372 373 def _set_config_value_by_path(self, config_path: Union[str, Sequence[str]], value: Any) -> None: 374 """ 375 Set a value in the connector configuration at the specified path. 376 377 Args: 378 config_path (Union[str, Sequence[str]]): The path within the configuration where the value should be set. 379 This can be a string representing a single key or a sequence of strings representing a nested path. 380 value (Any): The value to set at the specified path in the configuration. 381 382 Returns: 383 None 384 """ 385 dpath.new(self._connector_config, config_path, value) # type: ignore[arg-type] 386 387 def _get_config_value_by_path( 388 self, config_path: Union[str, Sequence[str]], default: Optional[str] = None 389 ) -> str | Any: 390 """ 391 Retrieve a value from the connector configuration using a specified path. 392 393 Args: 394 config_path (Union[str, Sequence[str]]): The path to the desired configuration value. This can be a string or a sequence of strings. 395 default (Optional[str], optional): The default value to return if the specified path does not exist in the configuration. Defaults to None. 396 397 Returns: 398 Any: The value from the configuration at the specified path, or the default value if the path does not exist. 399 """ 400 return dpath.get( 401 self._connector_config, # type: ignore[arg-type] 402 config_path, 403 default=default if default is not None else "", 404 ) 405 406 def _emit_control_message(self) -> None: 407 """ 408 Emits a control message based on the connector configuration. 409 410 This method checks if the message repository is not a NoopMessageRepository. 411 If it is not, it emits a message using the message repository. Otherwise, 412 it falls back to emitting the configuration as an Airbyte control message 413 directly to the console for backward compatibility. 414 415 Note: 416 The function `emit_configuration_as_airbyte_control_message` has been deprecated 417 in favor of the package `airbyte_cdk.sources.message`. 418 419 Raises: 420 TypeError: If the argument types are incorrect. 
421 """ 422 # FIXME emit_configuration_as_airbyte_control_message as been deprecated in favor of package airbyte_cdk.sources.message 423 # Usually, a class shouldn't care about the implementation details but to keep backward compatibility where we print the 424 # message directly in the console, this is needed 425 if not isinstance(self._message_repository, NoopMessageRepository): 426 self._message_repository.emit_message( 427 create_connector_config_control_message(self._connector_config) # type: ignore[arg-type] 428 ) 429 else: 430 emit_configuration_as_airbyte_control_message(self._connector_config) # type: ignore[arg-type] 431 432 @property 433 def _message_repository(self) -> MessageRepository: 434 """ 435 Overriding AbstractOauth2Authenticator._message_repository to allow for HTTP request logs 436 """ 437 return self.__message_repository
Authenticator that should be used for APIs implementing single-use refresh tokens: when refreshing the access token, some APIs return a new refresh token that must be used in the next refresh flow. This authenticator updates the configuration with the new refresh token by emitting an Airbyte control message from an observed mutation. By default, this authenticator expects a connector config with a "credentials" field containing the following nested fields: client_id, client_secret, refresh_token. This behavior can be changed by defining custom config paths (using dpath paths) via the client_id_config_path, client_secret_config_path, and refresh_token_config_path constructor arguments.
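A minimal construction sketch follows, assuming the default "credentials"-based config layout described above; the import path, the endpoint URL, and the credential values are placeholders for illustration.

```python
# Import path assumed; the class is defined as shown below.
from airbyte_cdk.sources.streams.http.requests_native_auth import (
    SingleUseRefreshTokenOauth2Authenticator,
)

# By default the authenticator reads credentials from config["credentials"].
connector_config = {
    "credentials": {
        "client_id": "<client_id>",
        "client_secret": "<client_secret>",
        "refresh_token": "<refresh_token>",
        "access_token": "",
        "token_expiry_date": "",
    }
}

authenticator = SingleUseRefreshTokenOauth2Authenticator(
    connector_config=connector_config,
    token_refresh_endpoint="https://example.com/oauth/token",  # hypothetical endpoint
)

# get_access_token() would refresh against the (placeholder) endpoint once the
# token has expired, persist the new refresh token back into connector_config,
# and emit an Airbyte control message with the updated configuration.
token = authenticator.get_access_token()
```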
154 def __init__( 155 self, 156 connector_config: Mapping[str, Any], 157 token_refresh_endpoint: str, 158 scopes: List[str] | None = None, 159 access_token_name: str = "access_token", 160 expires_in_name: str = "expires_in", 161 refresh_token_name: str = "refresh_token", 162 refresh_request_body: Mapping[str, Any] | None = None, 163 refresh_request_headers: Mapping[str, Any] | None = None, 164 grant_type_name: str = "grant_type", 165 grant_type: str = "refresh_token", 166 client_id_name: str = "client_id", 167 client_id: Optional[str] = None, 168 client_secret_name: str = "client_secret", 169 client_secret: Optional[str] = None, 170 access_token_config_path: Sequence[str] = ("credentials", "access_token"), 171 refresh_token_config_path: Sequence[str] = ("credentials", "refresh_token"), 172 token_expiry_date_config_path: Sequence[str] = ("credentials", "token_expiry_date"), 173 token_expiry_date_format: Optional[str] = None, 174 message_repository: MessageRepository = NoopMessageRepository(), 175 token_expiry_is_time_of_expiration: bool = False, 176 refresh_token_error_status_codes: Tuple[int, ...] = (), 177 refresh_token_error_key: str = "", 178 refresh_token_error_values: Tuple[str, ...] = (), 179 ) -> None: 180 """ 181 Args: 182 connector_config (Mapping[str, Any]): The full connector configuration 183 token_refresh_endpoint (str): Full URL to the token refresh endpoint 184 scopes (List[str], optional): List of OAuth scopes to pass in the refresh token request body. Defaults to None. 185 access_token_name (str, optional): Name of the access token field, used to parse the refresh token response. Defaults to "access_token". 186 expires_in_name (str, optional): Name of the name of the field that characterizes when the current access token will expire, used to parse the refresh token response. Defaults to "expires_in". 187 refresh_token_name (str, optional): Name of the name of the refresh token field, used to parse the refresh token response. Defaults to "refresh_token". 188 refresh_request_body (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request body. Defaults to None. 189 refresh_request_headers (Mapping[str, Any], optional): Custom key value pair that will be added to the refresh token request headers. Defaults to None. 190 grant_type (str, optional): OAuth grant type. Defaults to "refresh_token". 191 client_id (Optional[str]): The client id to authenticate. If not specified, defaults to credentials.client_id in the config object. 192 client_secret (Optional[str]): The client secret to authenticate. If not specified, defaults to credentials.client_secret in the config object. 193 access_token_config_path (Sequence[str]): Dpath to the access_token field in the connector configuration. Defaults to ("credentials", "access_token"). 194 refresh_token_config_path (Sequence[str]): Dpath to the refresh_token field in the connector configuration. Defaults to ("credentials", "refresh_token"). 195 token_expiry_date_config_path (Sequence[str]): Dpath to the token_expiry_date field in the connector configuration. Defaults to ("credentials", "token_expiry_date"). 196 token_expiry_date_format (Optional[str]): Date format of the token expiry date field (set by expires_in_name). If not specified the token expiry date is interpreted as number of seconds until expiration. 
197 token_expiry_is_time_of_expiration bool: set True it if expires_in is returned as time of expiration instead of the number seconds until expiration 198 message_repository (MessageRepository): the message repository used to emit logs on HTTP requests and control message on config update 199 """ 200 self._connector_config = connector_config 201 self._client_id: str = self._get_config_value_by_path( 202 ("credentials", "client_id"), client_id 203 ) 204 self._client_secret: str = self._get_config_value_by_path( 205 ("credentials", "client_secret"), client_secret 206 ) 207 self._client_id_name = client_id_name 208 self._client_secret_name = client_secret_name 209 self._access_token_config_path = access_token_config_path 210 self._refresh_token_config_path = refresh_token_config_path 211 self._token_expiry_date_config_path = token_expiry_date_config_path 212 self._token_expiry_date_format = token_expiry_date_format 213 self._refresh_token_name = refresh_token_name 214 self._grant_type_name = grant_type_name 215 self._connector_config = connector_config 216 self.__message_repository = message_repository 217 super().__init__( 218 token_refresh_endpoint=token_refresh_endpoint, 219 client_id_name=self._client_id_name, 220 client_id=self._client_id, 221 client_secret_name=self._client_secret_name, 222 client_secret=self._client_secret, 223 refresh_token=self.get_refresh_token(), 224 refresh_token_name=self._refresh_token_name, 225 scopes=scopes, 226 token_expiry_date=self.get_token_expiry_date(), 227 access_token_name=access_token_name, 228 expires_in_name=expires_in_name, 229 refresh_request_body=refresh_request_body, 230 refresh_request_headers=refresh_request_headers, 231 grant_type_name=self._grant_type_name, 232 grant_type=grant_type, 233 token_expiry_date_format=token_expiry_date_format, 234 token_expiry_is_time_of_expiration=token_expiry_is_time_of_expiration, 235 refresh_token_error_status_codes=refresh_token_error_status_codes, 236 refresh_token_error_key=refresh_token_error_key, 237 refresh_token_error_values=refresh_token_error_values, 238 )
Arguments:
- connector_config (Mapping[str, Any]): The full connector configuration
- token_refresh_endpoint (str): Full URL to the token refresh endpoint
- scopes (List[str], optional): List of OAuth scopes to pass in the refresh token request body. Defaults to None.
- access_token_name (str, optional): Name of the access token field, used to parse the refresh token response. Defaults to "access_token".
- expires_in_name (str, optional): Name of the field that indicates when the current access token will expire, used to parse the refresh token response. Defaults to "expires_in".
- refresh_token_name (str, optional): Name of the refresh token field, used to parse the refresh token response. Defaults to "refresh_token".
- refresh_request_body (Mapping[str, Any], optional): Custom key-value pairs that will be added to the refresh token request body. Defaults to None.
- refresh_request_headers (Mapping[str, Any], optional): Custom key-value pairs that will be added to the refresh token request headers. Defaults to None.
- grant_type (str, optional): OAuth grant type. Defaults to "refresh_token".
- client_id (Optional[str]): The client id to authenticate. If not specified, defaults to credentials.client_id in the config object.
- client_secret (Optional[str]): The client secret to authenticate. If not specified, defaults to credentials.client_secret in the config object.
- access_token_config_path (Sequence[str]): Dpath to the access_token field in the connector configuration. Defaults to ("credentials", "access_token").
- refresh_token_config_path (Sequence[str]): Dpath to the refresh_token field in the connector configuration. Defaults to ("credentials", "refresh_token").
- token_expiry_date_config_path (Sequence[str]): Dpath to the token_expiry_date field in the connector configuration. Defaults to ("credentials", "token_expiry_date").
- token_expiry_date_format (Optional[str]): Date format of the token expiry date field (set by expires_in_name). If not specified, the token expiry date is interpreted as the number of seconds until expiration.
- token_expiry_is_time_of_expiration (bool): Set to True if expires_in is returned as the time of expiration instead of the number of seconds until expiration.
- message_repository (MessageRepository): The message repository used to emit logs on HTTP requests and control messages on config updates.
240 @property 241 def access_token(self) -> str: 242 """ 243 Retrieve the access token from the configuration. 244 245 Returns: 246 str: The access token. 247 """ 248 return self._get_config_value_by_path(self._access_token_config_path) # type: ignore[return-value]
Retrieve the access token from the configuration.
Returns:
str: The access token.
260 def get_refresh_token(self) -> str: 261 """ 262 Retrieve the refresh token from the configuration. 263 264 This method fetches the refresh token using the configuration path specified 265 by `_refresh_token_config_path`. 266 267 Returns: 268 str: The refresh token as a string. 269 """ 270 return self._get_config_value_by_path(self._refresh_token_config_path) # type: ignore[return-value]
Retrieve the refresh token from the configuration.
This method fetches the refresh token using the configuration path specified by _refresh_token_config_path.
Returns:
str: The refresh token as a string.
272 def set_refresh_token(self, new_refresh_token: str) -> None: 273 """ 274 Updates the refresh token in the configuration. 275 276 Args: 277 new_refresh_token (str): The new refresh token to be set. 278 """ 279 self._set_config_value_by_path(self._refresh_token_config_path, new_refresh_token)
Updates the refresh token in the configuration.
Arguments:
- new_refresh_token (str): The new refresh token to be set.
281 def get_token_expiry_date(self) -> AirbyteDateTime: 282 """ 283 Retrieves the token expiry date from the configuration. 284 285 This method fetches the token expiry date from the configuration using the specified path. 286 If the expiry date is an empty string, it returns the current date and time minus one day. 287 Otherwise, it parses the expiry date string into an AirbyteDateTime object. 288 289 Returns: 290 AirbyteDateTime: The parsed or calculated token expiry date. 291 292 Raises: 293 TypeError: If the result is not an instance of AirbyteDateTime. 294 """ 295 expiry_date = self._get_config_value_by_path(self._token_expiry_date_config_path) 296 result = ( 297 ab_datetime_now() - timedelta(days=1) 298 if expiry_date == "" 299 else ab_datetime_parse(str(expiry_date)) 300 ) 301 if isinstance(result, AirbyteDateTime): 302 return result 303 raise TypeError("Invalid datetime conversion")
Retrieves the token expiry date from the configuration.
This method fetches the token expiry date from the configuration using the specified path. If the expiry date is an empty string, it returns the current date and time minus one day. Otherwise, it parses the expiry date string into an AirbyteDateTime object.
Returns:
AirbyteDateTime: The parsed or calculated token expiry date.
Raises:
- TypeError: If the result is not an instance of AirbyteDateTime.
305 def set_token_expiry_date(self, new_token_expiry_date: AirbyteDateTime) -> None: # type: ignore[override] 306 """ 307 Sets the token expiry date in the configuration. 308 309 Args: 310 new_token_expiry_date (AirbyteDateTime): The new expiry date for the token. 311 """ 312 self._set_config_value_by_path( 313 self._token_expiry_date_config_path, str(new_token_expiry_date) 314 )
Sets the token expiry date in the configuration.
Arguments:
- new_token_expiry_date (AirbyteDateTime): The new expiry date for the token.
316 def token_has_expired(self) -> bool: 317 """Returns True if the token is expired""" 318 return ab_datetime_now() > self.get_token_expiry_date()
Returns True if the token is expired
320 @staticmethod 321 def get_new_token_expiry_date( 322 access_token_expires_in: str, 323 token_expiry_date_format: str | None = None, 324 ) -> AirbyteDateTime: 325 """ 326 Calculate the new token expiry date based on the provided expiration duration or format. 327 328 Args: 329 access_token_expires_in (str): The duration (in seconds) until the access token expires, or the expiry date in a specific format. 330 token_expiry_date_format (str | None, optional): The format of the expiry date if provided. Defaults to None. 331 332 Returns: 333 AirbyteDateTime: The calculated expiry date of the access token. 334 """ 335 if token_expiry_date_format: 336 return ab_datetime_parse(access_token_expires_in) 337 else: 338 return ab_datetime_now() + timedelta(seconds=int(access_token_expires_in))
Calculate the new token expiry date based on the provided expiration duration or format.
Arguments:
- access_token_expires_in (str): The duration (in seconds) until the access token expires, or the expiry date in a specific format.
- token_expiry_date_format (str | None, optional): The format of the expiry date if provided. Defaults to None.
Returns:
AirbyteDateTime: The calculated expiry date of the access token.
340 def get_access_token(self) -> str: 341 """Retrieve new access and refresh token if the access token has expired. 342 The new refresh token is persisted with the set_refresh_token function 343 Returns: 344 str: The current access_token, updated if it was previously expired. 345 """ 346 if self.token_has_expired(): 347 new_access_token, access_token_expires_in, new_refresh_token = ( 348 self.refresh_access_token() 349 ) 350 new_token_expiry_date: AirbyteDateTime = self.get_new_token_expiry_date( 351 access_token_expires_in, self._token_expiry_date_format 352 ) 353 self.access_token = new_access_token 354 self.set_refresh_token(new_refresh_token) 355 self.set_token_expiry_date(new_token_expiry_date) 356 self._emit_control_message() 357 return self.access_token
Retrieve new access and refresh tokens if the access token has expired. The new refresh token is persisted with the set_refresh_token function.
Returns:
str: The current access_token, updated if it was previously expired.
359 def refresh_access_token(self) -> Tuple[str, str, str]: # type: ignore[override] 360 """ 361 Refreshes the access token by making a handled request and extracting the necessary token information. 362 363 Returns: 364 Tuple[str, str, str]: A tuple containing the new access token, token expiry date, and refresh token. 365 """ 366 response_json = self._make_handled_request() 367 return ( 368 self._extract_access_token(response_json), 369 self._extract_token_expiry_date(response_json), 370 self._extract_refresh_token(response_json), 371 )
Refreshes the access token by making a handled request and extracting the necessary token information.
Returns:
Tuple[str, str, str]: A tuple containing the new access token, token expiry date, and refresh token.
Inherited Members
- Oauth2Authenticator
- get_token_refresh_endpoint
- get_client_id_name
- get_client_id
- get_client_secret_name
- get_client_secret
- get_refresh_token_name
- get_access_token_name
- get_scopes
- get_expires_in_name
- get_refresh_request_body
- get_refresh_request_headers
- get_grant_type_name
- get_grant_type
- token_expiry_is_time_of_expiration
- token_expiry_date_format
39class TokenAuthenticator(AbstractHeaderAuthenticator): 40 """ 41 Builds auth header, based on the token provided. 42 The token is attached to each request via the `auth_header` header. 43 """ 44 45 @property 46 def auth_header(self) -> str: 47 return self._auth_header 48 49 @property 50 def token(self) -> str: 51 return f"{self._auth_method} {self._token}" 52 53 def __init__(self, token: str, auth_method: str = "Bearer", auth_header: str = "Authorization"): 54 self._auth_header = auth_header 55 self._auth_method = auth_method 56 self._token = token
Builds the auth header based on the token provided. The token is attached to each request via the auth_header header.
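A minimal usage sketch, assuming the import path shown below; the API URL and token are placeholders.

```python
import requests

# Import path assumed; TokenAuthenticator is defined as shown above.
from airbyte_cdk.sources.streams.http.requests_native_auth import TokenAuthenticator

authenticator = TokenAuthenticator(token="my-api-token")
print(authenticator.auth_header)  # "Authorization"
print(authenticator.token)        # "Bearer my-api-token"

# The header name/value pair can be attached to any outgoing request.
response = requests.get(
    "https://api.example.com/items",  # placeholder URL
    headers={authenticator.auth_header: authenticator.token},
)
```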
36class UserDefinedBackoffException(BaseBackoffException): 37 """ 38 An exception that exposes how long it attempted to backoff 39 """ 40 41 def __init__( 42 self, 43 backoff: Union[int, float], 44 request: requests.PreparedRequest, 45 response: Optional[Union[requests.Response, Exception]], 46 error_message: str = "", 47 ): 48 """ 49 :param backoff: how long to backoff in seconds 50 :param request: the request that triggered this backoff exception 51 :param response: the response that triggered the backoff exception 52 """ 53 self.backoff = backoff 54 super().__init__(request=request, response=response, error_message=error_message)
An exception that exposes how long it attempted to back off
41 def __init__( 42 self, 43 backoff: Union[int, float], 44 request: requests.PreparedRequest, 45 response: Optional[Union[requests.Response, Exception]], 46 error_message: str = "", 47 ): 48 """ 49 :param backoff: how long to backoff in seconds 50 :param request: the request that triggered this backoff exception 51 :param response: the response that triggered the backoff exception 52 """ 53 self.backoff = backoff 54 super().__init__(request=request, response=response, error_message=error_message)
Parameters
- backoff: how long to back off, in seconds
- request: the request that triggered this backoff exception
- response: the response that triggered the backoff exception
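The hedged sketch below shows how such an exception might be raised from custom retry logic; the import path, the placeholder URL, and the 429 scenario are assumptions for illustration.

```python
import requests

# Import path assumed; the exception class is defined as shown above.
from airbyte_cdk.sources.streams.http.exceptions import UserDefinedBackoffException

prepared = requests.Request("GET", "https://api.example.com/items").prepare()
response = requests.Response()  # placeholder response object for illustration
response.status_code = 429

# Signal to the caller how long to back off before retrying this request.
raise UserDefinedBackoffException(
    backoff=30,  # seconds
    request=prepared,
    response=response,
    error_message="Rate limited by the API; retry after 30 seconds",
)
```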
60class AirbyteLogFormatter(logging.Formatter): 61 """Output log records using AirbyteMessage""" 62 63 # Transforming Python log levels to Airbyte protocol log levels 64 level_mapping = { 65 logging.FATAL: Level.FATAL, 66 logging.ERROR: Level.ERROR, 67 logging.WARNING: Level.WARN, 68 logging.INFO: Level.INFO, 69 logging.DEBUG: Level.DEBUG, 70 } 71 72 def format(self, record: logging.LogRecord) -> str: 73 """Return a JSON representation of the log message""" 74 airbyte_level = self.level_mapping.get(record.levelno, "INFO") 75 if airbyte_level == Level.DEBUG: 76 extras = self.extract_extra_args_from_record(record) 77 debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras} 78 return filter_secrets(json.dumps(debug_dict)) 79 else: 80 message = super().format(record) 81 message = filter_secrets(message) 82 log_message = AirbyteMessage( 83 type=Type.LOG, log=AirbyteLogMessage(level=airbyte_level, message=message) 84 ) 85 return orjson.dumps(AirbyteMessageSerializer.dump(log_message)).decode() 86 87 @staticmethod 88 def extract_extra_args_from_record(record: logging.LogRecord) -> Mapping[str, Any]: 89 """ 90 The python logger conflates default args with extra args. We use an empty log record and set operations 91 to isolate fields passed to the log record via extra by the developer. 92 """ 93 default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys() 94 extra_keys = set(record.__dict__.keys()) - default_attrs 95 return {k: str(getattr(record, k)) for k in extra_keys if hasattr(record, k)}
Output log records using AirbyteMessage
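A small sketch of wiring the formatter into the standard logging module; the logger name and log message are arbitrary.

```python
import logging

from airbyte_cdk import AirbyteLogFormatter  # exported at the package top level

handler = logging.StreamHandler()
handler.setFormatter(AirbyteLogFormatter())

logger = logging.getLogger("my_connector")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Emitted as a JSON-serialized AirbyteMessage LOG line, with secrets filtered out.
logger.info("Starting sync for stream 'users'")
```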
72 def format(self, record: logging.LogRecord) -> str: 73 """Return a JSON representation of the log message""" 74 airbyte_level = self.level_mapping.get(record.levelno, "INFO") 75 if airbyte_level == Level.DEBUG: 76 extras = self.extract_extra_args_from_record(record) 77 debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras} 78 return filter_secrets(json.dumps(debug_dict)) 79 else: 80 message = super().format(record) 81 message = filter_secrets(message) 82 log_message = AirbyteMessage( 83 type=Type.LOG, log=AirbyteLogMessage(level=airbyte_level, message=message) 84 ) 85 return orjson.dumps(AirbyteMessageSerializer.dump(log_message)).decode()
Return a JSON representation of the log message
87 @staticmethod 88 def extract_extra_args_from_record(record: logging.LogRecord) -> Mapping[str, Any]: 89 """ 90 The python logger conflates default args with extra args. We use an empty log record and set operations 91 to isolate fields passed to the log record via extra by the developer. 92 """ 93 default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys() 94 extra_keys = set(record.__dict__.keys()) - default_attrs 95 return {k: str(getattr(record, k)) for k in extra_keys if hasattr(record, k)}
The Python logger conflates default args with extra args. We use an empty log record and set operations to isolate the fields passed to the log record via extra by the developer.
44def init_logger(name: Optional[str] = None) -> logging.Logger: 45 """Initial set up of logger""" 46 logger = logging.getLogger(name) 47 logger.setLevel(logging.INFO) 48 logging.config.dictConfig(LOGGING_CONFIG) 49 return logger
Initial setup of the logger.
264@dataclass 265class AirbyteStream: 266 name: str 267 json_schema: Dict[str, Any] 268 supported_sync_modes: List[SyncMode] 269 source_defined_cursor: Optional[bool] = None 270 default_cursor_field: Optional[List[str]] = None 271 source_defined_primary_key: Optional[List[List[str]]] = None 272 namespace: Optional[str] = None 273 is_resumable: Optional[bool] = None
172@dataclass 173class AirbyteConnectionStatus: 174 status: Status 175 message: Optional[str] = None
81@dataclass 82class AirbyteMessage: 83 type: Type # type: ignore [name-defined] 84 log: Optional[AirbyteLogMessage] = None # type: ignore [name-defined] 85 spec: Optional[ConnectorSpecification] = None # type: ignore [name-defined] 86 connectionStatus: Optional[AirbyteConnectionStatus] = None # type: ignore [name-defined] 87 catalog: Optional[AirbyteCatalog] = None # type: ignore [name-defined] 88 record: Optional[Union[AirbyteFileTransferRecordMessage, AirbyteRecordMessage]] = None # type: ignore [name-defined] 89 state: Optional[AirbyteStateMessage] = None 90 trace: Optional[AirbyteTraceMessage] = None # type: ignore [name-defined] 91 control: Optional[AirbyteControlMessage] = None # type: ignore [name-defined]
An enumeration.
12class Type(Enum): 13 RECORD = 'RECORD' 14 STATE = 'STATE' 15 LOG = 'LOG' 16 SPEC = 'SPEC' 17 CONNECTION_STATUS = 'CONNECTION_STATUS' 18 CATALOG = 'CATALOG' 19 TRACE = 'TRACE' 20 CONTROL = 'CONTROL'
An enumeration.
An enumeration.
276@dataclass 277class ConfiguredAirbyteStream: 278 stream: AirbyteStream 279 sync_mode: SyncMode 280 destination_sync_mode: DestinationSyncMode 281 cursor_field: Optional[List[str]] = None 282 primary_key: Optional[List[List[str]]] = None 283 generation_id: Optional[int] = None 284 minimum_generation_id: Optional[int] = None 285 sync_id: Optional[int] = None
183class DestinationSyncMode(Enum): 184 append = 'append' 185 overwrite = 'overwrite' 186 append_dedup = 'append_dedup'
An enumeration.
An enumeration.
94class FailureType(Enum): 95 system_error = 'system_error' 96 config_error = 'config_error' 97 transient_error = 'transient_error'
An enumeration.
288@dataclass 289class AdvancedAuth: 290 auth_flow_type: Optional[AuthFlowType] = None 291 predicate_key: Optional[List[str]] = None 292 predicate_value: Optional[str] = None 293 oauth_config_specification: Optional[OAuthConfigSpecification] = None
80@dataclass 81class AirbyteLogMessage: 82 level: Level 83 message: str 84 stack_trace: Optional[str] = None
217@dataclass 218class OAuthConfigSpecification: 219 oauth_user_input_from_connector_config_specification: Optional[Dict[str, Any]] = ( 220 None 221 ) 222 oauth_connector_input_specification: Optional[OauthConnectorInputSpecification] = ( 223 None 224 ) 225 complete_oauth_output_specification: Optional[Dict[str, Any]] = None 226 complete_oauth_server_input_specification: Optional[Dict[str, Any]] = None 227 complete_oauth_server_output_specification: Optional[Dict[str, Any]] = None
296@dataclass 297class ConnectorSpecification: 298 connectionSpecification: Dict[str, Any] 299 documentationUrl: Optional[str] = None 300 changelogUrl: Optional[str] = None 301 supportsIncremental: Optional[bool] = None 302 supportsNormalization: Optional[bool] = False 303 supportsDBT: Optional[bool] = False 304 supported_destination_sync_modes: Optional[List[DestinationSyncMode]] = None 305 advanced_auth: Optional[AdvancedAuth] = None 306 protocol_version: Optional[str] = None
71class Level(Enum): 72 FATAL = 'FATAL' 73 ERROR = 'ERROR' 74 WARN = 'WARN' 75 INFO = 'INFO' 76 DEBUG = 'DEBUG' 77 TRACE = 'TRACE'
An enumeration.
309@dataclass 310class AirbyteRecordMessage: 311 stream: str 312 data: Dict[str, Any] 313 emitted_at: int 314 namespace: Optional[str] = None 315 meta: Optional[AirbyteRecordMessageMeta] = None
75class InMemoryMessageRepository(MessageRepository): 76 def __init__(self, log_level: Level = Level.INFO) -> None: 77 self._message_queue: Deque[AirbyteMessage] = deque() 78 self._log_level = log_level 79 80 def emit_message(self, message: AirbyteMessage) -> None: 81 self._message_queue.append(message) 82 83 def log_message(self, level: Level, message_provider: Callable[[], LogMessage]) -> None: 84 if _is_severe_enough(self._log_level, level): 85 self.emit_message( 86 AirbyteMessage( 87 type=Type.LOG, 88 log=AirbyteLogMessage( 89 level=level, message=filter_secrets(json.dumps(message_provider())) 90 ), 91 ) 92 ) 93 94 def consume_queue(self) -> Iterable[AirbyteMessage]: 95 while self._message_queue: 96 yield self._message_queue.popleft()
Helper class that provides a standard way to create an ABC using inheritance.
83 def log_message(self, level: Level, message_provider: Callable[[], LogMessage]) -> None: 84 if _is_severe_enough(self._log_level, level): 85 self.emit_message( 86 AirbyteMessage( 87 type=Type.LOG, 88 log=AirbyteLogMessage( 89 level=level, message=filter_secrets(json.dumps(message_provider())) 90 ), 91 ) 92 )
Computing messages can be resource-consuming. This method is specialized for logging because we want to allow for lazy evaluation if the log level is less severe than what is configured.
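A minimal sketch of the repository in use, assuming the airbyte_cdk.sources.message import path; the queued message and the lazy DEBUG provider are illustrative.

```python
# Import paths assumed; the classes are defined as shown on this page.
from airbyte_cdk.models import AirbyteLogMessage, AirbyteMessage, Level, Type
from airbyte_cdk.sources.message import InMemoryMessageRepository

repository = InMemoryMessageRepository(log_level=Level.INFO)

# Messages are appended to an in-memory queue...
repository.emit_message(
    AirbyteMessage(
        type=Type.LOG,
        log=AirbyteLogMessage(level=Level.INFO, message="hello from the repository"),
    )
)

# DEBUG is less severe than the configured INFO level, so this lazy
# message provider is never evaluated.
repository.log_message(Level.DEBUG, lambda: {"detail": "expensive debug payload"})

# ...and drained later by whoever owns the repository.
for message in repository.consume_queue():
    print(message)
```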
46class MessageRepository(ABC): 47 @abstractmethod 48 def emit_message(self, message: AirbyteMessage) -> None: 49 raise NotImplementedError() 50 51 @abstractmethod 52 def log_message(self, level: Level, message_provider: Callable[[], LogMessage]) -> None: 53 """ 54 Computing messages can be resource consuming. This method is specialized for logging because we want to allow for lazy evaluation if 55 the log level is less severe than what is configured 56 """ 57 raise NotImplementedError() 58 59 @abstractmethod 60 def consume_queue(self) -> Iterable[AirbyteMessage]: 61 raise NotImplementedError()
Helper class that provides a standard way to create an ABC using inheritance.
51 @abstractmethod 52 def log_message(self, level: Level, message_provider: Callable[[], LogMessage]) -> None: 53 """ 54 Computing messages can be resource consuming. This method is specialized for logging because we want to allow for lazy evaluation if 55 the log level is less severe than what is configured 56 """ 57 raise NotImplementedError()
Computing messages can be resource-consuming. This method is specialized for logging because we want to allow for lazy evaluation if the log level is less severe than what is configured.
33class ConnectorStateManager: 34 """ 35 ConnectorStateManager consolidates the various forms of a stream's incoming state message (STREAM / GLOBAL) under a common 36 interface. It also provides methods to extract and update state 37 """ 38 39 def __init__(self, state: Optional[List[AirbyteStateMessage]] = None): 40 shared_state, per_stream_states = self._extract_from_state_message(state) 41 42 # We explicitly throw an error if we receive a GLOBAL state message that contains a shared_state because API sources are 43 # designed to checkpoint state independently of one another. API sources should never be emitting a state message where 44 # shared_state is populated. Rather than define how to handle shared_state without a clear use case, we're opting to throw an 45 # error instead and if/when we find one, we will then implement processing of the shared_state value. 46 if shared_state: 47 raise ValueError( 48 "Received a GLOBAL AirbyteStateMessage that contains a shared_state. This library only ever generates per-STREAM " 49 "STATE messages so this was not generated by this connector. This must be an orchestrator or platform error. GLOBAL " 50 "state messages with shared_state will not be processed correctly. " 51 ) 52 self.per_stream_states = per_stream_states 53 54 def get_stream_state( 55 self, stream_name: str, namespace: Optional[str] 56 ) -> MutableMapping[str, Any]: 57 """ 58 Retrieves the state of a given stream based on its descriptor (name + namespace). 59 :param stream_name: Name of the stream being fetched 60 :param namespace: Namespace of the stream being fetched 61 :return: The per-stream state for a stream 62 """ 63 stream_state: AirbyteStateBlob | None = self.per_stream_states.get( 64 HashableStreamDescriptor(name=stream_name, namespace=namespace) 65 ) 66 if stream_state: 67 return copy.deepcopy({k: v for k, v in stream_state.__dict__.items()}) 68 return {} 69 70 def update_state_for_stream( 71 self, stream_name: str, namespace: Optional[str], value: Mapping[str, Any] 72 ) -> None: 73 """ 74 Overwrites the state blob of a specific stream based on the provided stream name and optional namespace 75 :param stream_name: The name of the stream whose state is being updated 76 :param namespace: The namespace of the stream if it exists 77 :param value: A stream state mapping that is being updated for a stream 78 """ 79 stream_descriptor = HashableStreamDescriptor(name=stream_name, namespace=namespace) 80 self.per_stream_states[stream_descriptor] = AirbyteStateBlob(value) 81 82 def create_state_message(self, stream_name: str, namespace: Optional[str]) -> AirbyteMessage: 83 """ 84 Generates an AirbyteMessage using the current per-stream state of a specified stream 85 :param stream_name: The name of the stream for the message that is being created 86 :param namespace: The namespace of the stream for the message that is being created 87 :return: The Airbyte state message to be emitted by the connector during a sync 88 """ 89 hashable_descriptor = HashableStreamDescriptor(name=stream_name, namespace=namespace) 90 stream_state = self.per_stream_states.get(hashable_descriptor) or AirbyteStateBlob() 91 92 return AirbyteMessage( 93 type=MessageType.STATE, 94 state=AirbyteStateMessage( 95 type=AirbyteStateType.STREAM, 96 stream=AirbyteStreamState( 97 stream_descriptor=StreamDescriptor(name=stream_name, namespace=namespace), 98 stream_state=stream_state, 99 ), 100 ), 101 ) 102 103 @classmethod 104 def _extract_from_state_message( 105 cls, 106 state: Optional[List[AirbyteStateMessage]], 107 ) -> 
Tuple[ 108 Optional[AirbyteStateBlob], 109 MutableMapping[HashableStreamDescriptor, Optional[AirbyteStateBlob]], 110 ]: 111 """ 112 Takes an incoming list of state messages or a global state message and extracts state attributes according to 113 type which can then be assigned to the new state manager being instantiated 114 :param state: The incoming state input 115 :return: A tuple of shared state and per stream state assembled from the incoming state list 116 """ 117 if state is None: 118 return None, {} 119 120 is_global = cls._is_global_state(state) 121 122 if is_global: 123 # We already validate that this is a global state message, not None: 124 global_state = cast(AirbyteGlobalState, state[0].global_) 125 # global_state has shared_state, also not None: 126 shared_state: AirbyteStateBlob = cast( 127 AirbyteStateBlob, copy.deepcopy(global_state.shared_state, {}) 128 ) 129 streams = { 130 HashableStreamDescriptor( 131 name=per_stream_state.stream_descriptor.name, 132 namespace=per_stream_state.stream_descriptor.namespace, 133 ): per_stream_state.stream_state 134 for per_stream_state in global_state.stream_states # type: ignore[union-attr] # global_state has shared_state 135 } 136 return shared_state, streams 137 else: 138 streams = { 139 HashableStreamDescriptor( 140 name=per_stream_state.stream.stream_descriptor.name, # type: ignore[union-attr] # stream has stream_descriptor 141 namespace=per_stream_state.stream.stream_descriptor.namespace, # type: ignore[union-attr] # stream has stream_descriptor 142 ): per_stream_state.stream.stream_state # type: ignore[union-attr] # stream has stream_state 143 for per_stream_state in state 144 if per_stream_state.type == AirbyteStateType.STREAM 145 and hasattr(per_stream_state, "stream") # type: ignore # state is always a list of AirbyteStateMessage if is_per_stream is True 146 } 147 return None, streams 148 149 @staticmethod 150 def _is_global_state(state: Union[List[AirbyteStateMessage], MutableMapping[str, Any]]) -> bool: 151 return ( 152 isinstance(state, List) 153 and len(state) == 1 154 and isinstance(state[0], AirbyteStateMessage) 155 and state[0].type == AirbyteStateType.GLOBAL 156 ) 157 158 @staticmethod 159 def _is_per_stream_state( 160 state: Union[List[AirbyteStateMessage], MutableMapping[str, Any]], 161 ) -> bool: 162 return isinstance(state, List)
ConnectorStateManager consolidates the various forms of a stream's incoming state message (STREAM / GLOBAL) under a common interface. It also provides methods to extract and update state
39 def __init__(self, state: Optional[List[AirbyteStateMessage]] = None): 40 shared_state, per_stream_states = self._extract_from_state_message(state) 41 42 # We explicitly throw an error if we receive a GLOBAL state message that contains a shared_state because API sources are 43 # designed to checkpoint state independently of one another. API sources should never be emitting a state message where 44 # shared_state is populated. Rather than define how to handle shared_state without a clear use case, we're opting to throw an 45 # error instead and if/when we find one, we will then implement processing of the shared_state value. 46 if shared_state: 47 raise ValueError( 48 "Received a GLOBAL AirbyteStateMessage that contains a shared_state. This library only ever generates per-STREAM " 49 "STATE messages so this was not generated by this connector. This must be an orchestrator or platform error. GLOBAL " 50 "state messages with shared_state will not be processed correctly. " 51 ) 52 self.per_stream_states = per_stream_states
54 def get_stream_state( 55 self, stream_name: str, namespace: Optional[str] 56 ) -> MutableMapping[str, Any]: 57 """ 58 Retrieves the state of a given stream based on its descriptor (name + namespace). 59 :param stream_name: Name of the stream being fetched 60 :param namespace: Namespace of the stream being fetched 61 :return: The per-stream state for a stream 62 """ 63 stream_state: AirbyteStateBlob | None = self.per_stream_states.get( 64 HashableStreamDescriptor(name=stream_name, namespace=namespace) 65 ) 66 if stream_state: 67 return copy.deepcopy({k: v for k, v in stream_state.__dict__.items()}) 68 return {}
Retrieves the state of a given stream based on its descriptor (name + namespace).
Parameters
- stream_name: Name of the stream being fetched
- namespace: Namespace of the stream being fetched
Returns
The per-stream state for a stream
70 def update_state_for_stream( 71 self, stream_name: str, namespace: Optional[str], value: Mapping[str, Any] 72 ) -> None: 73 """ 74 Overwrites the state blob of a specific stream based on the provided stream name and optional namespace 75 :param stream_name: The name of the stream whose state is being updated 76 :param namespace: The namespace of the stream if it exists 77 :param value: A stream state mapping that is being updated for a stream 78 """ 79 stream_descriptor = HashableStreamDescriptor(name=stream_name, namespace=namespace) 80 self.per_stream_states[stream_descriptor] = AirbyteStateBlob(value)
Overwrites the state blob of a specific stream based on the provided stream name and optional namespace
Parameters
- stream_name: The name of the stream whose state is being updated
- namespace: The namespace of the stream if it exists
- value: A stream state mapping that is being updated for a stream
82 def create_state_message(self, stream_name: str, namespace: Optional[str]) -> AirbyteMessage: 83 """ 84 Generates an AirbyteMessage using the current per-stream state of a specified stream 85 :param stream_name: The name of the stream for the message that is being created 86 :param namespace: The namespace of the stream for the message that is being created 87 :return: The Airbyte state message to be emitted by the connector during a sync 88 """ 89 hashable_descriptor = HashableStreamDescriptor(name=stream_name, namespace=namespace) 90 stream_state = self.per_stream_states.get(hashable_descriptor) or AirbyteStateBlob() 91 92 return AirbyteMessage( 93 type=MessageType.STATE, 94 state=AirbyteStateMessage( 95 type=AirbyteStateType.STREAM, 96 stream=AirbyteStreamState( 97 stream_descriptor=StreamDescriptor(name=stream_name, namespace=namespace), 98 stream_state=stream_state, 99 ), 100 ), 101 )
Generates an AirbyteMessage using the current per-stream state of a specified stream
Parameters
- stream_name: The name of the stream for the message that is being created
- namespace: The namespace of the stream for the message that is being created
Returns
The Airbyte state message to be emitted by the connector during a sync
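As a minimal usage sketch (not from the source; the stream name and state values are illustrative), the three methods above can be combined like this:

    from airbyte_cdk.sources.connector_state_manager import ConnectorStateManager

    # Typically instantiated from the state messages passed to the connector at startup;
    # here we start with no incoming state.
    state_manager = ConnectorStateManager(state=None)

    # Read the current per-stream state; an empty dict is returned if none was provided.
    current_state = state_manager.get_stream_state("customers", namespace=None)

    # Overwrite the stream's state blob after processing records, then build the
    # STATE message the connector would emit during a sync.
    state_manager.update_state_for_stream("customers", None, {"updated_at": "2024-01-01T00:00:00Z"})
    state_message = state_manager.create_state_message("customers", namespace=None)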
95@deprecated( 96 "Deprecated as of CDK version 0.87.0. " 97 "Deprecated in favor of the `CheckpointMixin` which offers similar functionality." 98) 99class IncrementalMixin(CheckpointMixin, ABC): 100 """Mixin to make stream incremental. 101 102 class IncrementalStream(Stream, IncrementalMixin): 103 @property 104 def state(self): 105 return self._state 106 107 @state.setter 108 def state(self, value): 109 self._state[self.cursor_field] = value[self.cursor_field] 110 """
Mixin to make stream incremental.
    class IncrementalStream(Stream, IncrementalMixin):
        @property
        def state(self):
            return self._state

        @state.setter
        def state(self, value):
            self._state[self.cursor_field] = value[self.cursor_field]
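Below is a hedged, fleshed-out version of the docstring example, written as a runnable sketch: the hardcoded records and the updated_at cursor are illustrative only, and IncrementalMixin and Stream are assumed to be importable from airbyte_cdk.sources.streams. New code should prefer CheckpointMixin, which exposes the same state getter/setter interface.

    from typing import Any, Iterable, List, Mapping, MutableMapping, Optional

    from airbyte_cdk.models import SyncMode
    from airbyte_cdk.sources.streams import IncrementalMixin, Stream


    class IncrementalStream(Stream, IncrementalMixin):
        primary_key = "id"
        cursor_field = "updated_at"

        def __init__(self) -> None:
            self._state: MutableMapping[str, Any] = {}

        @property
        def state(self) -> MutableMapping[str, Any]:
            return self._state

        @state.setter
        def state(self, value: MutableMapping[str, Any]) -> None:
            # Keep only the cursor value, mirroring the docstring example.
            self._state[self.cursor_field] = value[self.cursor_field]

        def read_records(
            self,
            sync_mode: SyncMode,
            cursor_field: Optional[List[str]] = None,
            stream_slice: Optional[Mapping[str, Any]] = None,
            stream_state: Optional[Mapping[str, Any]] = None,
        ) -> Iterable[Mapping[str, Any]]:
            # Hardcoded records for illustration; a real stream would query an API here.
            for record in [{"id": 1, "updated_at": "2024-01-01"}, {"id": 2, "updated_at": "2024-01-02"}]:
                yield record
                self.state = record  # the setter extracts the cursor value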
Inherited Members
119class Stream(ABC): 120 """ 121 Base abstract class for an Airbyte Stream. Makes no assumption of the Stream's underlying transport protocol. 122 """ 123 124 _configured_json_schema: Optional[Dict[str, Any]] = None 125 _exit_on_rate_limit: bool = False 126 127 # Use self.logger in subclasses to log any messages 128 @property 129 def logger(self) -> logging.Logger: 130 return logging.getLogger(f"airbyte.streams.{self.name}") 131 132 # TypeTransformer object to perform output data transformation 133 transformer: TypeTransformer = TypeTransformer(TransformConfig.NoTransform) 134 135 cursor: Optional[Cursor] = None 136 137 has_multiple_slices = False 138 139 @cached_property 140 def name(self) -> str: 141 """ 142 :return: Stream name. By default this is the implementing class name, but it can be overridden as needed. 143 """ 144 return casing.camel_to_snake(self.__class__.__name__) 145 146 def get_error_display_message(self, exception: BaseException) -> Optional[str]: 147 """ 148 Retrieves the user-friendly display message that corresponds to an exception. 149 This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage. 150 151 The default implementation of this method does not return user-friendly messages for any exception type, but it should be overriden as needed. 152 153 :param exception: The exception that was raised 154 :return: A user-friendly message that indicates the cause of the error 155 """ 156 return None 157 158 def read( # type: ignore # ignoring typing for ConnectorStateManager because of circular dependencies 159 self, 160 configured_stream: ConfiguredAirbyteStream, 161 logger: logging.Logger, 162 slice_logger: SliceLogger, 163 stream_state: MutableMapping[str, Any], 164 state_manager, 165 internal_config: InternalConfig, 166 ) -> Iterable[StreamData]: 167 sync_mode = configured_stream.sync_mode 168 cursor_field = configured_stream.cursor_field 169 self.configured_json_schema = configured_stream.stream.json_schema 170 171 # WARNING: When performing a read() that uses incoming stream state, we MUST use the self.state that is defined as 172 # opposed to the incoming stream_state value. Because some connectors like ones using the file-based CDK modify 173 # state before setting the value on the Stream attribute, the most up-to-date state is derived from Stream.state 174 # instead of the stream_state parameter. This does not apply to legacy connectors using get_updated_state(). 175 try: 176 stream_state = self.state # type: ignore # we know the field might not exist... 
177 except AttributeError: 178 pass 179 180 should_checkpoint = bool(state_manager) 181 checkpoint_reader = self._get_checkpoint_reader( 182 logger=logger, cursor_field=cursor_field, sync_mode=sync_mode, stream_state=stream_state 183 ) 184 185 next_slice = checkpoint_reader.next() 186 record_counter = 0 187 stream_state_tracker = copy.deepcopy(stream_state) 188 while next_slice is not None: 189 if slice_logger.should_log_slice_message(logger): 190 yield slice_logger.create_slice_log_message(next_slice) 191 records = self.read_records( 192 sync_mode=sync_mode, # todo: change this interface to no longer rely on sync_mode for behavior 193 stream_slice=next_slice, 194 stream_state=stream_state, 195 cursor_field=cursor_field or None, 196 ) 197 for record_data_or_message in records: 198 yield record_data_or_message 199 if isinstance(record_data_or_message, Mapping) or ( 200 hasattr(record_data_or_message, "type") 201 and record_data_or_message.type == MessageType.RECORD 202 ): 203 record_data = ( 204 record_data_or_message 205 if isinstance(record_data_or_message, Mapping) 206 else record_data_or_message.record 207 ) 208 209 # Thanks I hate it. RFR fundamentally doesn't fit with the concept of the legacy Stream.get_updated_state() 210 # method because RFR streams rely on pagination as a cursor. Stream.get_updated_state() was designed to make 211 # the CDK manage state using specifically the last seen record. don't @ brian.lai 212 # 213 # Also, because the legacy incremental state case decouples observing incoming records from emitting state, it 214 # requires that we separate CheckpointReader.observe() and CheckpointReader.get_checkpoint() which could 215 # otherwise be combined. 216 if self.cursor_field: 217 # Some connectors have streams that implement get_updated_state(), but do not define a cursor_field. This 218 # should be fixed on the stream implementation, but we should also protect against this in the CDK as well 219 stream_state_tracker = self.get_updated_state( 220 stream_state_tracker, 221 record_data, # type: ignore [arg-type] 222 ) 223 self._observe_state(checkpoint_reader, stream_state_tracker) 224 record_counter += 1 225 226 checkpoint_interval = self.state_checkpoint_interval 227 if ( 228 should_checkpoint 229 and checkpoint_interval 230 and record_counter % checkpoint_interval == 0 231 ): 232 checkpoint = checkpoint_reader.get_checkpoint() 233 if checkpoint: 234 airbyte_state_message = self._checkpoint_state( 235 checkpoint, state_manager=state_manager 236 ) 237 yield airbyte_state_message 238 239 if internal_config.is_limit_reached(record_counter): 240 break 241 self._observe_state(checkpoint_reader) 242 checkpoint_state = checkpoint_reader.get_checkpoint() 243 if should_checkpoint and checkpoint_state is not None: 244 airbyte_state_message = self._checkpoint_state( 245 checkpoint_state, state_manager=state_manager 246 ) 247 yield airbyte_state_message 248 249 next_slice = checkpoint_reader.next() 250 251 checkpoint = checkpoint_reader.get_checkpoint() 252 if should_checkpoint and checkpoint is not None: 253 airbyte_state_message = self._checkpoint_state(checkpoint, state_manager=state_manager) 254 yield airbyte_state_message 255 256 def read_only_records(self, state: Optional[Mapping[str, Any]] = None) -> Iterable[StreamData]: 257 """ 258 Helper method that performs a read on a stream with an optional state and emits records. 
If the parent stream supports 259 incremental, this operation does not update the stream's internal state (if it uses the modern state setter/getter) 260 or emit state messages. 261 """ 262 263 configured_stream = ConfiguredAirbyteStream( 264 stream=AirbyteStream( 265 name=self.name, 266 json_schema={}, 267 supported_sync_modes=[SyncMode.full_refresh, SyncMode.incremental], 268 ), 269 sync_mode=SyncMode.incremental if state else SyncMode.full_refresh, 270 destination_sync_mode=DestinationSyncMode.append, 271 ) 272 273 yield from self.read( 274 configured_stream=configured_stream, 275 logger=self.logger, 276 slice_logger=DebugSliceLogger(), 277 stream_state=dict(state) 278 if state 279 else {}, # read() expects MutableMapping instead of Mapping which is used more often 280 state_manager=None, 281 internal_config=InternalConfig(), # type: ignore [call-arg] 282 ) 283 284 @abstractmethod 285 def read_records( 286 self, 287 sync_mode: SyncMode, 288 cursor_field: Optional[List[str]] = None, 289 stream_slice: Optional[Mapping[str, Any]] = None, 290 stream_state: Optional[Mapping[str, Any]] = None, 291 ) -> Iterable[StreamData]: 292 """ 293 This method should be overridden by subclasses to read records based on the inputs 294 """ 295 296 @lru_cache(maxsize=None) 297 def get_json_schema(self) -> Mapping[str, Any]: 298 """ 299 :return: A dict of the JSON schema representing this stream. 300 301 The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. 302 Override as needed. 303 """ 304 # TODO show an example of using pydantic to define the JSON schema, or reading an OpenAPI spec 305 return ResourceSchemaLoader(package_name_from_class(self.__class__)).get_schema(self.name) 306 307 def as_airbyte_stream(self) -> AirbyteStream: 308 stream = AirbyteStream( 309 name=self.name, 310 json_schema=dict(self.get_json_schema()), 311 supported_sync_modes=[SyncMode.full_refresh], 312 is_resumable=self.is_resumable, 313 ) 314 315 if self.namespace: 316 stream.namespace = self.namespace 317 318 # If we can offer incremental we always should. RFR is always less reliable than incremental which uses a real cursor value 319 if self.supports_incremental: 320 stream.source_defined_cursor = self.source_defined_cursor 321 stream.supported_sync_modes.append(SyncMode.incremental) 322 stream.default_cursor_field = self._wrapped_cursor_field() 323 324 keys = Stream._wrapped_primary_key(self.primary_key) 325 if keys and len(keys) > 0: 326 stream.source_defined_primary_key = keys 327 328 return stream 329 330 @property 331 def supports_incremental(self) -> bool: 332 """ 333 :return: True if this stream supports incrementally reading data 334 """ 335 return len(self._wrapped_cursor_field()) > 0 336 337 @property 338 def is_resumable(self) -> bool: 339 """ 340 :return: True if this stream allows the checkpointing of sync progress and can resume from it on subsequent attempts. 341 This differs from supports_incremental because certain kinds of streams like those supporting resumable full refresh 342 can checkpoint progress in between attempts for improved fault tolerance. However, they will start from the beginning 343 on the next sync job. 344 """ 345 if self.supports_incremental: 346 return True 347 if self.has_multiple_slices: 348 # We temporarily gate substream to not support RFR because puts a pretty high burden on connector developers 349 # to structure stream state in a very specific way. 
We also can't check for issubclass(HttpSubStream) because 350 # not all substreams implement the interface and it would be a circular dependency so we use parent as a surrogate 351 return False 352 elif hasattr(type(self), "state") and getattr(type(self), "state").fset is not None: 353 # Modern case where a stream manages state using getter/setter 354 return True 355 else: 356 # Legacy case where the CDK manages state via the get_updated_state() method. This is determined by checking if 357 # the stream's get_updated_state() differs from the Stream class and therefore has been overridden 358 return type(self).get_updated_state != Stream.get_updated_state 359 360 def _wrapped_cursor_field(self) -> List[str]: 361 return [self.cursor_field] if isinstance(self.cursor_field, str) else self.cursor_field 362 363 @property 364 def cursor_field(self) -> Union[str, List[str]]: 365 """ 366 Override to return the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field. 367 :return: The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor. 368 """ 369 return [] 370 371 @property 372 def namespace(self) -> Optional[str]: 373 """ 374 Override to return the namespace of this stream, e.g. the Postgres schema which this stream will emit records for. 375 :return: A string containing the name of the namespace. 376 """ 377 return None 378 379 @property 380 def source_defined_cursor(self) -> bool: 381 """ 382 Return False if the cursor can be configured by the user. 383 """ 384 return True 385 386 @property 387 def exit_on_rate_limit(self) -> bool: 388 """Exit on rate limit getter, should return bool value. False if the stream will retry endlessly when rate limited.""" 389 return self._exit_on_rate_limit 390 391 @exit_on_rate_limit.setter 392 def exit_on_rate_limit(self, value: bool) -> None: 393 """Exit on rate limit setter, accept bool value.""" 394 self._exit_on_rate_limit = value 395 396 @property 397 @abstractmethod 398 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 399 """ 400 :return: string if single primary key, list of strings if composite primary key, list of list of strings if composite primary key consisting of nested fields. 401 If the stream has no primary keys, return None. 402 """ 403 404 def stream_slices( 405 self, 406 *, 407 sync_mode: SyncMode, 408 cursor_field: Optional[List[str]] = None, 409 stream_state: Optional[Mapping[str, Any]] = None, 410 ) -> Iterable[Optional[Mapping[str, Any]]]: 411 """ 412 Override to define the slices for this stream. See the stream slicing section of the docs for more information. 413 414 :param sync_mode: 415 :param cursor_field: 416 :param stream_state: 417 :return: 418 """ 419 yield StreamSlice(partition={}, cursor_slice={}) 420 421 @property 422 def state_checkpoint_interval(self) -> Optional[int]: 423 """ 424 Decides how often to checkpoint state (i.e: emit a STATE message). E.g: if this returns a value of 100, then state is persisted after reading 425 100 records, then 200, 300, etc.. A good default value is 1000 although your mileage may vary depending on the underlying data source. 426 427 Checkpointing a stream avoids re-reading records in the case a sync is failed or cancelled. 428 429 return None if state should not be checkpointed e.g: because records returned from the underlying data source are not returned in 430 ascending order with respect to the cursor field. 
This can happen if the source does not support reading records in ascending order of 431 created_at date (or whatever the cursor is). In those cases, state must only be saved once the full stream has been read. 432 """ 433 return None 434 435 # Commented-out to avoid any runtime penalty, since this is used in a hot per-record codepath. 436 # To be evaluated for re-introduction here: https://github.com/airbytehq/airbyte-python-cdk/issues/116 437 # @deprecated( 438 # "Deprecated method `get_updated_state` as of CDK version 0.1.49. " 439 # "Please use explicit state property instead, see `IncrementalMixin` docs." 440 # ) 441 def get_updated_state( 442 self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any] 443 ) -> MutableMapping[str, Any]: 444 """DEPRECATED. Please use explicit state property instead, see `IncrementalMixin` docs. 445 446 Override to extract state from the latest record. Needed to implement incremental sync. 447 448 Inspects the latest record extracted from the data source and the current state object and return an updated state object. 449 450 For example: if the state object is based on created_at timestamp, and the current state is {'created_at': 10}, and the latest_record is 451 {'name': 'octavia', 'created_at': 20 } then this method would return {'created_at': 20} to indicate state should be updated to this object. 452 453 :param current_stream_state: The stream's current state object 454 :param latest_record: The latest record extracted from the stream 455 :return: An updated state object 456 """ 457 return {} 458 459 def get_cursor(self) -> Optional[Cursor]: 460 """ 461 A Cursor is an interface that a stream can implement to manage how its internal state is read and updated while 462 reading records. Historically, Python connectors had no concept of a cursor to manage state. Python streams need 463 to define a cursor implementation and override this method to manage state through a Cursor. 464 """ 465 return self.cursor 466 467 def _get_checkpoint_reader( 468 self, 469 logger: logging.Logger, 470 cursor_field: Optional[List[str]], 471 sync_mode: SyncMode, 472 stream_state: MutableMapping[str, Any], 473 ) -> CheckpointReader: 474 mappings_or_slices = self.stream_slices( 475 cursor_field=cursor_field, 476 sync_mode=sync_mode, # todo: change this interface to no longer rely on sync_mode for behavior 477 stream_state=stream_state, 478 ) 479 480 # Because of poor foresight, we wrote the default Stream.stream_slices() method to return [None] which is confusing and 481 # has now normalized this behavior for connector developers. Now some connectors return [None]. This is objectively 482 # misleading and a more ideal interface is [{}] to indicate we still want to iterate over one slice, but with no 483 # specific slice values. None is bad, and now I feel bad that I have to write this hack. 484 if mappings_or_slices == [None]: 485 mappings_or_slices = [{}] 486 487 slices_iterable_copy, iterable_for_detecting_format = itertools.tee(mappings_or_slices, 2) 488 stream_classification = self._classify_stream( 489 mappings_or_slices=iterable_for_detecting_format 490 ) 491 492 # Streams that override has_multiple_slices are explicitly indicating that they will iterate over 493 # multiple partitions. Inspecting slices to automatically apply the correct cursor is only needed as 494 # a backup. 
So if this value was already assigned to True by the stream, we don't need to reassign it 495 self.has_multiple_slices = ( 496 self.has_multiple_slices or stream_classification.has_multiple_slices 497 ) 498 499 cursor = self.get_cursor() 500 if cursor: 501 cursor.set_initial_state(stream_state=stream_state) 502 503 checkpoint_mode = self._checkpoint_mode 504 505 if cursor and stream_classification.is_legacy_format: 506 return LegacyCursorBasedCheckpointReader( 507 stream_slices=slices_iterable_copy, cursor=cursor, read_state_from_cursor=True 508 ) 509 elif cursor: 510 return CursorBasedCheckpointReader( 511 stream_slices=slices_iterable_copy, 512 cursor=cursor, 513 read_state_from_cursor=checkpoint_mode == CheckpointMode.RESUMABLE_FULL_REFRESH, 514 ) 515 elif checkpoint_mode == CheckpointMode.RESUMABLE_FULL_REFRESH: 516 # Resumable full refresh readers rely on the stream state dynamically being updated during pagination and does 517 # not iterate over a static set of slices. 518 return ResumableFullRefreshCheckpointReader(stream_state=stream_state) 519 elif checkpoint_mode == CheckpointMode.INCREMENTAL: 520 return IncrementalCheckpointReader( 521 stream_slices=slices_iterable_copy, stream_state=stream_state 522 ) 523 else: 524 return FullRefreshCheckpointReader(stream_slices=slices_iterable_copy) 525 526 @property 527 def _checkpoint_mode(self) -> CheckpointMode: 528 if self.is_resumable and len(self._wrapped_cursor_field()) > 0: 529 return CheckpointMode.INCREMENTAL 530 elif self.is_resumable: 531 return CheckpointMode.RESUMABLE_FULL_REFRESH 532 else: 533 return CheckpointMode.FULL_REFRESH 534 535 @staticmethod 536 def _classify_stream( 537 mappings_or_slices: Iterator[Optional[Union[Mapping[str, Any], StreamSlice]]], 538 ) -> StreamClassification: 539 """ 540 This is a bit of a crazy solution, but also the only way we can detect certain attributes about the stream since Python 541 streams do not follow consistent implementation patterns. We care about the following two attributes: 542 - is_substream: Helps to incrementally release changes since substreams w/ parents are much more complicated. Also 543 helps de-risk the release of changes that might impact all connectors 544 - uses_legacy_slice_format: Since the checkpoint reader must manage a complex state object, we opted to have it always 545 use the structured StreamSlice object. However, this requires backwards compatibility with Python sources that only 546 support the legacy mapping object 547 548 Both attributes can eventually be deprecated once stream's define this method deleted once substreams have been implemented and 549 legacy connectors all adhere to the StreamSlice object. 550 """ 551 if not mappings_or_slices: 552 raise ValueError("A stream should always have at least one slice") 553 try: 554 next_slice = next(mappings_or_slices) 555 if isinstance(next_slice, StreamSlice) and next_slice == StreamSlice( 556 partition={}, cursor_slice={} 557 ): 558 is_legacy_format = False 559 slice_has_value = False 560 elif next_slice == {}: 561 is_legacy_format = True 562 slice_has_value = False 563 elif isinstance(next_slice, StreamSlice): 564 is_legacy_format = False 565 slice_has_value = True 566 else: 567 is_legacy_format = True 568 slice_has_value = True 569 except StopIteration: 570 # If the stream has no slices, the format ultimately does not matter since no data will get synced. 
This is technically 571 # a valid case because it is up to the stream to define its slicing behavior 572 return StreamClassification(is_legacy_format=False, has_multiple_slices=False) 573 574 if slice_has_value: 575 # If the first slice contained a partition value from the result of stream_slices(), this is a substream that might 576 # have multiple parent records to iterate over 577 return StreamClassification( 578 is_legacy_format=is_legacy_format, has_multiple_slices=slice_has_value 579 ) 580 581 try: 582 # If stream_slices() returns multiple slices, this is also a substream that can potentially generate empty slices 583 next(mappings_or_slices) 584 return StreamClassification(is_legacy_format=is_legacy_format, has_multiple_slices=True) 585 except StopIteration: 586 # If the result of stream_slices() only returns a single empty stream slice, then we know this is a regular stream 587 return StreamClassification( 588 is_legacy_format=is_legacy_format, has_multiple_slices=False 589 ) 590 591 def log_stream_sync_configuration(self) -> None: 592 """ 593 Logs the configuration of this stream. 594 """ 595 self.logger.debug( 596 f"Syncing stream instance: {self.name}", 597 extra={ 598 "primary_key": self.primary_key, 599 "cursor_field": self.cursor_field, 600 }, 601 ) 602 603 @staticmethod 604 def _wrapped_primary_key( 605 keys: Optional[Union[str, List[str], List[List[str]]]], 606 ) -> Optional[List[List[str]]]: 607 """ 608 :return: wrap the primary_key property in a list of list of strings required by the Airbyte Stream object. 609 """ 610 if not keys: 611 return None 612 613 if isinstance(keys, str): 614 return [[keys]] 615 elif isinstance(keys, list): 616 wrapped_keys = [] 617 for component in keys: 618 if isinstance(component, str): 619 wrapped_keys.append([component]) 620 elif isinstance(component, list): 621 wrapped_keys.append(component) 622 else: 623 raise ValueError(f"Element must be either list or str. Got: {type(component)}") 624 return wrapped_keys 625 else: 626 raise ValueError(f"Element must be either list or str. Got: {type(keys)}") 627 628 def _observe_state( 629 self, checkpoint_reader: CheckpointReader, stream_state: Optional[Mapping[str, Any]] = None 630 ) -> None: 631 """ 632 Convenience method that attempts to read the Stream's state using the recommended way of connector's managing their 633 own state via state setter/getter. But if we get back an AttributeError, then the legacy Stream.get_updated_state() 634 method is used as a fallback method. 635 """ 636 637 # This is an inversion of the original logic that used to try state getter/setters first. As part of the work to 638 # automatically apply resumable full refresh to all streams, all HttpStream classes implement default state 639 # getter/setter methods, we should default to only using the incoming stream_state parameter value is {} which 640 # indicates the stream does not override the default get_updated_state() implementation. 
When the default method 641 # is not overridden, then the stream defers to self.state getter 642 if stream_state: 643 checkpoint_reader.observe(stream_state) 644 elif type(self).get_updated_state == Stream.get_updated_state: 645 # We only default to the state getter/setter if the stream does not use the legacy get_updated_state() method 646 try: 647 new_state = self.state # type: ignore # This will always exist on HttpStreams, but may not for Stream 648 if new_state: 649 checkpoint_reader.observe(new_state) 650 except AttributeError: 651 pass 652 653 def _checkpoint_state( # type: ignore # ignoring typing for ConnectorStateManager because of circular dependencies 654 self, 655 stream_state: Mapping[str, Any], 656 state_manager, 657 ) -> AirbyteMessage: 658 # todo: This can be consolidated into one ConnectorStateManager.update_and_create_state_message() method, but I want 659 # to reduce changes right now and this would span concurrent as well 660 state_manager.update_state_for_stream(self.name, self.namespace, stream_state) 661 return state_manager.create_state_message(self.name, self.namespace) # type: ignore [no-any-return] 662 663 @property 664 def configured_json_schema(self) -> Optional[Dict[str, Any]]: 665 """ 666 This property is set from the read method. 667 668 :return Optional[Dict]: JSON schema from configured catalog if provided, otherwise None. 669 """ 670 return self._configured_json_schema 671 672 @configured_json_schema.setter 673 def configured_json_schema(self, json_schema: Dict[str, Any]) -> None: 674 self._configured_json_schema = self._filter_schema_invalid_properties(json_schema) 675 676 def _filter_schema_invalid_properties( 677 self, configured_catalog_json_schema: Dict[str, Any] 678 ) -> Dict[str, Any]: 679 """ 680 Filters the properties in json_schema that are not present in the stream schema. 681 Configured Schemas can have very old fields, so we need to housekeeping ourselves. 682 """ 683 configured_schema: Any = configured_catalog_json_schema.get("properties", {}) 684 stream_schema_properties: Any = self.get_json_schema().get("properties", {}) 685 686 configured_keys = configured_schema.keys() 687 stream_keys = stream_schema_properties.keys() 688 invalid_properties = configured_keys - stream_keys 689 if not invalid_properties: 690 return configured_catalog_json_schema 691 692 self.logger.warning( 693 f"Stream {self.name}: the following fields are deprecated and cannot be synced. {invalid_properties}. Refresh the connection's source schema to resolve this warning." 694 ) 695 696 valid_configured_schema_properties_keys = stream_keys & configured_keys 697 valid_configured_schema_properties = {} 698 699 for configured_schema_property in valid_configured_schema_properties_keys: 700 valid_configured_schema_properties[configured_schema_property] = ( 701 stream_schema_properties[configured_schema_property] 702 ) 703 704 return {**configured_catalog_json_schema, "properties": valid_configured_schema_properties}
Base abstract class for an Airbyte Stream. Makes no assumptions about the Stream's underlying transport protocol.
139 @cached_property 140 def name(self) -> str: 141 """ 142 :return: Stream name. By default this is the implementing class name, but it can be overridden as needed. 143 """ 144 return casing.camel_to_snake(self.__class__.__name__)
Returns
Stream name. By default this is the implementing class name, but it can be overridden as needed.
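As an illustration (the class names are hypothetical; abstract members are omitted for brevity), the default conversion and an explicit override might look like this:

    from airbyte_cdk.sources.streams import Stream


    class EmployeeBenefits(Stream):
        # Default: self.name == "employee_benefits" (derived from the class name)
        ...


    class LegacyEmployees(Stream):
        # Shadow the derived name with a plain class attribute
        name = "employees_v1"
        ...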
146 def get_error_display_message(self, exception: BaseException) -> Optional[str]: 147 """ 148 Retrieves the user-friendly display message that corresponds to an exception. 149 This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage. 150 151 The default implementation of this method does not return user-friendly messages for any exception type, but it should be overriden as needed. 152 153 :param exception: The exception that was raised 154 :return: A user-friendly message that indicates the cause of the error 155 """ 156 return None
Retrieves the user-friendly display message that corresponds to an exception. This will be called when encountering an exception while reading records from the stream, and used to build the AirbyteTraceMessage.
The default implementation of this method does not return user-friendly messages for any exception type, but it should be overridden as needed.
Parameters
- exception: The exception that was raised
Returns
A user-friendly message that indicates the cause of the error
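A hedged sketch of an override; the HTTP 403 check and the message text are illustrative, not from the source:

    from typing import Optional

    import requests

    from airbyte_cdk.sources.streams import Stream


    class MyStream(Stream):
        ...  # abstract members omitted for brevity

        def get_error_display_message(self, exception: BaseException) -> Optional[str]:
            if isinstance(exception, requests.exceptions.HTTPError) and exception.response is not None:
                if exception.response.status_code == 403:
                    return "The configured credentials are not permitted to read this stream."
            return None  # keep the default behavior for all other exceptions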
158 def read( # type: ignore # ignoring typing for ConnectorStateManager because of circular dependencies 159 self, 160 configured_stream: ConfiguredAirbyteStream, 161 logger: logging.Logger, 162 slice_logger: SliceLogger, 163 stream_state: MutableMapping[str, Any], 164 state_manager, 165 internal_config: InternalConfig, 166 ) -> Iterable[StreamData]: 167 sync_mode = configured_stream.sync_mode 168 cursor_field = configured_stream.cursor_field 169 self.configured_json_schema = configured_stream.stream.json_schema 170 171 # WARNING: When performing a read() that uses incoming stream state, we MUST use the self.state that is defined as 172 # opposed to the incoming stream_state value. Because some connectors like ones using the file-based CDK modify 173 # state before setting the value on the Stream attribute, the most up-to-date state is derived from Stream.state 174 # instead of the stream_state parameter. This does not apply to legacy connectors using get_updated_state(). 175 try: 176 stream_state = self.state # type: ignore # we know the field might not exist... 177 except AttributeError: 178 pass 179 180 should_checkpoint = bool(state_manager) 181 checkpoint_reader = self._get_checkpoint_reader( 182 logger=logger, cursor_field=cursor_field, sync_mode=sync_mode, stream_state=stream_state 183 ) 184 185 next_slice = checkpoint_reader.next() 186 record_counter = 0 187 stream_state_tracker = copy.deepcopy(stream_state) 188 while next_slice is not None: 189 if slice_logger.should_log_slice_message(logger): 190 yield slice_logger.create_slice_log_message(next_slice) 191 records = self.read_records( 192 sync_mode=sync_mode, # todo: change this interface to no longer rely on sync_mode for behavior 193 stream_slice=next_slice, 194 stream_state=stream_state, 195 cursor_field=cursor_field or None, 196 ) 197 for record_data_or_message in records: 198 yield record_data_or_message 199 if isinstance(record_data_or_message, Mapping) or ( 200 hasattr(record_data_or_message, "type") 201 and record_data_or_message.type == MessageType.RECORD 202 ): 203 record_data = ( 204 record_data_or_message 205 if isinstance(record_data_or_message, Mapping) 206 else record_data_or_message.record 207 ) 208 209 # Thanks I hate it. RFR fundamentally doesn't fit with the concept of the legacy Stream.get_updated_state() 210 # method because RFR streams rely on pagination as a cursor. Stream.get_updated_state() was designed to make 211 # the CDK manage state using specifically the last seen record. don't @ brian.lai 212 # 213 # Also, because the legacy incremental state case decouples observing incoming records from emitting state, it 214 # requires that we separate CheckpointReader.observe() and CheckpointReader.get_checkpoint() which could 215 # otherwise be combined. 216 if self.cursor_field: 217 # Some connectors have streams that implement get_updated_state(), but do not define a cursor_field. 
This 218 # should be fixed on the stream implementation, but we should also protect against this in the CDK as well 219 stream_state_tracker = self.get_updated_state( 220 stream_state_tracker, 221 record_data, # type: ignore [arg-type] 222 ) 223 self._observe_state(checkpoint_reader, stream_state_tracker) 224 record_counter += 1 225 226 checkpoint_interval = self.state_checkpoint_interval 227 if ( 228 should_checkpoint 229 and checkpoint_interval 230 and record_counter % checkpoint_interval == 0 231 ): 232 checkpoint = checkpoint_reader.get_checkpoint() 233 if checkpoint: 234 airbyte_state_message = self._checkpoint_state( 235 checkpoint, state_manager=state_manager 236 ) 237 yield airbyte_state_message 238 239 if internal_config.is_limit_reached(record_counter): 240 break 241 self._observe_state(checkpoint_reader) 242 checkpoint_state = checkpoint_reader.get_checkpoint() 243 if should_checkpoint and checkpoint_state is not None: 244 airbyte_state_message = self._checkpoint_state( 245 checkpoint_state, state_manager=state_manager 246 ) 247 yield airbyte_state_message 248 249 next_slice = checkpoint_reader.next() 250 251 checkpoint = checkpoint_reader.get_checkpoint() 252 if should_checkpoint and checkpoint is not None: 253 airbyte_state_message = self._checkpoint_state(checkpoint, state_manager=state_manager) 254 yield airbyte_state_message
256 def read_only_records(self, state: Optional[Mapping[str, Any]] = None) -> Iterable[StreamData]: 257 """ 258 Helper method that performs a read on a stream with an optional state and emits records. If the parent stream supports 259 incremental, this operation does not update the stream's internal state (if it uses the modern state setter/getter) 260 or emit state messages. 261 """ 262 263 configured_stream = ConfiguredAirbyteStream( 264 stream=AirbyteStream( 265 name=self.name, 266 json_schema={}, 267 supported_sync_modes=[SyncMode.full_refresh, SyncMode.incremental], 268 ), 269 sync_mode=SyncMode.incremental if state else SyncMode.full_refresh, 270 destination_sync_mode=DestinationSyncMode.append, 271 ) 272 273 yield from self.read( 274 configured_stream=configured_stream, 275 logger=self.logger, 276 slice_logger=DebugSliceLogger(), 277 stream_state=dict(state) 278 if state 279 else {}, # read() expects MutableMapping instead of Mapping which is used more often 280 state_manager=None, 281 internal_config=InternalConfig(), # type: ignore [call-arg] 282 )
Helper method that performs a read on a stream with an optional state and emits records. If the parent stream supports incremental, this operation does not update the stream's internal state (if it uses the modern state setter/getter) or emit state messages.
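A minimal usage sketch; my_stream is assumed to be a fully implemented Stream instance:

    # Passing a state mapping triggers an incremental read; omitting it triggers a full refresh.
    for record_or_message in my_stream.read_only_records(state={"updated_at": "2024-01-01"}):
        print(record_or_message)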
284 @abstractmethod 285 def read_records( 286 self, 287 sync_mode: SyncMode, 288 cursor_field: Optional[List[str]] = None, 289 stream_slice: Optional[Mapping[str, Any]] = None, 290 stream_state: Optional[Mapping[str, Any]] = None, 291 ) -> Iterable[StreamData]: 292 """ 293 This method should be overridden by subclasses to read records based on the inputs 294 """
This method should be overridden by subclasses to read records based on the inputs
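A minimal implementation sketch with hardcoded records; a real connector would call its API here:

    from typing import Any, Iterable, List, Mapping, Optional

    from airbyte_cdk.models import SyncMode
    from airbyte_cdk.sources.streams import Stream


    class Colors(Stream):
        primary_key = None

        def read_records(
            self,
            sync_mode: SyncMode,
            cursor_field: Optional[List[str]] = None,
            stream_slice: Optional[Mapping[str, Any]] = None,
            stream_state: Optional[Mapping[str, Any]] = None,
        ) -> Iterable[Mapping[str, Any]]:
            yield {"name": "red"}
            yield {"name": "blue"}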
296 @lru_cache(maxsize=None) 297 def get_json_schema(self) -> Mapping[str, Any]: 298 """ 299 :return: A dict of the JSON schema representing this stream. 300 301 The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. 302 Override as needed. 303 """ 304 # TODO show an example of using pydantic to define the JSON schema, or reading an OpenAPI spec 305 return ResourceSchemaLoader(package_name_from_class(self.__class__)).get_schema(self.name)
Returns
A dict of the JSON schema representing this stream.
The default implementation of this method looks for a JSONSchema file with the same name as this stream's "name" property. Override as needed.
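Alternatively, a stream can override this method and return the schema directly. A hedged sketch (the field names are illustrative):

    from typing import Any, Mapping

    from airbyte_cdk.sources.streams import Stream


    class Customers(Stream):
        ...  # abstract members omitted for brevity

        def get_json_schema(self) -> Mapping[str, Any]:
            return {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "updated_at": {"type": "string", "format": "date-time"},
                },
            }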
307 def as_airbyte_stream(self) -> AirbyteStream: 308 stream = AirbyteStream( 309 name=self.name, 310 json_schema=dict(self.get_json_schema()), 311 supported_sync_modes=[SyncMode.full_refresh], 312 is_resumable=self.is_resumable, 313 ) 314 315 if self.namespace: 316 stream.namespace = self.namespace 317 318 # If we can offer incremental we always should. RFR is always less reliable than incremental which uses a real cursor value 319 if self.supports_incremental: 320 stream.source_defined_cursor = self.source_defined_cursor 321 stream.supported_sync_modes.append(SyncMode.incremental) 322 stream.default_cursor_field = self._wrapped_cursor_field() 323 324 keys = Stream._wrapped_primary_key(self.primary_key) 325 if keys and len(keys) > 0: 326 stream.source_defined_primary_key = keys 327 328 return stream
330 @property 331 def supports_incremental(self) -> bool: 332 """ 333 :return: True if this stream supports incrementally reading data 334 """ 335 return len(self._wrapped_cursor_field()) > 0
Returns
True if this stream supports incrementally reading data
337 @property 338 def is_resumable(self) -> bool: 339 """ 340 :return: True if this stream allows the checkpointing of sync progress and can resume from it on subsequent attempts. 341 This differs from supports_incremental because certain kinds of streams like those supporting resumable full refresh 342 can checkpoint progress in between attempts for improved fault tolerance. However, they will start from the beginning 343 on the next sync job. 344 """ 345 if self.supports_incremental: 346 return True 347 if self.has_multiple_slices: 348 # We temporarily gate substream to not support RFR because puts a pretty high burden on connector developers 349 # to structure stream state in a very specific way. We also can't check for issubclass(HttpSubStream) because 350 # not all substreams implement the interface and it would be a circular dependency so we use parent as a surrogate 351 return False 352 elif hasattr(type(self), "state") and getattr(type(self), "state").fset is not None: 353 # Modern case where a stream manages state using getter/setter 354 return True 355 else: 356 # Legacy case where the CDK manages state via the get_updated_state() method. This is determined by checking if 357 # the stream's get_updated_state() differs from the Stream class and therefore has been overridden 358 return type(self).get_updated_state != Stream.get_updated_state
Returns
True if this stream allows the checkpointing of sync progress and can resume from it on subsequent attempts. This differs from supports_incremental because certain kinds of streams like those supporting resumable full refresh can checkpoint progress in between attempts for improved fault tolerance. However, they will start from the beginning on the next sync job.
363 @property 364 def cursor_field(self) -> Union[str, List[str]]: 365 """ 366 Override to return the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field. 367 :return: The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor. 368 """ 369 return []
Override to return the default cursor field used by this stream, e.g. an API entity might always use created_at as the cursor field.
Returns
The name of the field used as a cursor. If the cursor is nested, return an array consisting of the path to the cursor.
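Illustrative overrides (the field names are assumptions): a flat cursor returns a string, while a nested cursor returns the path to the field:

    from typing import List, Union

    from airbyte_cdk.sources.streams import Stream


    class Invoices(Stream):
        ...  # abstract members omitted for brevity

        @property
        def cursor_field(self) -> Union[str, List[str]]:
            return "updated_at"
            # For a cursor nested under "metadata", return ["metadata", "updated_at"] instead.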
371 @property 372 def namespace(self) -> Optional[str]: 373 """ 374 Override to return the namespace of this stream, e.g. the Postgres schema which this stream will emit records for. 375 :return: A string containing the name of the namespace. 376 """ 377 return None
Override to return the namespace of this stream, e.g. the Postgres schema which this stream will emit records for.
Returns
A string containing the name of the namespace.
379 @property 380 def source_defined_cursor(self) -> bool: 381 """ 382 Return False if the cursor can be configured by the user. 383 """ 384 return True
Return False if the cursor can be configured by the user.
386 @property 387 def exit_on_rate_limit(self) -> bool: 388 """Exit on rate limit getter, should return bool value. False if the stream will retry endlessly when rate limited.""" 389 return self._exit_on_rate_limit
Getter for exit_on_rate_limit; returns a bool. If False, the stream will retry endlessly when rate limited.
396 @property 397 @abstractmethod 398 def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]: 399 """ 400 :return: string if single primary key, list of strings if composite primary key, list of list of strings if composite primary key consisting of nested fields. 401 If the stream has no primary keys, return None. 402 """
Returns
A string for a single primary key, a list of strings for a composite primary key, or a list of lists of strings for a composite primary key consisting of nested fields. If the stream has no primary key, return None.
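Illustrative values for each of the shapes described above (class and field names are assumptions; abstract members omitted):

    from airbyte_cdk.sources.streams import Stream


    class Users(Stream):
        primary_key = "id"  # single primary key
        ...


    class OrderLines(Stream):
        primary_key = ["order_id", "line_number"]  # composite primary key
        ...


    class Events(Stream):
        primary_key = [["payload", "id"], ["payload", "occurred_at"]]  # composite key of nested fields
        ...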
404 def stream_slices( 405 self, 406 *, 407 sync_mode: SyncMode, 408 cursor_field: Optional[List[str]] = None, 409 stream_state: Optional[Mapping[str, Any]] = None, 410 ) -> Iterable[Optional[Mapping[str, Any]]]: 411 """ 412 Override to define the slices for this stream. See the stream slicing section of the docs for more information. 413 414 :param sync_mode: 415 :param cursor_field: 416 :param stream_state: 417 :return: 418 """ 419 yield StreamSlice(partition={}, cursor_slice={})
Override to define the slices for this stream. See the stream slicing section of the docs for more information.
Parameters
- sync_mode: The sync mode (full refresh or incremental) used for this read
- cursor_field: The cursor field in use, if any
- stream_state: The stream's current state, used to resume slicing from a previous sync
Returns
An iterable of stream slices; each slice is a mapping of slice values (or None)
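A hedged sketch of a date-partitioned override; the month boundaries and the state key are illustrative:

    from typing import Any, Iterable, List, Mapping, Optional

    from airbyte_cdk.models import SyncMode
    from airbyte_cdk.sources.streams import Stream


    class DailyReports(Stream):
        ...  # abstract members omitted for brevity

        def stream_slices(
            self,
            *,
            sync_mode: SyncMode,
            cursor_field: Optional[List[str]] = None,
            stream_state: Optional[Mapping[str, Any]] = None,
        ) -> Iterable[Optional[Mapping[str, Any]]]:
            start = (stream_state or {}).get("date", "2024-01-01")
            for month_start in ("2024-01-01", "2024-02-01", "2024-03-01"):
                if month_start >= start:
                    yield {"start_date": month_start}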
421 @property 422 def state_checkpoint_interval(self) -> Optional[int]: 423 """ 424 Decides how often to checkpoint state (i.e: emit a STATE message). E.g: if this returns a value of 100, then state is persisted after reading 425 100 records, then 200, 300, etc.. A good default value is 1000 although your mileage may vary depending on the underlying data source. 426 427 Checkpointing a stream avoids re-reading records in the case a sync is failed or cancelled. 428 429 return None if state should not be checkpointed e.g: because records returned from the underlying data source are not returned in 430 ascending order with respect to the cursor field. This can happen if the source does not support reading records in ascending order of 431 created_at date (or whatever the cursor is). In those cases, state must only be saved once the full stream has been read. 432 """ 433 return None
Decides how often to checkpoint state (i.e. emit a STATE message). For example, if this returns 100, then state is persisted after reading 100 records, then 200, 300, and so on. A good default value is 1000, although your mileage may vary depending on the underlying data source.
Checkpointing a stream avoids re-reading records in case a sync fails or is cancelled.
Return None if state should not be checkpointed, e.g. because records returned from the underlying data source are not returned in ascending order with respect to the cursor field. This can happen if the source does not support reading records in ascending order of created_at date (or whatever the cursor is). In those cases, state must only be saved once the full stream has been read.
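An illustrative override using the suggested default above:

    from typing import Optional

    from airbyte_cdk.sources.streams import Stream


    class Transactions(Stream):
        ...  # abstract members omitted for brevity

        @property
        def state_checkpoint_interval(self) -> Optional[int]:
            return 1000  # emit a STATE message every 1000 records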
441 def get_updated_state( 442 self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any] 443 ) -> MutableMapping[str, Any]: 444 """DEPRECATED. Please use explicit state property instead, see `IncrementalMixin` docs. 445 446 Override to extract state from the latest record. Needed to implement incremental sync. 447 448 Inspects the latest record extracted from the data source and the current state object and return an updated state object. 449 450 For example: if the state object is based on created_at timestamp, and the current state is {'created_at': 10}, and the latest_record is 451 {'name': 'octavia', 'created_at': 20 } then this method would return {'created_at': 20} to indicate state should be updated to this object. 452 453 :param current_stream_state: The stream's current state object 454 :param latest_record: The latest record extracted from the stream 455 :return: An updated state object 456 """ 457 return {}
DEPRECATED. Please use the explicit state property instead; see the IncrementalMixin docs.
Override to extract state from the latest record. Needed to implement incremental sync.
Inspects the latest record extracted from the data source and the current state object, and returns an updated state object.
For example: if the state object is based on a created_at timestamp, the current state is {'created_at': 10}, and the latest_record is {'name': 'octavia', 'created_at': 20}, then this method would return {'created_at': 20} to indicate that state should be updated to this object.
Parameters
- current_stream_state: The stream's current state object
- latest_record: The latest record extracted from the stream
Returns
An updated state object
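A sketch of the created_at example from the docstring above (legacy pattern; new streams should prefer the state property):

    from typing import Any, Mapping, MutableMapping

    from airbyte_cdk.sources.streams import Stream


    class LegacyIncrementalStream(Stream):
        cursor_field = "created_at"
        ...  # abstract members omitted for brevity

        def get_updated_state(
            self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any]
        ) -> MutableMapping[str, Any]:
            latest_cursor = latest_record.get(self.cursor_field, 0)
            current_cursor = current_stream_state.get(self.cursor_field, 0)
            return {self.cursor_field: max(latest_cursor, current_cursor)}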
459 def get_cursor(self) -> Optional[Cursor]: 460 """ 461 A Cursor is an interface that a stream can implement to manage how its internal state is read and updated while 462 reading records. Historically, Python connectors had no concept of a cursor to manage state. Python streams need 463 to define a cursor implementation and override this method to manage state through a Cursor. 464 """ 465 return self.cursor
A Cursor is an interface that a stream can implement to manage how its internal state is read and updated while reading records. Historically, Python connectors had no concept of a cursor to manage state. Python streams need to define a cursor implementation and override this method to manage state through a Cursor.
591 def log_stream_sync_configuration(self) -> None: 592 """ 593 Logs the configuration of this stream. 594 """ 595 self.logger.debug( 596 f"Syncing stream instance: {self.name}", 597 extra={ 598 "primary_key": self.primary_key, 599 "cursor_field": self.cursor_field, 600 }, 601 )
Logs the configuration of this stream.
663 @property 664 def configured_json_schema(self) -> Optional[Dict[str, Any]]: 665 """ 666 This property is set from the read method. 667 668 :return Optional[Dict]: JSON schema from configured catalog if provided, otherwise None. 669 """ 670 return self._configured_json_schema
This property is set from the read method.
Returns
JSON schema from configured catalog if provided, otherwise None.
52def package_name_from_class(cls: object) -> str: 53 """Find the package name given a class name""" 54 module = inspect.getmodule(cls) 55 if module is not None: 56 return module.__name__.split(".")[0] 57 else: 58 raise ValueError(f"Could not find package name for class {cls}")
Find the package name given a class name
26class AirbyteTracedException(Exception): 27 """ 28 An exception that should be emitted as an AirbyteTraceMessage 29 """ 30 31 def __init__( 32 self, 33 internal_message: Optional[str] = None, 34 message: Optional[str] = None, 35 failure_type: FailureType = FailureType.system_error, 36 exception: Optional[BaseException] = None, 37 stream_descriptor: Optional[StreamDescriptor] = None, 38 ): 39 """ 40 :param internal_message: the internal error that caused the failure 41 :param message: a user-friendly message that indicates the cause of the error 42 :param failure_type: the type of error 43 :param exception: the exception that caused the error, from which the stack trace should be retrieved 44 :param stream_descriptor: describe the stream from which the exception comes from 45 """ 46 self.internal_message = internal_message 47 self.message = message 48 self.failure_type = failure_type 49 self._exception = exception 50 self._stream_descriptor = stream_descriptor 51 super().__init__(internal_message) 52 53 def as_airbyte_message( 54 self, stream_descriptor: Optional[StreamDescriptor] = None 55 ) -> AirbyteMessage: 56 """ 57 Builds an AirbyteTraceMessage from the exception 58 59 :param stream_descriptor is deprecated, please use the stream_description in `__init__ or `from_exception`. If many 60 stream_descriptors are defined, the one from `as_airbyte_message` will be discarded. 61 """ 62 now_millis = time.time_ns() // 1_000_000 63 64 trace_exc = self._exception or self 65 stack_trace_str = "".join(traceback.TracebackException.from_exception(trace_exc).format()) 66 67 trace_message = AirbyteTraceMessage( 68 type=TraceType.ERROR, 69 emitted_at=now_millis, 70 error=AirbyteErrorTraceMessage( 71 message=self.message 72 or "Something went wrong in the connector. See the logs for more details.", 73 internal_message=self.internal_message, 74 failure_type=self.failure_type, 75 stack_trace=stack_trace_str, 76 stream_descriptor=self._stream_descriptor 77 if self._stream_descriptor is not None 78 else stream_descriptor, 79 ), 80 ) 81 82 return AirbyteMessage(type=MessageType.TRACE, trace=trace_message) 83 84 def as_connection_status_message(self) -> Optional[AirbyteMessage]: 85 if self.failure_type == FailureType.config_error: 86 return AirbyteMessage( 87 type=MessageType.CONNECTION_STATUS, 88 connectionStatus=AirbyteConnectionStatus( 89 status=Status.FAILED, message=self.message 90 ), 91 ) 92 return None 93 94 def emit_message(self) -> None: 95 """ 96 Prints the exception as an AirbyteTraceMessage. 97 Note that this will be called automatically on uncaught exceptions when using the airbyte_cdk entrypoint. 
98 """ 99 message = orjson.dumps(AirbyteMessageSerializer.dump(self.as_airbyte_message())).decode() 100 filtered_message = filter_secrets(message) 101 print(filtered_message) 102 103 @classmethod 104 def from_exception( 105 cls, 106 exc: BaseException, 107 stream_descriptor: Optional[StreamDescriptor] = None, 108 *args: Any, 109 **kwargs: Any, 110 ) -> "AirbyteTracedException": 111 """ 112 Helper to create an AirbyteTracedException from an existing exception 113 :param exc: the exception that caused the error 114 :param stream_descriptor: describe the stream from which the exception comes from 115 """ 116 return cls( 117 internal_message=str(exc), 118 exception=exc, 119 stream_descriptor=stream_descriptor, 120 *args, 121 **kwargs, 122 ) # type: ignore # ignoring because of args and kwargs 123 124 def as_sanitized_airbyte_message( 125 self, stream_descriptor: Optional[StreamDescriptor] = None 126 ) -> AirbyteMessage: 127 """ 128 Builds an AirbyteTraceMessage from the exception and sanitizes any secrets from the message body 129 130 :param stream_descriptor is deprecated, please use the stream_description in `__init__ or `from_exception`. If many 131 stream_descriptors are defined, the one from `as_sanitized_airbyte_message` will be discarded. 132 """ 133 error_message = self.as_airbyte_message(stream_descriptor=stream_descriptor) 134 if error_message.trace.error.message: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 135 error_message.trace.error.message = filter_secrets( # type: ignore[union-attr] 136 error_message.trace.error.message, # type: ignore[union-attr] 137 ) 138 if error_message.trace.error.internal_message: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 139 error_message.trace.error.internal_message = filter_secrets( # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 140 error_message.trace.error.internal_message # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 141 ) 142 if error_message.trace.error.stack_trace: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 143 error_message.trace.error.stack_trace = filter_secrets( # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 144 error_message.trace.error.stack_trace # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 145 ) 146 return error_message
An exception that should be emitted as an AirbyteTraceMessage
31 def __init__( 32 self, 33 internal_message: Optional[str] = None, 34 message: Optional[str] = None, 35 failure_type: FailureType = FailureType.system_error, 36 exception: Optional[BaseException] = None, 37 stream_descriptor: Optional[StreamDescriptor] = None, 38 ): 39 """ 40 :param internal_message: the internal error that caused the failure 41 :param message: a user-friendly message that indicates the cause of the error 42 :param failure_type: the type of error 43 :param exception: the exception that caused the error, from which the stack trace should be retrieved 44 :param stream_descriptor: describe the stream from which the exception comes from 45 """ 46 self.internal_message = internal_message 47 self.message = message 48 self.failure_type = failure_type 49 self._exception = exception 50 self._stream_descriptor = stream_descriptor 51 super().__init__(internal_message)
Parameters
- internal_message: the internal error that caused the failure
- message: a user-friendly message that indicates the cause of the error
- failure_type: the type of error
- exception: the exception that caused the error, from which the stack trace should be retrieved
- stream_descriptor: describe the stream from which the exception comes from
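A minimal usage sketch, assuming the usual import path airbyte_cdk.utils.traced_exception; the endpoint and message text are illustrative:

    from airbyte_cdk.models import FailureType
    from airbyte_cdk.utils.traced_exception import AirbyteTracedException

    raise AirbyteTracedException(
        internal_message="HTTP 401 returned by /v1/customers",
        message="The API key is invalid. Please re-enter your credentials.",
        failure_type=FailureType.config_error,
    )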
53 def as_airbyte_message( 54 self, stream_descriptor: Optional[StreamDescriptor] = None 55 ) -> AirbyteMessage: 56 """ 57 Builds an AirbyteTraceMessage from the exception 58 59 :param stream_descriptor is deprecated, please use the stream_description in `__init__ or `from_exception`. If many 60 stream_descriptors are defined, the one from `as_airbyte_message` will be discarded. 61 """ 62 now_millis = time.time_ns() // 1_000_000 63 64 trace_exc = self._exception or self 65 stack_trace_str = "".join(traceback.TracebackException.from_exception(trace_exc).format()) 66 67 trace_message = AirbyteTraceMessage( 68 type=TraceType.ERROR, 69 emitted_at=now_millis, 70 error=AirbyteErrorTraceMessage( 71 message=self.message 72 or "Something went wrong in the connector. See the logs for more details.", 73 internal_message=self.internal_message, 74 failure_type=self.failure_type, 75 stack_trace=stack_trace_str, 76 stream_descriptor=self._stream_descriptor 77 if self._stream_descriptor is not None 78 else stream_descriptor, 79 ), 80 ) 81 82 return AirbyteMessage(type=MessageType.TRACE, trace=trace_message)
Builds an AirbyteTraceMessage from the exception
The stream_descriptor parameter is deprecated; please use the stream_descriptor passed to __init__ or from_exception. If more than one stream descriptor is defined, the one passed to as_airbyte_message is discarded.
84 def as_connection_status_message(self) -> Optional[AirbyteMessage]: 85 if self.failure_type == FailureType.config_error: 86 return AirbyteMessage( 87 type=MessageType.CONNECTION_STATUS, 88 connectionStatus=AirbyteConnectionStatus( 89 status=Status.FAILED, message=self.message 90 ), 91 ) 92 return None
94 def emit_message(self) -> None: 95 """ 96 Prints the exception as an AirbyteTraceMessage. 97 Note that this will be called automatically on uncaught exceptions when using the airbyte_cdk entrypoint. 98 """ 99 message = orjson.dumps(AirbyteMessageSerializer.dump(self.as_airbyte_message())).decode() 100 filtered_message = filter_secrets(message) 101 print(filtered_message)
Prints the exception as an AirbyteTraceMessage. Note that this will be called automatically on uncaught exceptions when using the airbyte_cdk entrypoint.
103 @classmethod 104 def from_exception( 105 cls, 106 exc: BaseException, 107 stream_descriptor: Optional[StreamDescriptor] = None, 108 *args: Any, 109 **kwargs: Any, 110 ) -> "AirbyteTracedException": 111 """ 112 Helper to create an AirbyteTracedException from an existing exception 113 :param exc: the exception that caused the error 114 :param stream_descriptor: describe the stream from which the exception comes from 115 """ 116 return cls( 117 internal_message=str(exc), 118 exception=exc, 119 stream_descriptor=stream_descriptor, 120 *args, 121 **kwargs, 122 ) # type: ignore # ignoring because of args and kwargs
Helper to create an AirbyteTracedException from an existing exception
Parameters
- exc: the exception that caused the error
- stream_descriptor: describes the stream from which the exception comes
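For example, a sketch that wraps a parsing error and attaches an illustrative stream descriptor; both import paths are assumptions:

```python
import json

from airbyte_cdk.models import StreamDescriptor  # assumed import path
from airbyte_cdk.utils.traced_exception import AirbyteTracedException  # assumed import path

raw_body = '{"not valid json'  # illustrative malformed payload

try:
    json.loads(raw_body)
except ValueError as exc:
    traced = AirbyteTracedException.from_exception(
        exc,
        stream_descriptor=StreamDescriptor(name="users"),  # "users" is illustrative
    )
    traced.emit_message()  # prints the secret-filtered AirbyteTraceMessage to stdout
```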
124 def as_sanitized_airbyte_message( 125 self, stream_descriptor: Optional[StreamDescriptor] = None 126 ) -> AirbyteMessage: 127 """ 128 Builds an AirbyteTraceMessage from the exception and sanitizes any secrets from the message body 129 130 :param stream_descriptor is deprecated, please use the stream_description in `__init__ or `from_exception`. If many 131 stream_descriptors are defined, the one from `as_sanitized_airbyte_message` will be discarded. 132 """ 133 error_message = self.as_airbyte_message(stream_descriptor=stream_descriptor) 134 if error_message.trace.error.message: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 135 error_message.trace.error.message = filter_secrets( # type: ignore[union-attr] 136 error_message.trace.error.message, # type: ignore[union-attr] 137 ) 138 if error_message.trace.error.internal_message: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 139 error_message.trace.error.internal_message = filter_secrets( # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 140 error_message.trace.error.internal_message # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 141 ) 142 if error_message.trace.error.stack_trace: # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 143 error_message.trace.error.stack_trace = filter_secrets( # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 144 error_message.trace.error.stack_trace # type: ignore[union-attr] # AirbyteMessage with MessageType.TRACE has AirbyteTraceMessage 145 ) 146 return error_message
Builds an AirbyteTraceMessage from the exception and sanitizes any secrets from the message body
The stream_descriptor parameter is deprecated; please use the stream_descriptor passed to __init__ or from_exception. If more than one stream descriptor is defined, the one passed to as_sanitized_airbyte_message is discarded.
11def is_cloud_environment() -> bool: 12 """ 13 Returns True if the connector is running in a cloud environment, False otherwise. 14 15 The function checks the value of the DEPLOYMENT_MODE environment variable which is set by the platform. 16 This function can be used to determine whether stricter security measures should be applied. 17 """ 18 deployment_mode = os.environ.get("DEPLOYMENT_MODE", "") 19 return deployment_mode.casefold() == CLOUD_DEPLOYMENT_MODE
Returns True if the connector is running in a cloud environment, False otherwise.
The function checks the value of the DEPLOYMENT_MODE environment variable which is set by the platform. This function can be used to determine whether stricter security measures should be applied.
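As a sketch, a connector might branch on this at runtime. The import location of is_cloud_environment and the exact DEPLOYMENT_MODE value are assumptions here:

```python
import os

from airbyte_cdk.utils import is_cloud_environment  # assumed import path

# The platform normally sets DEPLOYMENT_MODE; this assignment only simulates it locally
# and assumes "CLOUD" is the value used for cloud deployments.
os.environ["DEPLOYMENT_MODE"] = "CLOUD"

if is_cloud_environment():
    # e.g. refuse plaintext HTTP endpoints, enforce TLS certificate checks, etc.
    strict_security = True
else:
    strict_security = False
print(strict_security)
```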
190class InternalConfig(BaseModel): 191 KEYWORDS: ClassVar[set[str]] = {"_limit", "_page_size"} 192 limit: int = Field(None, alias="_limit") 193 page_size: int = Field(None, alias="_page_size") 194 195 def dict(self, *args: Any, **kwargs: Any) -> dict[str, Any]: 196 kwargs["by_alias"] = True 197 kwargs["exclude_unset"] = True 198 return super().dict(*args, **kwargs) 199 200 def is_limit_reached(self, records_counter: int) -> bool: 201 """ 202 Check if record count reached limit set by internal config. 203 :param records_counter - number of records already red 204 :return True if limit reached, False otherwise 205 """ 206 if self.limit: 207 if records_counter >= self.limit: 208 return True 209 return False
195 def dict(self, *args: Any, **kwargs: Any) -> dict[str, Any]: 196 kwargs["by_alias"] = True 197 kwargs["exclude_unset"] = True 198 return super().dict(*args, **kwargs)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
200 def is_limit_reached(self, records_counter: int) -> bool: 201 """ 202 Check if record count reached limit set by internal config. 203 :param records_counter - number of records already red 204 :return True if limit reached, False otherwise 205 """ 206 if self.limit: 207 if records_counter >= self.limit: 208 return True 209 return False
Check if the record count has reached the limit set by the internal config. :param records_counter - number of records already read :return True if the limit is reached, False otherwise
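A sketch of how the _limit keyword might be honoured in a read loop; the import path is assumed:

```python
from airbyte_cdk.sources.utils.schema_helpers import InternalConfig  # assumed import path

internal_config = InternalConfig.parse_obj({"_limit": 3})

records_read = 0
for record in ({"id": i} for i in range(10)):  # stand-in record source
    records_read += 1
    if internal_config.is_limit_reached(records_read):
        break

print(records_read)  # 3
```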
116class ResourceSchemaLoader: 117 """JSONSchema loader from package resources""" 118 119 def __init__(self, package_name: str): 120 self.package_name = package_name 121 122 def get_schema(self, name: str) -> dict[str, Any]: 123 """ 124 This method retrieves a JSON schema from the schemas/ folder. 125 126 127 The expected file structure is to have all top-level schemas (corresponding to streams) in the "schemas/" folder, with any shared $refs 128 living inside the "schemas/shared/" folder. For example: 129 130 schemas/shared/<shared_definition>.json 131 schemas/<name>.json # contains a $ref to shared_definition 132 schemas/<name2>.json # contains a $ref to shared_definition 133 """ 134 135 schema_filename = f"schemas/{name}.json" 136 raw_file = pkgutil.get_data(self.package_name, schema_filename) 137 if not raw_file: 138 raise IOError(f"Cannot find file {schema_filename}") 139 try: 140 raw_schema = json.loads(raw_file) 141 except ValueError as err: 142 raise RuntimeError(f"Invalid JSON file format for file {schema_filename}") from err 143 144 return self._resolve_schema_references(raw_schema) 145 146 def _resolve_schema_references(self, raw_schema: dict[str, Any]) -> dict[str, Any]: 147 """ 148 Resolve links to external references and move it to local "definitions" map. 149 150 :param raw_schema jsonschema to lookup for external links. 151 :return JSON serializable object with references without external dependencies. 152 """ 153 154 package = importlib.import_module(self.package_name) 155 if package.__file__: 156 base = os.path.dirname(package.__file__) + "/" 157 else: 158 raise ValueError(f"Package {package} does not have a valid __file__ field") 159 resolved = jsonref.JsonRef.replace_refs( 160 raw_schema, loader=JsonFileLoader(base, "schemas/shared"), base_uri=base 161 ) 162 resolved = resolve_ref_links(resolved) 163 if isinstance(resolved, dict): 164 return resolved 165 else: 166 raise ValueError(f"Expected resolved to be a dict. Got {resolved}")
JSONSchema loader from package resources
122 def get_schema(self, name: str) -> dict[str, Any]: 123 """ 124 This method retrieves a JSON schema from the schemas/ folder. 125 126 127 The expected file structure is to have all top-level schemas (corresponding to streams) in the "schemas/" folder, with any shared $refs 128 living inside the "schemas/shared/" folder. For example: 129 130 schemas/shared/<shared_definition>.json 131 schemas/<name>.json # contains a $ref to shared_definition 132 schemas/<name2>.json # contains a $ref to shared_definition 133 """ 134 135 schema_filename = f"schemas/{name}.json" 136 raw_file = pkgutil.get_data(self.package_name, schema_filename) 137 if not raw_file: 138 raise IOError(f"Cannot find file {schema_filename}") 139 try: 140 raw_schema = json.loads(raw_file) 141 except ValueError as err: 142 raise RuntimeError(f"Invalid JSON file format for file {schema_filename}") from err 143 144 return self._resolve_schema_references(raw_schema)
This method retrieves a JSON schema from the schemas/ folder.
The expected file structure is to have all top-level schemas (corresponding to streams) in the "schemas/" folder, with any shared $refs living inside the "schemas/shared/" folder. For example:
schemas/shared/&lt;shared_definition&gt;.json
schemas/&lt;name&gt;.json   # contains a $ref to shared_definition
schemas/&lt;name2&gt;.json  # contains a $ref to shared_definition
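For example, a sketch of loading a bundled stream schema; the package and stream names are illustrative and the import path is assumed:

```python
from airbyte_cdk.sources.utils.schema_helpers import ResourceSchemaLoader  # assumed import path

loader = ResourceSchemaLoader(package_name="source_example")
# Reads source_example/schemas/users.json and inlines any $refs from schemas/shared/.
users_schema = loader.get_schema("users")
print(users_schema["type"])
```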
169def check_config_against_spec_or_exit( 170 config: Mapping[str, Any], spec: ConnectorSpecification 171) -> None: 172 """ 173 Check config object against spec. In case of spec is invalid, throws 174 an exception with validation error description. 175 176 :param config - config loaded from file specified over command line 177 :param spec - spec object generated by connector 178 """ 179 spec_schema = spec.connectionSpecification 180 try: 181 validate(instance=config, schema=spec_schema) 182 except ValidationError as validation_error: 183 raise AirbyteTracedException( 184 message="Config validation error: " + validation_error.message, 185 internal_message=validation_error.message, 186 failure_type=FailureType.config_error, 187 ) from None # required to prevent logging config secrets from the ValidationError's stacktrace
Check the config object against the spec. If the config does not conform to the spec, an exception describing the validation error is raised.
:param config - config loaded from the file specified on the command line :param spec - spec object generated by the connector
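For example, a sketch that validates a config against a hand-built spec; the helper's import path is an assumption, and this call raises because the required api_key is missing:

```python
from airbyte_cdk.models import ConnectorSpecification
from airbyte_cdk.sources.utils.schema_helpers import check_config_against_spec_or_exit  # assumed import path

spec = ConnectorSpecification(
    connectionSpecification={
        "type": "object",
        "required": ["api_key"],
        "properties": {"api_key": {"type": "string"}},
    }
)

# Raises AirbyteTracedException with failure_type=config_error because "api_key" is missing.
check_config_against_spec_or_exit({"start_date": "2024-01-01"}, spec)
```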
212def split_config(config: Mapping[str, Any]) -> Tuple[dict[str, Any], InternalConfig]: 213 """ 214 Break config map object into 2 instances: first is a dict with user defined 215 configuration and second is internal config that contains private keys for 216 acceptance test configuration. 217 218 :param 219 config - Dict object that has been loaded from config file. 220 221 :return tuple of user defined config dict with filtered out internal 222 parameters and connector acceptance test internal config object. 223 """ 224 main_config = {} 225 internal_config = {} 226 for k, v in config.items(): 227 if k in InternalConfig.KEYWORDS: 228 internal_config[k] = v 229 else: 230 main_config[k] = v 231 return main_config, InternalConfig.parse_obj(internal_config)
Break the config mapping into two objects: a dict with the user-defined configuration, and an InternalConfig holding the private keys used for connector acceptance test configuration.
:param config - Dict object that has been loaded from the config file.
:return tuple of the user-defined config dict (with internal parameters filtered out) and the connector acceptance test InternalConfig object.
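A small sketch of the split; the import path is assumed:

```python
from airbyte_cdk.sources.utils.schema_helpers import split_config  # assumed import path

config = {"api_key": "****", "_limit": 50, "_page_size": 10}
user_config, internal_config = split_config(config)

print(user_config)               # {'api_key': '****'}
print(internal_config.limit)     # 50
print(internal_config.page_size) # 10
```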
48class TransformConfig(Flag): 49 """ 50 TypeTransformer class config. Configs can be combined using bitwise or operator e.g. 51 ``` 52 TransformConfig.DefaultSchemaNormalization | TransformConfig.CustomSchemaNormalization 53 ``` 54 """ 55 56 # No action taken, default behavior. Cannot be combined with any other options. 57 NoTransform = auto() 58 # Applies default type casting with default_convert method which converts 59 # values by applying simple type casting to specified jsonschema type. 60 DefaultSchemaNormalization = auto() 61 # Allow registering custom type transformation callback. Can be combined 62 # with DefaultSchemaNormalization. In this case default type casting would 63 # be applied before custom one. 64 CustomSchemaNormalization = auto()
TypeTransformer class config. Configs can be combined using the bitwise OR operator, e.g.
TransformConfig.DefaultSchemaNormalization | TransformConfig.CustomSchemaNormalization
67class TypeTransformer: 68 """ 69 Class for transforming object before output. 70 """ 71 72 _custom_normalizer: Optional[Callable[[Any, Dict[str, Any]], Any]] = None 73 74 def __init__(self, config: TransformConfig): 75 """ 76 Initialize TypeTransformer instance. 77 :param config Transform config that would be applied to object 78 """ 79 if TransformConfig.NoTransform in config and config != TransformConfig.NoTransform: 80 raise Exception("NoTransform option cannot be combined with other flags.") 81 self._config = config 82 all_validators = { 83 key: self.__get_normalizer(key, orig_validator) 84 for key, orig_validator in Draft7Validator.VALIDATORS.items() 85 # Do not validate field we do not transform for maximum performance. 86 if key in ["type", "array", "$ref", "properties", "items"] 87 } 88 self._normalizer = validators.create( 89 meta_schema=Draft7Validator.META_SCHEMA, validators=all_validators 90 ) 91 92 def registerCustomTransform( 93 self, normalization_callback: Callable[[Any, dict[str, Any]], Any] 94 ) -> Callable[[Any, dict[str, Any]], Any]: 95 """ 96 Register custom normalization callback. 97 :param normalization_callback function to be used for value 98 normalization. Takes original value and part type schema. Should return 99 normalized value. See docs/connector-development/cdk-python/schemas.md 100 for details. 101 :return Same callback, this is useful for using registerCustomTransform function as decorator. 102 """ 103 if TransformConfig.CustomSchemaNormalization not in self._config: 104 raise Exception( 105 "Please set TransformConfig.CustomSchemaNormalization config before registering custom normalizer" 106 ) 107 self._custom_normalizer = normalization_callback 108 return normalization_callback 109 110 def __normalize(self, original_item: Any, subschema: Dict[str, Any]) -> Any: 111 """ 112 Applies different transform function to object's field according to config. 113 :param original_item original value of field. 114 :param subschema part of the jsonschema containing field type/format data. 115 :return Final field value. 116 """ 117 if TransformConfig.DefaultSchemaNormalization in self._config: 118 original_item = self.default_convert(original_item, subschema) 119 120 if self._custom_normalizer: 121 original_item = self._custom_normalizer(original_item, subschema) 122 return original_item 123 124 @staticmethod 125 def default_convert(original_item: Any, subschema: Dict[str, Any]) -> Any: 126 """ 127 Default transform function that is used when TransformConfig.DefaultSchemaNormalization flag set. 128 :param original_item original value of field. 129 :param subschema part of the jsonschema containing field type/format data. 130 :return transformed field value. 131 """ 132 target_type = subschema.get("type", []) 133 if original_item is None and "null" in target_type: 134 return None 135 if isinstance(target_type, list): 136 # jsonschema type could either be a single string or array of type 137 # strings. In case if there is some disambigous and more than one 138 # type (except null) do not do any conversion and return original 139 # value. If type array has one type and null i.e. {"type": 140 # ["integer", "null"]}, convert value to specified type. 
141 target_type = [t for t in target_type if t != "null"] 142 if len(target_type) != 1: 143 return original_item 144 target_type = target_type[0] 145 try: 146 if target_type == "string": 147 return str(original_item) 148 elif target_type == "number": 149 return float(original_item) 150 elif target_type == "integer": 151 return int(original_item) 152 elif target_type == "boolean": 153 if isinstance(original_item, str): 154 return _strtobool(original_item) == 1 155 return bool(original_item) 156 elif target_type == "array": 157 item_types = set(subschema.get("items", {}).get("type", set())) 158 if ( 159 item_types.issubset(json_to_python_simple) 160 and type(original_item) in json_to_python_simple.values() 161 ): 162 return [original_item] 163 except (ValueError, TypeError): 164 return original_item 165 return original_item 166 167 def __get_normalizer( 168 self, 169 schema_key: str, 170 original_validator: Callable, # type: ignore[type-arg] 171 ) -> Callable[[Any, Any, Any, dict[str, Any]], Generator[Any, Any, None]]: 172 """ 173 Traverse through object fields using native jsonschema validator and apply normalization function. 174 :param schema_key related json schema key that currently being validated/normalized. 175 :original_validator: native jsonschema validator callback. 176 """ 177 178 def normalizator( 179 validator_instance: Validator, 180 property_value: Any, 181 instance: Any, 182 schema: Dict[str, Any], 183 ) -> Generator[Any, Any, None]: 184 """ 185 Jsonschema validator callable it uses for validating instance. We 186 override default Draft7Validator to perform value transformation 187 before validation take place. We do not take any action except 188 logging warn if object does not conform to json schema, just using 189 jsonschema algorithm to traverse through object fields. 190 Look 191 https://python-jsonschema.readthedocs.io/en/stable/creating/?highlight=validators.create#jsonschema.validators.create 192 validators parameter for detailed description. 193 : 194 """ 195 196 def resolve(subschema: dict[str, Any]) -> dict[str, Any]: 197 if "$ref" in subschema: 198 _, resolved = cast( 199 RefResolver, 200 validator_instance.resolver, 201 ).resolve(subschema["$ref"]) 202 return cast(dict[str, Any], resolved) 203 return subschema 204 205 # Transform object and array values before running json schema type checking for each element. 206 # Recursively normalize every value of the "instance" sub-object, 207 # if "instance" is an incorrect type - skip recursive normalization of "instance" 208 if schema_key == "properties" and isinstance(instance, dict): 209 for k, subschema in property_value.items(): 210 if k in instance: 211 subschema = resolve(subschema) 212 instance[k] = self.__normalize(instance[k], subschema) 213 # Recursively normalize every item of the "instance" sub-array, 214 # if "instance" is an incorrect type - skip recursive normalization of "instance" 215 elif schema_key == "items" and isinstance(instance, list): 216 subschema = resolve(property_value) 217 for index, item in enumerate(instance): 218 instance[index] = self.__normalize(item, subschema) 219 220 # Running native jsonschema traverse algorithm after field normalization is done. 221 yield from original_validator( 222 validator_instance, 223 property_value, 224 instance, 225 schema, 226 ) 227 228 return normalizator 229 230 def transform( 231 self, 232 record: Dict[str, Any], 233 schema: Mapping[str, Any], 234 ) -> None: 235 """ 236 Normalize and validate according to config. 
237 :param record: record instance for normalization/transformation. All modification are done by modifying existent object. 238 :param schema: object's jsonschema for normalization. 239 """ 240 if TransformConfig.NoTransform in self._config: 241 return 242 normalizer = self._normalizer(schema) 243 for e in normalizer.iter_errors(record): 244 """ 245 just calling normalizer.validate() would throw an exception on 246 first validation occurrences and stop processing rest of schema. 247 """ 248 logger.warning(self.get_error_message(e)) 249 250 def get_error_message(self, e: ValidationError) -> str: 251 """ 252 Construct a sanitized error message from a ValidationError instance. 253 """ 254 field_path = ".".join(map(str, e.path)) 255 type_structure = self._get_type_structure(e.instance) 256 257 return f"Failed to transform value from type '{type_structure}' to type '{e.validator_value}' at path: '{field_path}'" 258 259 def _get_type_structure(self, input_data: Any, current_depth: int = 0) -> Any: 260 """ 261 Get the structure of a given input data for use in error message construction. 262 """ 263 # Handle null values 264 if input_data is None: 265 return "null" 266 267 # Avoid recursing too deep 268 if current_depth >= MAX_NESTING_DEPTH: 269 return "object" if isinstance(input_data, dict) else python_to_json[type(input_data)] 270 271 if isinstance(input_data, dict): 272 return { 273 key: self._get_type_structure(field_value, current_depth + 1) 274 for key, field_value in input_data.items() 275 } 276 277 else: 278 return python_to_json[type(input_data)]
Class for transforming objects before output.
74 def __init__(self, config: TransformConfig): 75 """ 76 Initialize TypeTransformer instance. 77 :param config Transform config that would be applied to object 78 """ 79 if TransformConfig.NoTransform in config and config != TransformConfig.NoTransform: 80 raise Exception("NoTransform option cannot be combined with other flags.") 81 self._config = config 82 all_validators = { 83 key: self.__get_normalizer(key, orig_validator) 84 for key, orig_validator in Draft7Validator.VALIDATORS.items() 85 # Do not validate field we do not transform for maximum performance. 86 if key in ["type", "array", "$ref", "properties", "items"] 87 } 88 self._normalizer = validators.create( 89 meta_schema=Draft7Validator.META_SCHEMA, validators=all_validators 90 )
Initialize a TypeTransformer instance. :param config transform config to be applied to objects
92 def registerCustomTransform( 93 self, normalization_callback: Callable[[Any, dict[str, Any]], Any] 94 ) -> Callable[[Any, dict[str, Any]], Any]: 95 """ 96 Register custom normalization callback. 97 :param normalization_callback function to be used for value 98 normalization. Takes original value and part type schema. Should return 99 normalized value. See docs/connector-development/cdk-python/schemas.md 100 for details. 101 :return Same callback, this is useful for using registerCustomTransform function as decorator. 102 """ 103 if TransformConfig.CustomSchemaNormalization not in self._config: 104 raise Exception( 105 "Please set TransformConfig.CustomSchemaNormalization config before registering custom normalizer" 106 ) 107 self._custom_normalizer = normalization_callback 108 return normalization_callback
Register a custom normalization callback. :param normalization_callback function used for value normalization; it takes the original value and the relevant part of the type schema and should return the normalized value. See docs/connector-development/cdk-python/schemas.md for details. :return the same callback, which makes registerCustomTransform usable as a decorator.
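A sketch of registering a custom normalizer as a decorator (import path assumed); it trims whitespace from string fields after the default casting has run:

```python
from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer  # assumed import path

transformer = TypeTransformer(
    TransformConfig.DefaultSchemaNormalization | TransformConfig.CustomSchemaNormalization
)


@transformer.registerCustomTransform
def strip_strings(original_value, field_schema):
    # field_schema is the part of the JSON schema describing this field.
    if isinstance(original_value, str) and "string" in field_schema.get("type", []):
        return original_value.strip()
    return original_value
```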
124 @staticmethod 125 def default_convert(original_item: Any, subschema: Dict[str, Any]) -> Any: 126 """ 127 Default transform function that is used when TransformConfig.DefaultSchemaNormalization flag set. 128 :param original_item original value of field. 129 :param subschema part of the jsonschema containing field type/format data. 130 :return transformed field value. 131 """ 132 target_type = subschema.get("type", []) 133 if original_item is None and "null" in target_type: 134 return None 135 if isinstance(target_type, list): 136 # jsonschema type could either be a single string or array of type 137 # strings. In case if there is some disambigous and more than one 138 # type (except null) do not do any conversion and return original 139 # value. If type array has one type and null i.e. {"type": 140 # ["integer", "null"]}, convert value to specified type. 141 target_type = [t for t in target_type if t != "null"] 142 if len(target_type) != 1: 143 return original_item 144 target_type = target_type[0] 145 try: 146 if target_type == "string": 147 return str(original_item) 148 elif target_type == "number": 149 return float(original_item) 150 elif target_type == "integer": 151 return int(original_item) 152 elif target_type == "boolean": 153 if isinstance(original_item, str): 154 return _strtobool(original_item) == 1 155 return bool(original_item) 156 elif target_type == "array": 157 item_types = set(subschema.get("items", {}).get("type", set())) 158 if ( 159 item_types.issubset(json_to_python_simple) 160 and type(original_item) in json_to_python_simple.values() 161 ): 162 return [original_item] 163 except (ValueError, TypeError): 164 return original_item 165 return original_item
Default transform function used when the TransformConfig.DefaultSchemaNormalization flag is set. :param original_item original value of the field. :param subschema part of the jsonschema containing the field's type/format data. :return transformed field value.
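A few illustrative calls, based on the listing above (the import path is assumed):

```python
from airbyte_cdk.sources.utils.transform import TypeTransformer  # assumed import path

TypeTransformer.default_convert("42", {"type": ["integer", "null"]})   # -> 42
TypeTransformer.default_convert(7, {"type": "string"})                 # -> "7"
TypeTransformer.default_convert("x", {"type": ["integer", "string"]})  # ambiguous types, returned unchanged
TypeTransformer.default_convert(None, {"type": ["string", "null"]})    # -> None
```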
230 def transform( 231 self, 232 record: Dict[str, Any], 233 schema: Mapping[str, Any], 234 ) -> None: 235 """ 236 Normalize and validate according to config. 237 :param record: record instance for normalization/transformation. All modification are done by modifying existent object. 238 :param schema: object's jsonschema for normalization. 239 """ 240 if TransformConfig.NoTransform in self._config: 241 return 242 normalizer = self._normalizer(schema) 243 for e in normalizer.iter_errors(record): 244 """ 245 just calling normalizer.validate() would throw an exception on 246 first validation occurrences and stop processing rest of schema. 247 """ 248 logger.warning(self.get_error_message(e))
Normalize and validate according to config.
Parameters
- record: record instance for normalization/transformation. All modifications are made in place on the existing object.
- schema: object's jsonschema for normalization.
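A minimal sketch of normalizing a record in place with default schema normalization; the import path is assumed:

```python
from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer  # assumed import path

transformer = TypeTransformer(TransformConfig.DefaultSchemaNormalization)

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": ["string", "null"]},
    },
}
record = {"id": "123", "name": 42}

transformer.transform(record, schema)  # mutates `record` in place
print(record)  # {'id': 123, 'name': '42'}
```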
250 def get_error_message(self, e: ValidationError) -> str: 251 """ 252 Construct a sanitized error message from a ValidationError instance. 253 """ 254 field_path = ".".join(map(str, e.path)) 255 type_structure = self._get_type_structure(e.instance) 256 257 return f"Failed to transform value from type '{type_structure}' to type '{e.validator_value}' at path: '{field_path}'"
Construct a sanitized error message from a ValidationError instance.
80@contextmanager 81def create_timer(name: str) -> Generator[EventTimer, Any, None]: 82 """ 83 Creates a new EventTimer as a context manager to improve code readability. 84 """ 85 a_timer = EventTimer(name) 86 yield a_timer
Creates a new EventTimer as a context manager to improve code readability.
9class OneOfOptionConfig: 10 """ 11 Base class to configure a Pydantic model that's used as a oneOf option in a parent model in a way that's compatible with all Airbyte consumers. 12 13 Inherit from this class in the nested Config class in a model and set title and description (these show up in the UI) and discriminator (this is making sure it's marked as required in the schema). 14 15 Usage: 16 17 ```python 18 class OptionModel(BaseModel): 19 mode: Literal["option_a"] = Field("option_a", const=True) 20 option_a_field: str = Field(...) 21 22 class Config(OneOfOptionConfig): 23 title = "Option A" 24 description = "Option A description" 25 discriminator = "mode" 26 ``` 27 """ 28 29 @staticmethod 30 def schema_extra(schema: Dict[str, Any], model: Any) -> None: 31 if hasattr(model.Config, "description"): 32 schema["description"] = model.Config.description 33 if hasattr(model.Config, "discriminator"): 34 schema.setdefault("required", []).append(model.Config.discriminator)
Base class to configure a Pydantic model that's used as a oneOf option in a parent model in a way that's compatible with all Airbyte consumers.
Inherit from this class in the nested Config class of a model and set title and description (these show up in the UI) and discriminator (this ensures it is marked as required in the schema).
Usage:
class OptionModel(BaseModel):
    mode: Literal["option_a"] = Field("option_a", const=True)
    option_a_field: str = Field(...)

    class Config(OneOfOptionConfig):
        title = "Option A"
        description = "Option A description"
        discriminator = "mode"
13def resolve_refs(schema: dict[str, Any]) -> dict[str, Any]: 14 """ 15 For spec schemas generated using Pydantic models, the resulting JSON schema can contain refs between object 16 relationships. 17 """ 18 json_schema_ref_resolver = RefResolver.from_schema(schema) 19 str_schema = json.dumps(schema) 20 for ref_block in re.findall(r'{"\$ref": "#\/definitions\/.+?(?="})"}', str_schema): 21 ref = json.loads(ref_block)["$ref"] 22 str_schema = str_schema.replace( 23 ref_block, json.dumps(json_schema_ref_resolver.resolve(ref)[1]) 24 ) 25 pyschema: dict[str, Any] = json.loads(str_schema) 26 del pyschema["definitions"] 27 return pyschema
For spec schemas generated from Pydantic models, the resulting JSON schema can contain $ref links between related object definitions. This helper inlines those references and removes the top-level definitions block.
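A sketch with a hand-built schema; the import path for resolve_refs is an assumption:

```python
from airbyte_cdk.utils.spec_schema_transformations import resolve_refs  # assumed import path

schema = {
    "type": "object",
    "properties": {"credentials": {"$ref": "#/definitions/ApiKey"}},
    "definitions": {
        "ApiKey": {"type": "object", "properties": {"api_key": {"type": "string"}}},
    },
}

flat = resolve_refs(schema)
print(flat["properties"]["credentials"])  # the inlined ApiKey object
print("definitions" in flat)              # False, the definitions block is removed
```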
23def as_airbyte_message( 24 stream: Union[AirbyteStream, StreamDescriptor], 25 current_status: AirbyteStreamStatus, 26 reasons: Optional[List[AirbyteStreamStatusReason]] = None, 27) -> AirbyteMessage: 28 """ 29 Builds an AirbyteStreamStatusTraceMessage for the provided stream 30 """ 31 32 now_millis = datetime.now().timestamp() * 1000.0 33 34 trace_message = AirbyteTraceMessage( 35 type=TraceType.STREAM_STATUS, 36 emitted_at=now_millis, 37 stream_status=AirbyteStreamStatusTraceMessage( 38 stream_descriptor=StreamDescriptor(name=stream.name, namespace=stream.namespace), 39 status=current_status, 40 reasons=reasons, 41 ), 42 ) 43 44 return AirbyteMessage(type=MessageType.TRACE, trace=trace_message)
Builds an AirbyteStreamStatusTraceMessage for the provided stream
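A sketch of emitting a RUNNING status for an illustrative "users" stream; the helper's import path and the availability of AirbyteStreamStatus under airbyte_cdk.models are assumptions:

```python
from airbyte_cdk.models import AirbyteStream, AirbyteStreamStatus, SyncMode
from airbyte_cdk.utils.stream_status_utils import as_airbyte_message  # assumed import path

stream = AirbyteStream(
    name="users",
    json_schema={"type": "object"},
    supported_sync_modes=[SyncMode.full_refresh],
)

status_message = as_airbyte_message(stream, AirbyteStreamStatus.RUNNING)
print(status_message.trace.stream_status.status)  # AirbyteStreamStatus.RUNNING
```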
21class Record(Mapping[str, Any]): 22 def __init__( 23 self, 24 data: Mapping[str, Any], 25 stream_name: str, 26 associated_slice: Optional[StreamSlice] = None, 27 is_file_transfer_message: bool = False, 28 ): 29 self._data = data 30 self._associated_slice = associated_slice 31 self.stream_name = stream_name 32 self.is_file_transfer_message = is_file_transfer_message 33 34 @property 35 def data(self) -> Mapping[str, Any]: 36 return self._data 37 38 @property 39 def associated_slice(self) -> Optional[StreamSlice]: 40 return self._associated_slice 41 42 def __repr__(self) -> str: 43 return repr(self._data) 44 45 def __getitem__(self, key: str) -> Any: 46 return self._data[key] 47 48 def __len__(self) -> int: 49 return len(self._data) 50 51 def __iter__(self) -> Any: 52 return iter(self._data) 53 54 def __contains__(self, item: object) -> bool: 55 return item in self._data 56 57 def __eq__(self, other: object) -> bool: 58 if isinstance(other, Record): 59 # noinspection PyProtectedMember 60 return self._data == other._data 61 return False 62 63 def __ne__(self, other: object) -> bool: 64 return not self.__eq__(other)
A Mapping is a generic container for associating key/value pairs.
This class provides concrete generic implementations of all methods except for __getitem__, __iter__, and __len__.
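Because Record implements the Mapping protocol over its data payload, it can be passed to anything that expects a read-only dict. A small sketch, with the import path assumed:

```python
from airbyte_cdk.sources.types import Record  # assumed import path

record = Record(data={"id": 1, "name": "Ada"}, stream_name="users")

print(record["id"])        # 1, dict-style access via __getitem__
print(len(record))         # 2
print("name" in record)    # True
print(dict(record))        # {'id': 1, 'name': 'Ada'}, works with any Mapping consumer
print(record.stream_name)  # users
```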
22 def __init__( 23 self, 24 data: Mapping[str, Any], 25 stream_name: str, 26 associated_slice: Optional[StreamSlice] = None, 27 is_file_transfer_message: bool = False, 28 ): 29 self._data = data 30 self._associated_slice = associated_slice 31 self.stream_name = stream_name 32 self.is_file_transfer_message = is_file_transfer_message