What is PyAirbyte

PyAirbyte is a lightweight Python library that lets you run Airbyte connectors directly inside your code. Instead of deploying the full Airbyte platform, you import the library, point it at a source, and start streaming JSON records into the cache or destination of your choice. The same connectors, schemas, and incremental‑state logic you would use in Airbyte Cloud are available locally, so you can prototype quickly and promote the same pipeline to production without rewriting it.

In short, PyAirbyte brings the Airbyte protocol to wherever Python runs—Jupyter notebooks, scheduled scripts, serverless functions, or long‑lived services—while staying fully compatible with the wider Airbyte ecosystem.

How it works

graph TD
    A[Airbyte Connector] -->|ab.get_source| B(PyAirbyte)
    B --> C[Source Connector]
    C -->|discover| C1[schema]
    C -->|read| C2[records]
    C2 --> D[Cache / Destination]
    D -->|persist| E[(DuckDB / MotherDuck / Vector / Snowflake / others)]
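
The flow in the diagram can be sketched in Python roughly as follows. This is a minimal sketch, assuming the `airbyte` package is installed (`pip install airbyte`). The helper name `read_source` is hypothetical, and writing to the default local DuckDB cache is an assumption made for illustration; any supported cache could take its place.

```python
from __future__ import annotations

from typing import Any


def read_source(
    source_name: str,
    config: dict[str, Any],
    streams: list[str] | None = None,
):
    """Hypothetical helper: resolve a connector, discover streams, read into a cache."""
    # Imported lazily so this module still loads where PyAirbyte is not installed.
    import airbyte as ab

    # Resolve the connector by name (installing it on first use) and validate the config.
    source = ab.get_source(source_name, config=config, install_if_missing=True)
    source.check()

    # "discover" step: list the streams the connector exposes.
    available = source.get_available_streams()
    if streams is None:
        source.select_all_streams()
    else:
        missing = sorted(set(streams) - set(available))
        if missing:
            raise ValueError(f"Unknown streams for {source_name}: {missing}")
        source.select_streams(streams)

    # "read" + "persist" steps: records land in a local DuckDB-backed cache (assumed here).
    cache = ab.get_default_cache()
    return source.read(cache=cache)
```

As an illustration, `read_source("source-faker", {"count": 100}, ["users"])` would read the sample connector's `users` stream, after which something like `result["users"].to_pandas()` hands the records to pandas. The connector name, config, and stream name are examples only.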

Limitations

| Feature | PyAirbyte |
| --- | --- |
| Orchestration / scheduling | ❌ – bring your own (GitHub Actions, Airflow, Dagster…) |
| Monitoring & alerting | ❌ – configure it yourself |
| Transformations | ✅ – lightweight, Python‑side |
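
To make "lightweight, Python‑side" transformations concrete, here is a small sketch in plain Python. It assumes records arrive as dictionaries. The function name `normalize_users`, the field names, and the sample data are all hypothetical, and an in-memory list stands in for records read from a stream.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def normalize_users(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Drop records without an id and normalize the email field."""
    for record in records:
        if record.get("id") is None:
            continue  # skip records that cannot be keyed downstream
        email = str(record.get("email", "")).strip().lower()
        yield {**record, "email": email}


# Hypothetical sample input standing in for records read from a source.
sample = [
    {"id": 1, "email": "  Ada@Example.COM "},
    {"id": None, "email": "ghost@example.com"},
    {"id": 2, "email": "grace@example.com"},
]
cleaned = list(normalize_users(sample))
```

Because the function is a generator over plain dictionaries, it can be applied lazily to any iterable of records without loading the whole stream into memory.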