airbyte.documents

This module contains the Document class for converting Airbyte records into documents.

Generally, you will not create Document objects directly. Instead, use one of the following methods to generate documents from records:

  • Source.get_documents(): Get an iterable of documents from a source.
  • Dataset.to_documents(): Get an iterable of documents from a dataset.
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""This module contains the `Document` class for converting Airbyte records into documents.

Generally you will not create `Document` objects directly. Instead, you can use one of the
following methods to generate documents from records:

- `Source.get_documents()`: Get an iterable of documents from a source.
- `Dataset.to_documents()`: Get an iterable of documents from a dataset.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

from pydantic import BaseModel, Field


if TYPE_CHECKING:
    import datetime


MAX_SINGLE_LINE_LENGTH = 60
AIRBYTE_DOCUMENT_RENDERING = "airbyte_document_rendering"
TITLE_PROPERTY = "title_property"
CONTENT_PROPS = "content_properties"
METADATA_PROPERTIES = "metadata_properties"


class Document(BaseModel):
    """A PyAirbyte document is a specific projection on top of a record.

    Documents have the following structure:
    - id (str): A unique string identifier for the document.
    - content (str): A string representing the record when rendered as a document.
    - metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs
      and/or URLs.

    This class is duck-typed to be compatible with the LangChain project's `Document` class.
    """

    id: str | None = Field(default=None)
    content: str
    metadata: dict[str, Any]
    last_modified: datetime.datetime | None = Field(default=None)

    def __str__(self) -> str:
        """Return a string representation of the document."""
        return self.content

    @property
    def page_content(self) -> str:
        """Return the content of the document.

        This is an alias for the `content` property, and is provided for duck-type compatibility
        with the LangChain project's `Document` class.
        """
        return self.content


__all__ = [
    "Document",
]
class Document(pydantic.main.BaseModel):

A PyAirbyte document is a specific projection on top of a record.

Documents have the following structure:

  • id (str): A unique string identifier for the document.
  • content (str): A string representing the record when rendered as a document.
  • metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs and/or URLs.

This class is duck-typed to be compatible with the LangChain project's Document class.
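
To make the "projection on top of a record" idea concrete, here is a minimal sketch. It is not PyAirbyte's rendering logic, which this page does not show. `DocumentSketch` is a hypothetical stdlib stand-in that only mirrors the documented fields, and `project_record` with its `content_properties` and `metadata_properties` parameters is an assumed helper for illustration.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass
from typing import Any


@dataclass
class DocumentSketch:
    """Hypothetical stand-in mirroring the documented fields (not the PyAirbyte class)."""

    content: str
    metadata: dict[str, Any]
    id: str | None = None
    last_modified: datetime.datetime | None = None

    def __str__(self) -> str:
        # Like the documented class, the string form is just the content.
        return self.content


def project_record(
    record: dict[str, Any],
    content_properties: list[str],
    metadata_properties: list[str],
) -> DocumentSketch:
    """Assumed helper: render some record fields as text and keep others as metadata."""
    content = "\n".join(f"{name}: {record[name]}" for name in content_properties)
    metadata = {name: record[name] for name in metadata_properties}
    return DocumentSketch(content=content, metadata=metadata)


# A made-up record used purely for illustration.
record = {
    "id": 42,
    "name": "Ada",
    "bio": "Wrote the first program.",
    "url": "https://example.com/ada",
}
doc = project_record(record, ["name", "bio"], ["id", "url"])
print(doc)
```

The split between rendered text and metadata is the point of the sketch: fields meant for reading or embedding go into `content`, while identifiers and URLs stay structured in `metadata`.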

id: str | None
content: str
metadata: dict[str, typing.Any]
last_modified: datetime.datetime | None
page_content: str

Return the content of the document.

This is an alias for the content property, and is provided for duck-type compatibility with the LangChain project's Document class.
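
The sketch below illustrates what duck-type compatibility means in practice: a consumer that only reads `page_content` and `metadata` accepts any object exposing those attributes. `SupportsPageContent`, `DocumentSketch`, and `join_pages` are hypothetical names introduced for this example; neither PyAirbyte nor LangChain is imported.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable, Protocol


class SupportsPageContent(Protocol):
    """Hypothetical protocol: the attributes a duck-typed consumer relies on."""

    metadata: dict[str, Any]

    @property
    def page_content(self) -> str: ...


@dataclass
class DocumentSketch:
    """Hypothetical stand-in with the same alias pattern as the documented class."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def page_content(self) -> str:
        # Alias: returns the same string object stored in `content`.
        return self.content


def join_pages(docs: Iterable[SupportsPageContent]) -> str:
    """Assumed consumer: concatenates page_content without caring about the concrete class."""
    return "\n---\n".join(d.page_content for d in docs)


docs = [DocumentSketch("first"), DocumentSketch("second")]
corpus = join_pages(docs)
print(corpus)
```

Because `join_pages` depends only on the attribute names, any class that provides `page_content` and `metadata` can be passed in without conversion.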

model_config: ClassVar[pydantic.config.ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, pydantic.fields.FieldInfo]] = {'id': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'content': FieldInfo(annotation=str, required=True), 'metadata': FieldInfo(annotation=dict[str, Any], required=True), 'last_modified': FieldInfo(annotation=ForwardRef('datetime.datetime | None'), required=False, default=None)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

model_computed_fields: ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

Inherited Members
pydantic.main.BaseModel
BaseModel
model_extra
model_fields_set
model_construct
model_copy
model_dump
model_dump_json
model_json_schema
model_parametrized_name
model_post_init
model_rebuild
model_validate
model_validate_json
model_validate_strings
dict
json
parse_obj
parse_raw
parse_file
from_orm
construct
copy
schema
schema_json
validate
update_forward_refs