airbyte.documents
This module contains the Document class for converting Airbyte records into documents.
Generally you will not create Document objects directly. Instead, use one of the
following methods to generate documents from records:
- Source.get_documents(): Get an iterable of documents from a source.
- Dataset.to_documents(): Get an iterable of documents from a dataset.
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""This module contains the `Documents` class for converting Airbyte records into documents.

Generally you will not create `Documents` objects directly. Instead, you can use one of the
following methods to generate documents from records:

- `Source.get_documents()`: Get an iterable of documents from a source.
- `Dataset.to_documents()`: Get an iterable of documents from a dataset.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

from pydantic import BaseModel, Field


if TYPE_CHECKING:
    import datetime


MAX_SINGLE_LINE_LENGTH = 60
AIRBYTE_DOCUMENT_RENDERING = "airbyte_document_rendering"
TITLE_PROPERTY = "title_property"
CONTENT_PROPS = "content_properties"
METADATA_PROPERTIES = "metadata_properties"


class Document(BaseModel):
    """A PyAirbyte document is a specific projection on top of a record.

    Documents have the following structure:
    - id (str): A unique string identifier for the document.
    - content (str): A string representing the record when rendered as a document.
    - metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs
      and/or URLs.

    This class is duck-typed to be compatible with LangChain project's `Document` class.
    """

    id: str | None = Field(default=None)
    content: str
    metadata: dict[str, Any]
    last_modified: datetime.datetime | None = Field(default=None)

    def __str__(self) -> str:
        """Return a string representation of the document."""
        return self.content

    @property
    def page_content(self) -> str:
        """Return the content of the document.

        This is an alias for the `content` property, and is provided for duck-type compatibility
        with the LangChain project's `Document` class.
        """
        return self.content


__all__ = [
    "Document",
]
class Document(pydantic.main.BaseModel):
A PyAirbyte document is a specific projection on top of a record.
Documents have the following structure:
- id (str): A unique string identifier for the document.
- content (str): A string representing the record when rendered as a document.
- metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs and/or URLs.
This class is duck-typed to be compatible with the LangChain project's Document class.
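To make the structure above concrete, here is a minimal sketch of how a record might be projected into a document. It uses a standard-library dataclass, DocumentSketch, as a stand-in for the real pydantic model so that it runs without PyAirbyte installed. The helper record_to_document, its parameters, and the sample record are hypothetical and are not part of the PyAirbyte API.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass
from typing import Any


@dataclass
class DocumentSketch:
    """Hypothetical stand-in mirroring the fields of airbyte.documents.Document."""

    content: str
    metadata: dict[str, Any]
    id: str | None = None
    last_modified: datetime.datetime | None = None

    def __str__(self) -> str:
        # Like the real class, the string form is just the rendered content.
        return self.content


def record_to_document(
    record: dict[str, Any],
    content_properties: list[str],
    metadata_properties: list[str],
) -> DocumentSketch:
    """Hypothetical helper: render some fields as text, keep others as metadata."""
    content = "\n".join(f"{name}: {record[name]}" for name in content_properties)
    metadata = {name: record[name] for name in metadata_properties}
    return DocumentSketch(content=content, metadata=metadata, id=str(record["id"]))


# A made-up record, used only for illustration.
record = {
    "id": 7,
    "title": "Fix login bug",
    "body": "Steps to reproduce...",
    "url": "https://example.com/7",
}
doc = record_to_document(record, ["title", "body"], ["id", "url"])
print(doc)
```

The sketch keeps the rendered text and the metadata separate, which is the same split the real class makes between content and metadata.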
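page_content: str
Return the content of the document.
This is an alias for the content property, and is provided for duck-type compatibility with the LangChain project's Document class.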
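The page_content property is an alias for content, so code written against a LangChain-style interface can read either attribute. The sketch below illustrates that duck typing with a hypothetical stand-in class, AliasedDocument, and a hypothetical consumer, total_characters. Neither name comes from PyAirbyte or LangChain.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class AliasedDocument:
    """Hypothetical stand-in exposing `content` plus a `page_content` alias."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def page_content(self) -> str:
        # Alias for `content`, mirroring the property on the real class.
        return self.content


def total_characters(docs: Iterable[Any]) -> int:
    """Hypothetical consumer that relies only on the `page_content` attribute."""
    return sum(len(doc.page_content) for doc in docs)


docs = [AliasedDocument(content="hello"), AliasedDocument(content="world!")]
count = total_characters(docs)
print(count)
```

Because the consumer touches only page_content, any object that provides that attribute can be passed in, which is the point of the alias.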
model_config: ClassVar[pydantic.config.ConfigDict] = {}
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
Inherited Members
- pydantic.main.BaseModel
- BaseModel
- model_extra
- model_fields_set
- model_construct
- model_copy
- model_dump
- model_dump_json
- model_json_schema
- model_parametrized_name
- model_post_init
- model_rebuild
- model_validate
- model_validate_json
- model_validate_strings
- dict
- json
- parse_obj
- parse_raw
- parse_file
- from_orm
- construct
- copy
- schema
- schema_json
- validate
- update_forward_refs
- model_fields
- model_computed_fields