feat(ai): add document qa and document summary #2054

Open

tommythorsen wants to merge 11 commits into master from tommy/ai

Contributor

tommythorsen commented Dec 3, 2024 (edited)

Description

This commit adds SDK support for the Document Question Answering and Document Summary endpoints, which are part of the AI API and have recently reached general availability (GA).

Checklist:

Tests added/updated.
Documentation updated. Documentation is generated from docstrings - these must be updated according to your change.
If a new method has been added it should be referenced in cognite.rst in order to generate docs based on its docstring.
Changelog updated in CHANGELOG.md.
Version bumped. If triggering a new release is desired, bump the version number in _version.py and pyproject.toml per semantic versioning.

tommythorsen requested review from a team as code owners

December 3, 2024 19:29

codecov bot commented Dec 3, 2024 (edited)

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.54%. Comparing base (fa7f991) to head (bf03552).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2054      +/-   ##
==========================================
+ Coverage   90.49%   90.54%   +0.04%     
==========================================
  Files         141      145       +4     
  Lines       22488    22603     +115     
==========================================
+ Hits        20351    20466     +115     
  Misses       2137     2137

Files with missing lines	Coverage Δ
cognite/client/_api/ai/__init__.py	`100.00% <100.00%> (ø)`
cognite/client/_api/ai/tools/__init__.py	`100.00% <100.00%> (ø)`
cognite/client/_api/ai/tools/documents.py	`100.00% <100.00%> (ø)`
cognite/client/_api_client.py	`90.18% <ø> (ø)`
cognite/client/_cognite_client.py	`93.81% <100.00%> (+0.13%)`	⬆️
cognite/client/_version.py	`100.00% <100.00%> (ø)`
cognite/client/data_classes/__init__.py	`100.00% <100.00%> (ø)`
cognite/client/data_classes/ai.py	`100.00% <100.00%> (ø)`
cognite/client/testing.py	`100.00% <100.00%> (ø)`

kandeh reviewed

cognite/client/_api/ai/tools/documents.py (outdated)

kandeh approved these changes

kandeh left a comment

Added a small comment; otherwise, looks good to me.

andersfylling reviewed

cognite/client/_api/ai/tools/documents.py (outdated)

tommythorsen commented

cognite/client/_api/ai/tools/documents.py (outdated)

tommythorsen and others added 11 commits

December 4, 2024 18:02


          feat(ai): add document qa and document summary

9971c53

This commit adds sdk support for the newly GA endpoints Document
Question Answering and Document Summary, which are part of the AI API.


          add ai.rst

07dc219


          updated changelog and version

4b565dc


          add unit tests


          add to index.rst

52d47f8


          fix test

3d962b4


          another fix

661b1c1


          add retryable regex pattern

1378e7b


          Update cognite/client/_api/ai/tools/documents.py

bbc32b4


          InstanceId -> NodeId

a1f12e3


          add utility methods to Answer

bf03552

The utility methods can concatenate the pieces of the answer text to the
full text of the answer, and combine the reference lists to a single
list, eliminating duplicates.

tommythorsen force-pushed the tommy/ai branch from 10bd266 to bf03552 Compare

December 4, 2024 17:22

andersfylling reviewed

cognite/client/_api/ai/tools/documents.py

+                              >>> client.ai.tools.documents.summarize(ids=[123])
+                      """
+                      body = {
+                          "items": (

andersfylling Dec 4, 2024

you can do something like the following to get a list of ids:

identifiers = IdentifierSequence.load(ids=id, external_ids=external_id, instance_ids=instance_id)
identifiers_dict = identifiers.as_dicts()
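
To illustrate the idea behind this suggestion, here is a minimal, self-contained sketch of normalizing "single value or sequence" identifier arguments into a list of dicts. It does not use the SDK's `IdentifierSequence`; the helper `identifiers_as_dicts`, the stand-in `NodeId` class, and the camelCase keys are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    # Hypothetical stand-in for the SDK's NodeId, for illustration only.
    space: str
    external_id: str


def _as_list(value):
    """Wrap a single value in a list; turn any other iterable into a list."""
    if value is None:
        return []
    if isinstance(value, (str, int, NodeId)):
        return [value]
    return list(value)


def identifiers_as_dicts(id=None, external_id=None, instance_id=None):
    """Accept single values or sequences and return one dict per identifier.

    The dict keys below are assumed, not taken from the API specification.
    """
    items = [{"id": i} for i in _as_list(id)]
    items += [{"externalId": x} for x in _as_list(external_id)]
    items += [
        {"instanceId": {"space": n.space, "externalId": n.external_id}}
        for n in _as_list(instance_id)
    ]
    return items
```

With a helper like this, a caller can pass `id=123` or `id=[123, 456]` and the request body is built the same way in both cases.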

andersfylling reviewed

cognite/client/_api/ai/tools/documents.py

+                              >>> client.ai.tools.documents.ask(question="What model pump was used?", ids=[123])
+                      """
+                      body = {
+                          "question": question,

andersfylling Dec 4, 2024

same identifier trick here

haakonvt requested changes

Contributor

haakonvt left a comment

Great additions! I look forward to testing this out 🥳

cognite/client/_api/ai/tools/documents.py

Comment on lines +16 to +18

+                      ids: Sequence[int] | None = None,
+                      external_ids: SequenceNotStr[str] | None = None,
+                      instance_ids: Sequence[NodeId] | None = None,

Contributor

haakonvt Dec 4, 2024

If you use the "identifier trick" mentioned above, single inputs will automatically be supported. I suggest using singular parameter names:

Suggested change

      
-    ids: Sequence[int] | None = None,
-    external_ids: SequenceNotStr[str] | None = None,
-    instance_ids: Sequence[NodeId] | None = None,
+    id: int | Sequence[int] | None = None,
+    external_id: str | Sequence[str] | None = None,
+    instance_id: NodeId | Sequence[NodeId] | None = None,

cognite/client/_api/ai/tools/documents.py

+                      instance_ids: Sequence[NodeId] | None = None,
+                      ignore_unknown_ids: bool = False,
+                  ) -> list[Summary]:
+                      """Summarize a document using a Large Language Model.

Contributor

haakonvt Dec 4, 2024

This is missing API doc link

cognite/client/_api/ai/tools/documents.py

+                          ids (Sequence[int] | None): Internal ids of documents to summarize.
+                          external_ids (SequenceNotStr[str] | None): External ids of documents to summarize.
+                          instance_ids (Sequence[NodeId] | None): Instance ids of documents to summarize.
+                          ignore_unknown_ids (bool): Whether to skip documents that can't be summarized, without throwing an error

Contributor

haakonvt Dec 4, 2024

Is this right? ignore_unknown_ids is typically used when the given ids can't be found.

cognite/client/_api/ai/tools/documents.py

+                          ignore_unknown_ids (bool): Whether to skip documents that can't be summarized, without throwing an error
+                      Returns:
+                          list[Summary]: No description.

Contributor

haakonvt Dec 4, 2024

Missing description of return type

cognite/client/_api/ai/tools/documents.py

+                      external_ids: SequenceNotStr[str] | None = None,
+                      instance_ids: Sequence[NodeId] | None = None,
+                      ignore_unknown_ids: bool = False,
+                  ) -> list[Summary]:

Contributor

haakonvt Dec 4, 2024

Let's make a SummaryList

andersfylling Dec 7, 2024

Could you shed some light on why that is useful? I come from Go and have learned the hard way to prefer primitive types, but I see it's very common in the SDK.

Contributor

haakonvt Dec 8, 2024

Generally I agree with you, but users of the SDK expect certain features from our "list of resource" classes such as:

- get method for hash-table-backed item lookup
- pretty-printing in notebooks
- easy conversion to pandas data frame
- load and dump methods to/from dict and YAML

...all of which come for free by inheriting from the base Cognite list class
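
As a rough illustration of two of the features listed above (hash-table-backed `get` and `load`/`dump`), here is a minimal sketch built on the standard library's `UserList`. It is not the SDK's base list class; the `Summary` fields and the `SummaryList` implementation are hypothetical.

```python
from collections import UserList
from dataclasses import asdict, dataclass


@dataclass
class Summary:
    # Hypothetical fields, chosen only for this sketch.
    id: int
    summary: str


class SummaryList(UserList):
    """A list of Summary objects with id lookup and dict round-tripping."""

    def __init__(self, items=()):
        super().__init__(items)
        # Index built once at construction; it is not updated on later mutation.
        self._by_id = {item.id: item for item in self.data}

    def get(self, id):
        """Return the item with the given id, or None if it is absent."""
        return self._by_id.get(id)

    def dump(self):
        """Convert every item to a plain dict."""
        return [asdict(item) for item in self.data]

    @classmethod
    def load(cls, raw):
        """Build a SummaryList from a list of plain dicts."""
        return cls(Summary(**entry) for entry in raw)
```

A plain `list[Summary]` would need a linear scan for the same lookup, which is part of the argument for a dedicated list class.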

cognite/client/data_classes/ai.py



		@dataclass
		class AnswerReference:

Contributor

haakonvt Dec 4, 2024

See comments above.

cognite/client/data_classes/ai.py


		content: list[AnswerContent]

		def get_full_answer_text(self) -> str:

Contributor

haakonvt Dec 4, 2024

Property perhaps?

Suggested change

      
-    def get_full_answer_text(self) -> str:
+    @property
+    def full_answer(self) -> str:

Contributor

haakonvt Dec 4, 2024

Also, should __str__ return the full answer perhaps?

cognite/client/data_classes/ai.py

+                      Get the full answer text. This is the concatenation of the texts from
+                      all the content objects.
+                      """
+                      return "".join([content.text for content in self.content])

Contributor

haakonvt Dec 4, 2024

Don't need to create a temporary list:

Suggested change

      
-        return "".join([content.text for content in self.content])
+        return "".join(content.text for content in self.content)

cognite/client/data_classes/ai.py

+                      """
+                      return "".join([content.text for content in self.content])
+                  def get_all_references(self) -> list[AnswerReference]:

Contributor

haakonvt Dec 4, 2024

Consider using a property (or cached_property)

Suggested change

      
-    def get_all_references(self) -> list[AnswerReference]:
+    @property
+    def all_references(self) -> list[AnswerReference]:
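
Taken together, the suggestions in this file could look roughly like the sketch below: both helpers become properties, the text is joined from a generator expression, and `__str__` returns the full answer. The class names come from the diff, but the `AnswerReference` fields, the `references` attribute on `AnswerContent`, and the order-preserving de-duplication are assumptions, not the PR's actual code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnswerReference:
    # Hypothetical fields; frozen=True makes instances hashable.
    file_id: int
    page: int


@dataclass
class AnswerContent:
    text: str
    references: list = field(default_factory=list)  # assumed attribute name


@dataclass
class Answer:
    content: list

    @property
    def full_answer(self) -> str:
        # A generator expression avoids building a temporary list.
        return "".join(part.text for part in self.content)

    @property
    def all_references(self) -> list:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        refs = (ref for part in self.content for ref in part.references)
        return list(dict.fromkeys(refs))

    def __str__(self) -> str:
        return self.full_answer
```

If computing the references turned out to be expensive, `functools.cached_property` could replace `property`, at the cost of stale results if `content` is mutated afterwards.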

cognite/client/testing.py

@@ -95,6 +98,9 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
                       #   - Add spacing above and below
                       #   - Use `spec=MyAPI` only for "top level"
                       #   - Use `spec_set=MyNestedAPI` for all nested APIs
+                      self.ai = MagicMock(spec=AIAPI)
+                      self.ai.tools = MagicMock(spec=AIToolsAPI)
+                      self.ai.tools.documents = MagicMock(spec_set=AIDocumentsAPI)

Contributor

haakonvt Dec 4, 2024

Suggested change

      
                    self.ai.tools.documents = MagicMock(spec_set=AIDocumentsAPI)
          
                    self.ai.tools.documents = MagicMock(spec_set=AIDocumentsAPI)
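
The `spec` versus `spec_set` convention in the code comments above can be illustrated with a small sketch using only `unittest.mock`. The `ToolsAPI` and `DocumentsAPI` classes here are hypothetical stand-ins, not the SDK's classes. A mock created with `spec=` still allows new attributes to be assigned, which is needed to attach nested API mocks, while `spec_set=` rejects any attribute the class does not define.

```python
from unittest.mock import MagicMock


class DocumentsAPI:
    # Hypothetical leaf API with a single method.
    def summarize(self, id=None):
        ...


class ToolsAPI:
    # Hypothetical parent API; `documents` only exists on instances,
    # so it is not part of the class-level spec.
    def __init__(self):
        self.documents = DocumentsAPI()


# spec= allows attaching the nested mock by assignment.
tools = MagicMock(spec=ToolsAPI)
tools.documents = MagicMock(spec_set=DocumentsAPI)
tools.documents.summarize(id=123)

# spec_set= on the parent would reject that same assignment.
strict = MagicMock(spec_set=ToolsAPI)
try:
    strict.documents = MagicMock()
    strict_allows_assignment = True
except AttributeError:
    strict_allows_assignment = False

# spec_set= on the leaf catches misspelled attribute names.
try:
    tools.documents.summarise = MagicMock()
    typo_allowed = True
except AttributeError:
    typo_allowed = False
```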

