SpaCy 3.0's language models now produce some additional features that we don't currently translate to DataFrames. The parse tree information now includes each token's children and ancestors. There is an `is_sent_start` flag to indicate whether a token is at the beginning of a sentence. There is support for embeddings in the `vector` field of `Token`. There are probably a few more; see https://spacy.io/api/token for the full list. We should extend the existing SpaCy support in https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/spacy.py to cover these additional features when they are present.
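One way to translate the new parse-tree features into DataFrame columns is to replace the `Token` objects that `Token.children` and `Token.ancestors` yield with integer token offsets, which fit naturally in a cell. The sketch below assumes the parse is already available as a list of head offsets, with each sentence root as its own head (which is how spaCy sets `Token.head` for the root), and returns plain column-oriented lists. The names `parse_feature_columns` and `ancestors_of` are hypothetical and not part of `spacy.py`. With a real `Doc`, the inputs would be `[t.head.i for t in doc]`, `[t.is_sent_start for t in doc]`, and optionally `[t.vector for t in doc]`.

```python
from typing import Dict, List, Optional, Sequence


def ancestors_of(i: int, heads: Sequence[int]) -> List[int]:
    """Offsets of token i's ancestors, nearest first (parent, grandparent, ...)."""
    result: List[int] = []
    cur = i
    # A sentence root is its own head, matching how spaCy sets Token.head.
    while heads[cur] != cur:
        cur = heads[cur]
        result.append(cur)
        if len(result) > len(heads):
            raise ValueError("head offsets contain a cycle")
    return result


def parse_feature_columns(
    heads: Sequence[int],
    is_sent_start: Sequence[Optional[bool]],
    vectors: Optional[Sequence[Sequence[float]]] = None,
) -> Dict[str, list]:
    """Hypothetical helper: column-oriented data for the newer token features.

    heads[i] is the offset of token i's syntactic head, is_sent_start[i]
    mirrors Token.is_sent_start (True, False, or None when unset), and
    vectors[i], if given, is token i's embedding.
    """
    n = len(heads)
    children: List[List[int]] = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h != i:  # skip the root's self-reference
            children[h].append(i)  # stays sorted because i only increases
    columns: Dict[str, list] = {
        "id": list(range(n)),
        "children": children,
        "ancestors": [ancestors_of(i, heads) for i in range(n)],
        "is_sent_start": list(is_sent_start),
    }
    if vectors is not None:
        # One tuple per cell; a real implementation might use a tensor-valued column.
        columns["vector"] = [tuple(v) for v in vectors]
    return columns


# Usage with toy data for "the cat sat" (heads: the->cat, cat->sat, sat->sat).
cols = parse_feature_columns([1, 2, 2], [True, False, False])
print(cols["children"])   # [[], [0], [1]]
print(cols["ancestors"])  # [[1, 2], [2], []]
```

The returned dictionary can be passed directly to `pandas.DataFrame(...)`; the list-valued `children` and `ancestors` cells would land in object-dtype columns.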
With these additional features, the DataFrame representation of the full output of a SpaCy language model is getting a bit large, so it would be a good idea to also add a facility to produce only the DataFrame columns that your application needs: for example, an additional argument to `make_tokens_and_features` that replaces and generalizes the existing `add_left_and_right` argument and controls which columns appear in the output.
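A minimal sketch of what such an argument could look like, assuming a hypothetical `columns` parameter that lists the output column names, with `None` meaning "all columns". The `EXTRACTORS` table and the `make_columns` function are illustrative names, not part of the current `spacy.py`; only the `Token` attributes they read (`text`, `lemma_`, `pos_`, `dep_`, `head`, `is_sent_start`) come from spaCy. In this design, a column the caller doesn't request is never computed, and a boolean such as `add_left_and_right` becomes a special case: whether a particular pair of column names appears in the list.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical registry: output column name -> function that computes one cell
# from a token. The attribute names are those of spacy.tokens.Token.
EXTRACTORS: Dict[str, Callable[[Any], Any]] = {
    "text": lambda t: t.text,
    "lemma": lambda t: t.lemma_,
    "pos": lambda t: t.pos_,
    "dep": lambda t: t.dep_,
    "head": lambda t: t.head.i,
    "is_sent_start": lambda t: t.is_sent_start,
}


def make_columns(
    tokens: Iterable[Any], columns: Optional[Iterable[str]] = None
) -> Dict[str, List[Any]]:
    """Compute only the requested columns (every known column when columns is None)."""
    # dict.fromkeys drops duplicate names while keeping the caller's order.
    names = list(EXTRACTORS) if columns is None else list(dict.fromkeys(columns))
    unknown = [n for n in names if n not in EXTRACTORS]
    if unknown:
        raise ValueError(f"unknown column(s): {unknown}")
    token_list = list(tokens)  # accept a generator or a spaCy Doc
    return {n: [EXTRACTORS[n](t) for t in token_list] for n in names}


# Usage with stand-in objects that expose the same attributes as spaCy tokens.
ran = SimpleNamespace(i=1, text="ran", lemma_="run", pos_="VERB",
                      dep_="ROOT", is_sent_start=False)
ran.head = ran  # a sentence root is its own head
dogs = SimpleNamespace(i=0, text="Dogs", lemma_="dog", pos_="NOUN",
                       dep_="nsubj", head=ran, is_sent_start=True)
print(make_columns([dogs, ran], columns=["text", "head"]))
# {'text': ['Dogs', 'ran'], 'head': [1, 1]}
```

With a real pipeline, `make_columns(nlp("Dogs ran."), columns=[...])` would iterate over the `Doc`'s tokens, and the resulting dictionary could be handed to `pandas.DataFrame`.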