# Jaseci NLP Package (`jac_nlp`)

The `jac_nlp` package contains a collection of state-of-the-art NLP models that can be used to perform various NLP tasks such as named entity recognition, text summarization, embedding generation, and topic extraction. Following is a list of all the models available in the `jac_nlp` package.
## `use_enc`

The `use_enc` module uses the Universal Sentence Encoder to generate sentence-level embeddings. The sentence-level embeddings can then be used to calculate the similarity between two given texts via cosine similarity and/or dot product.
- `encode`: encodes the text and returns an embedding of 512 length
  - Alternate name: `get_embedding`
  - Input:
    - `text` (string or list of strings): text to be encoded
  - Return: encoded embeddings
- `cos_sim_score`: calculates the cosine similarity between two embeddings
  - Input:
    - `q_emb` (list of floats): embedding of the first text
    - `a_emb` (list of floats): embedding of the second text
  - Return: cosine similarity score
- `text_similarity`: calculates the similarity score between the given texts
  - Input:
    - `text1` (string): first text
    - `text2` (string): second text
  - Return: cosine similarity score
- `text_classify`: uses the USE encoder as a classifier
  - Input:
    - `text` (string): text to classify
    - `classes` (list of strings): candidate classification classes
```jac
# Use the USE encoder for zero-shot intent classification
walker use_enc_example {
    can use.encode, use.cos_sim_score;
    has text = "What is the weather tomorrow?";
    has candidates = [
        "weather forecast",
        "ask for direction",
        "order food"
    ];
    # Embed the query and all candidate labels (use.encode handles string or list)
    text_emb = use.encode(text)[0];
    cand_embs = use.encode(candidates);
    # Pick the candidate with the highest cosine similarity to the query
    max_score = 0;
    max_cand = 0;
    cand_idx = 0;
    for cand_emb in cand_embs {
        cos_score = use.cos_sim_score(cand_emb, text_emb);
        if (cos_score > max_score) {
            max_score = cos_score;
            max_cand = cand_idx;
        }
        cand_idx += 1;
    }
    predicted_cand = candidates[max_cand];
}
```
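The `text_similarity` and `text_classify` actions wrap the same encoder behind a simpler interface. A minimal sketch, assuming the module is loaded and using only the parameter names documented above (the walker name and sample strings are illustrative):

```jac
# Minimal sketch: score two sentences, then zero-shot classify a third.
# Walker name and sample strings are illustrative, not part of the module.
walker use_enc_sim_example {
    can use.text_similarity, use.text_classify;
    # Similarity between two texts (cosine similarity per the docs above)
    sim = use.text_similarity(
        text1="How is the weather today?",
        text2="What is the weather like today?"
    );
    std.out(sim);
    # Zero-shot classification against candidate classes
    report use.text_classify(
        text="Play some jazz music",
        classes=["play music", "weather forecast", "order food"]
    );
}
```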
For a complete example visit here
## `use_qa`

The `use_qa` module uses the multilingual QA model to generate sentence-level embeddings. The sentence-level embeddings can then be used to find the best match between a question and the available answers via cosine similarity and/or `dist_score`.
- `question_encode`: encodes the question and returns an embedding of 512 length
  - Alternate name: `enc_question`
  - Input:
    - `text` (string or list of strings): question to be encoded
  - Return: encoded embeddings
- `answer_encode`: encodes the answer and returns an embedding of 512 length
  - Alternate name: `enc_answer`
  - Input:
    - `text` (string or list of strings): answer to be encoded
    - `context` (string or list of strings): usually the text around the answer text, for example two sentences before plus two sentences after
  - Return: encoded embeddings
- `cos_sim_score`: calculates the cosine similarity between two embeddings
  - Input:
    - `q_emb` (list of floats): embedding of the first text
    - `a_emb` (list of floats): embedding of the second text
  - Return: cosine similarity score
- `dist_score`: calculates the inner product between two embeddings
  - Input:
    - `q_emb` (list of floats): embedding of the first text
    - `a_emb` (list of floats): embedding of the second text
  - Return: inner product score
- `question_similarity`: calculates the similarity score between the given questions
  - Input:
    - `text1` (string): first question
    - `text2` (string): second question
  - Return: cosine similarity score
- `question_classify`: uses the USE QA model as a question classifier
  - Input:
    - `text` (string): question to classify
    - `classes` (list of strings): candidate classification classes
- `answer_similarity`: calculates the similarity score between the given answers
  - Input:
    - `text1` (string): first answer
    - `text2` (string): second answer
  - Return: cosine similarity score
- `answer_classify`: uses the USE QA model as an answer classifier
  - Input:
    - `text` (string): answer to classify
    - `classes` (list of strings): candidate classification classes
- `qa_similarity`: calculates the similarity score between a question and an answer
  - Input:
    - `text1` (string): question text
    - `text2` (string): answer text
  - Return: cosine similarity score
- `qa_classify`: uses the USE QA model as a QA classifier
  - Input:
    - `text` (string): text to classify
    - `classes` (list of strings): candidate classification classes
```jac
# Use the USE QA model for zero-shot question-answer matching
walker use_qa_example {
    can use.qa_similarity;
    has questions = "What is your age?";
    has responses = ["I am 20 years old.", "good morning"];
    has response_contexts = ["I will be 21 next year.", "great day."];
    # Pick the response with the highest QA similarity to the question
    max_score = 0;
    max_cand = 0;
    cand_idx = 0;
    for response in responses {
        cos_score = use.qa_similarity(text1=questions, text2=response);
        std.out(cos_score);
        if (cos_score > max_score) {
            max_score = cos_score;
            max_cand = cand_idx;
        }
        cand_idx += 1;
    }
    predicted_cand = responses[max_cand];
}
```
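The encoding actions can also be called directly. A minimal sketch, assuming `answer_encode` accepts the `text` and `context` parameters listed above and that both encoders return one embedding per input (walker name and sample strings are illustrative):

```jac
# Minimal sketch: encode a question and an answer separately,
# then score the pair with the inner-product action.
walker use_qa_encode_example {
    can use.question_encode, use.answer_encode, use.dist_score;
    has question = "What is your age?";
    has answer = "I am 20 years old.";
    has context = "I will be 21 next year.";
    q_emb = use.question_encode(question)[0];
    a_emb = use.answer_encode(text=answer, context=context)[0];
    report use.dist_score(q_emb, a_emb);
}
```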
For a complete example visit here
## `fast_enc`

The `fast_enc` module uses Facebook's fastText for efficient learning of word representations and sentence classification.
- `train`: used to train the fastText classifier on custom input
  - Input:
    - `traindata` (Dict): dictionary of candidates and supporting contexts for each candidate
    - `train_with_existing` (bool): if set to true, trains incrementally on top of the existing model; otherwise trains from scratch
- `predict`: predicts the most suitable candidate for the provided sentences
  - Input:
    - `sentences` (list of strings): list of sentences that need to be classified
  - Return: a dictionary of sentence, predicted intent, and probability
- `save_model`:
  - Input:
    - `model_path` (string): the path to save the model to
  - Returns: "[Saved model at] : <model_path>" if the model is successfully saved
- `load_model`:
  - Input:
    - `model_path` (string): the path to load the model from
  - Returns: "[loaded model from] : <model_path>" if the model is successfully loaded
```jac
# Train and run inference with a fastText classifier
walker fast_enc_example {
    has train_file = "fast_enc_train.json";
    has train_with_existing = false;
    has test_sentence = ["what's going on ?"];
    can fast_enc.train, fast_enc.predict;
    # Training the model
    train_data = file.load_json(train_file);
    fast_enc.train(traindata=train_data, train_with_existing=train_with_existing);
    # Getting inference from the model
    resp_data = fast_enc.predict(sentences=test_sentence);
    std.out(resp_data);
}
```
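Persisting and restoring a trained fastText model is a two-call flow. A minimal sketch; the model path is hypothetical and the return strings follow the documentation above:

```jac
# Minimal sketch: save the trained model, then load it back.
# "fast_enc_model" is a hypothetical path.
walker fast_enc_save_load_example {
    can fast_enc.save_model, fast_enc.load_model;
    report fast_enc.save_model(model_path="fast_enc_model");
    report fast_enc.load_model(model_path="fast_enc_model");
}
```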
For a complete example visit here
## `bi_enc`

The `bi_enc` module can be used for intent classification. It takes contexts and candidates and predicts the most suitable candidate for each context. You can train the module on custom data to change its behaviour.
- `dot_prod`: calculates the dot product of two embeddings
  - Input:
    - `vec_a` (list of float): first embedded text
    - `vec_b` (list of float): second embedded text
  - Return: dot product score
- `cos_sim_score`: calculates the cosine similarity of two embeddings
  - Input:
    - `vec_a` (list of float): first embedded text
    - `vec_b` (list of float): second embedded text
  - Return: cosine similarity score
- `infer`: predicts the most suitable candidate for a provided context; takes text or embeddings
  - Input:
    - `contexts` (string or list of strings): context which needs to be classified
    - `candidates` (string or list of strings): list of candidates for the context
    - `context_type` (string): can be text or embedding type
    - `candidate_type` (string): can be text or embedding type
  - Return: a dictionary of similarity scores for each candidate and context
- `train`: used to train the bi-encoder on custom input
  - Input:
    - `dataset` (Dict): dictionary of candidates and supporting contexts for each candidate
    - `from_scratch` (bool): if set to true, trains the model from scratch; otherwise trains incrementally
    - `training_parameters` (Dict): dictionary of training parameters
  - Returns: text when model training is completed
- `get_context_emb`: encodes the given contexts
  - Alternate name: `encode_context`
  - Input:
    - `contexts` (string or list of strings): context which needs to be encoded
  - Returns: a list of embeddings of 128 length for tiny BERT
- `get_candidate_emb`: encodes the given candidates
  - Alternate name: `encode_candidate`
  - Input:
    - `candidates` (string or list of strings): candidates which need to be encoded
  - Returns: a list of embeddings of 128 length for tiny BERT
- `get_train_config`:
  - Input: None
  - Returns: JSON of the current training configuration
    ```json
    {
      "max_contexts_length": 128,
      "max_candidate_length": 64,
      "train_batch_size": 8,
      "eval_batch_size": 2,
      "max_history": 4,
      "learning_rate": 0.001,
      "weight_decay": 0.01,
      "warmup_steps": 100,
      "adam_epsilon": 1e-06,
      "max_grad_norm": 1,
      "num_train_epochs": 10,
      "gradient_accumulation_steps": 1,
      "fp16": false,
      "fp16_opt_level": "O1",
      "gpu": 0,
      "basepath": "logoutput",
      "seed": 12345,
      "device": "cuda"
    }
    ```
- `set_train_config`:
  - Input:
    - `train_parameters` (Dict): dictionary of training parameters. See the JSON example above under `get_train_config` for the list of available training parameters.
  - Returns: "Config setup is complete." if the train configuration is set successfully
- `get_model_config`:
  - Input: None
  - Returns: JSON of the current model configuration
    ```json
    {
      "shared": false,
      "model_name": "prajjwal1/bert-tiny",
      "model_save_path": "modeloutput",
      "loss_function": "mse",
      "loss_type": "dot"
    }
    ```
- `set_model_config`:
  - Input:
    - `model_parameters` (Dict): dictionary of model parameters. See the JSON example above under `get_model_config` for the list of available model parameters.
  - Returns: "Config setup is complete." if the model configuration is set successfully
- `save_model`:
  - Input:
    - `model_path` (string): the path to save the model to
  - Returns: "[Saved model at] : <model_path>" if the model is successfully saved
- `load_model`:
  - Input:
    - `model_path` (string): the path to load the model from
  - Returns: "[loaded model from] : <model_path>" if the model is successfully loaded
```jac
# Train a bi-encoder model for intent classification
walker bi_enc_example {
    has train_file = "train_bi.json";
    has from_scratch = true;
    has num_train_epochs = 20;
    has contexts = ["Share my location with Hillary's sister"];
    has candidates = [
        "searchplace",
        "getplacedetails",
        "bookrestaurant",
        "gettrafficinformation",
        "compareplaces",
        "sharecurrentlocation",
        "requestride",
        "getdirections",
        "shareeta",
        "getweather"
    ];
    can bi_enc.train, bi_enc.infer;
    train_data = file.load_json(train_file);
    # Train the model
    bi_enc.train(
        dataset=train_data,
        from_scratch=from_scratch,
        training_parameters={
            "num_train_epochs": num_train_epochs
        }
    );
    # Use the model to perform inference;
    # returns the list of contexts with the suitable candidates
    resp_data = bi_enc.infer(
        contexts=contexts,
        candidates=candidates,
        context_type="text",
        candidate_type="text"
    );
    # Iterate through the candidate labels and their predicted scores
    max_score = 0;
    max_intent = "";
    pred = resp_data[0];
    for j=0 to j<pred["candidate"].length by j+=1 {
        if (pred["score"][j] > max_score) {
            max_intent = pred["candidate"][j];
            max_score = pred["score"][j];
        }
    }
    std.out("predicted intent : ", max_intent, " Conf_Score:", max_score);
}
```
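The configuration and persistence actions follow the same pattern. A minimal sketch; the save path is hypothetical and the parameter name comes from the `get_train_config` output above:

```jac
# Minimal sketch: override one training parameter, then save and
# reload the model. "bi_enc_model" is a hypothetical path.
walker bi_enc_config_example {
    can bi_enc.set_train_config, bi_enc.save_model, bi_enc.load_model;
    report bi_enc.set_train_config(train_parameters={"num_train_epochs": 5});
    report bi_enc.save_model(model_path="bi_enc_model");
    report bi_enc.load_model(model_path="bi_enc_model");
}
```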
For a complete example visit here
## `sbert_sim`

The `sbert_sim` module is an implementation of SentenceBERT for scoring similarity between sentence pairs. It uses a bi-encoder in a Siamese setup to encode the sentences, followed by cosine similarity to score the similarity.
- `get_dot_score`: calculates the dot product of two given vectors
  - Input:
    - `vec_a` (list of float): first embedded text
    - `vec_b` (list of float): second embedded text
  - Return: dot product score
- `get_cos_score`: calculates the cosine similarity score of two given vectors
  - Input:
    - `vec_a` (list of float): first embedded text
    - `vec_b` (list of float): second embedded text
  - Return: cosine similarity score
- `get_text_sim`: gets the similarity score of `query` against all the sentences in `corpus` and returns the `top_k` most similar sentences with their `sim_score`
  - Input:
    - `query` (string or list of strings): query text to compare against the corpus
    - `corpus` (string or list of strings): list of sentences to search
    - `top_k` (int): number of top similar sentences to return
  - Return: list of the `top_k` most similar sentences with their `sim_score`
- `train`: used to train the SBERT model on custom input
  - Input:
    - `dataset` (List): list of lists, where each inner list contains a pair of sentences and a similarity score
    - `training_parameters` (Dict): dictionary of training parameters
  - Returns: text when model training is completed
- `getembeddings`: encodes the given texts
  - Input:
    - `texts` (string or list of strings): texts to encode
  - Returns: a list of embeddings
- `get_train_config`:
  - Input: None
  - Returns: JSON of the current training configuration
    ```json
    {
      "device": "cpu",
      "num_epochs": 2,
      "model_save_path": "output/sent_model-2022-11-04_17-43-18",
      "model_name": "bert-base-uncased",
      "max_seq_length": 256
    }
    ```
- `load_model`:
  - Input:
    - `model_type` (string): can be `default` or `tfm_model`
      - `default`: loads a model from the SBERT model zoo
      - `tfm_model`: loads a transformer model from the Hugging Face hub
    - `model_name` (string): the name of the model to be loaded, e.g.
      ```json
      { "model_name": "all-MiniLM-L12-v2", "model_type": "default" }
      ```
  - Returns: "[loaded model from] : <model_type> <model_name>" if the model is successfully loaded, e.g.
    `[loaded model from] SBERT Hub : all-MiniLM-L12-v2`
```jac
# Train and evaluate an SBERT model for sentence similarity
walker sbert_sim_example {
    has train_file = "train_sbert.json";
    has num_epochs = 2;
    has query = ["A girl is on a sandy beach."];
    has corpus = ["A girl dancing on a sandy beach."];
    has top_k = 1;
    can sbert_sim.train, sbert_sim.get_text_sim;
    train_data = file.load_json(train_file);
    # Train the model
    sbert_sim.train(
        dataset=train_data['train_data'],
        training_parameters={
            "num_epochs": num_epochs
        }
    );
    # Returns the top_k most similar texts in the corpus
    resp_data = sbert_sim.get_text_sim(query=query, corpus=corpus, top_k=top_k);
    std.out(resp_data);
}
```
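A minimal sketch of loading a pretrained model and encoding text with it, using the `model_name` and `model_type` values from the `load_model` example above:

```jac
# Minimal sketch: load a model from the SBERT zoo, then embed a sentence.
walker sbert_sim_embed_example {
    can sbert_sim.load_model, sbert_sim.getembeddings;
    report sbert_sim.load_model(model_name="all-MiniLM-L12-v2", model_type="default");
    report sbert_sim.getembeddings(texts=["A girl is on a sandy beach."]);
}
```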
For a complete example visit here
## `ent_ext`

The `ent_ext` module uses the Flair named entity recognition architecture. It can either be used zero-shot or trained on custom data.
- `train`: used to train the Flair-based NER model
  - Input:
    - `train_data` (List(Dict)): a list of dictionaries containing contexts and the list of entities in each context
      ```json
      [
        {
          "context": "EU rejects German call to boycott British lamb",
          "entities": [
            { "entity_value": "EU", "entity_type": "ORG", "start_index": 0, "end_index": 2 },
            { "entity_value": "German", "entity_type": "MISC", "start_index": 11, "end_index": 17 },
            { "entity_value": "British", "entity_type": "MISC", "start_index": 34, "end_index": 41 }
          ]
        }
      ]
      ```
    - `val_data` (List(Dict)): a list of dictionaries containing contexts and the list of entities in each context
      ```json
      [
        {
          "context": "CRICKET LEICESTERSHIRE TAKE OVER AT TOP AFTER INNINGS VICTORY",
          "entities": [
            { "entity_value": "LEICESTERSHIRE", "entity_type": "ORG", "start_index": 8, "end_index": 22 }
          ]
        }
      ]
      ```
    - `test_data` (List(Dict)): a list of dictionaries containing contexts and the list of entities in each context
      ```json
      [
        {
          "context": "The former Soviet republic was playing in an Asian Cup finals tie for the first time",
          "entities": [
            { "entity_value": "Soviet", "entity_type": "MISC", "start_index": 11, "end_index": 17 },
            { "entity_value": "Asian", "entity_type": "MISC", "start_index": 45, "end_index": 50 },
            { "entity_value": "Asian", "entity_type": "MISC", "start_index": 45, "end_index": 50 }
          ]
        }
      ]
      ```
    - `train_params` (Dict): dictionary of training parameters to modify the training behaviour
      ```json
      { "num_epoch": 20, "batch_size": 16, "LR": 0.01 }
      ```
- `entity_detection`: detects all available entities in the provided context
  - Input:
    - `text` (string): context to detect entities in
    - `ner_labels` (list of strings): list of entity labels, e.g. ["LOC", "PER"]
  - Return: a list of dictionaries containing entity_text, entity_value, conf_score, and index
- `save_model`:
  - Input:
    - `model_path` (string): the path to save the model to
  - Returns: "[Saved model at] : <model_path>" if the model is successfully saved
- `load_model`:
  - Input:
    - `model_path` (string): the path to load the model from
  - Returns: "[loaded model from] : <model_path>" if the model is successfully loaded
- `set_config`:
  - Input:
    - `ner_model`: pretrained or basic model to be loaded; provide the exact name of the model. Available options are:
      - Pre-trained LSTM / GRU: ["ner", "ner-fast", "ner-large"]
      - Hugging Face model: all available models that can be initialized with AutoModel
      - None: to load an RNN model from scratch
    - `model_type`: type of model to be loaded. Available options are:
      - `TRFMODEL`: for Hugging Face models
      - `LSTM` or `GRU`: for RNN models
  - Returns: "Config setup is complete." if the config is set successfully
```jac
# Train and run inference with an entity extraction model
walker ent_ext_example {
    has train_file = "train_data.json";
    has val_file = "val_data.json";
    has test_file = "test_data.json";
    has from_scratch = true;
    has num_train_epochs = 20;
    has batch_size = 8;
    has learning_rate = 0.02;
    can ent_ext.entity_detection, ent_ext.train;
    train_data = file.load_json(train_file);
    val_data = file.load_json(val_file);
    test_data = file.load_json(test_file);
    # Training the model
    ent_ext.train(
        train_data=train_data,
        val_data=val_data,
        test_data=test_data,
        train_params={
            "num_epoch": num_train_epochs,
            "batch_size": batch_size,
            "LR": learning_rate
        }
    );
    # Getting inference from the model
    resp_data = ent_ext.entity_detection(
        text="book a flight from kolkata to delhi",
        ner_labels=["LOC"]
    );
    std.out(resp_data);
}
```
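Before training or zero-shot use, the backbone can be selected with `set_config`. A minimal sketch; the checkpoint name is illustrative and the option values come from the list above:

```jac
# Minimal sketch: point ent_ext at a Hugging Face transformer backbone.
# "prajjwal1/bert-tiny" is an illustrative AutoModel-compatible checkpoint.
walker ent_ext_config_example {
    can ent_ext.set_config;
    report ent_ext.set_config(ner_model="prajjwal1/bert-tiny", model_type="TRFMODEL");
}
```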
For a complete example visit here
## `tfm_ner`

The `tfm_ner` module uses transformers to identify and extract entities. It uses the token classification method from Hugging Face.
- `train`: used to train the transformer NER model
  - Input:
    - `train_data` (List(Dict)): a list of dictionaries containing contexts and the list of entities in each context
      ```json
      [
        {
          "context": "MINNETONKA , Minn .",
          "entities": [
            { "entity_value": "MINNETONKA", "entity_type": "LOC", "start_index": 0, "end_index": 10 },
            { "entity_value": "Minn", "entity_type": "LOC", "start_index": 13, "end_index": 17 }
          ]
        }
      ]
      ```
    - `mode` (String): mode for training the model. Available options are:
      - `default`: train the model from scratch
      - `incremental`: provide more training data for the current set of entities
      - `append`: change the number of entities (the model is restarted and trained with all of the training data)
    - `epochs` (int): number of epochs to train the model for
- `extract_entity`: detects all available entities in the provided context
  - Input:
    - `text` (string): context to detect entities in
  - Return: a list of dictionaries containing entity_text, entity_value, conf_score, and index
- `save_model`:
  - Input:
    - `model_path` (string): the path to save the model to
  - Returns: "[Saved model at] : <model_path>" if the model is successfully saved
- `load_model`:
  - Input:
    - `model_path` (string): the path to load the model from
  - Returns: "[loaded model from] : <model_path>" if the model is successfully loaded
- `get_train_config`:
  - Input: None
  - Returns: JSON of the current training configuration
    ```json
    {
      "MAX_LEN": 128,
      "TRAIN_BATCH_SIZE": 4,
      "VALID_BATCH_SIZE": 2,
      "EPOCHS": 50,
      "LEARNING_RATE": 2e-05,
      "MAX_GRAD_NORM": 10,
      "MODE": "default"
    }
    ```
- `set_train_config`:
  - Input:
    - `train_parameters` (Dict): dictionary of training parameters. See the JSON example above for the available configuration parameters.
  - Returns: "Config setup is complete." if the train configuration is set successfully
- `get_model_config`:
  - Input: None
  - Returns: JSON of the current model configuration
    ```json
    { "model_name": "prajjwal1/bert-tiny", "model_save_path": "modeloutput" }
    ```
- `set_model_config`:
  - Input:
    - `model_parameters` (Dict): dictionary of model parameters. See the JSON example above for the available configuration parameters.
  - Returns: "Config setup is complete." if the model configuration is set successfully
```jac
# Train and run inference with a transformer-based NER model
walker tfm_ner_example {
    has train_file = "train_ner.json";
    has num_train_epochs = 10;
    has mode = "default";
    can tfm_ner.extract_entity, tfm_ner.train;
    train_data = file.load_json(train_file);
    # Training the model
    tfm_ner.train(
        mode=mode,
        epochs=num_train_epochs,
        train_data=train_data
    );
    # Infer using the model
    resp_data = tfm_ner.extract_entity(
        text="book a flight from kolkata to delhi,Can you explain to me,please,what Homeowners Warranty Program means,what it applies to,what is its purpose? Thank you. The Humboldt University of Berlin is situated in Berlin, Germany"
    );
    std.out(resp_data);
}
```
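The configuration actions mirror those of `bi_enc`. A minimal sketch; the overridden values are illustrative and the keys come from the `get_train_config` output above:

```jac
# Minimal sketch: read the current training config, then override
# two of its keys before the next training run (values illustrative).
walker tfm_ner_config_example {
    can tfm_ner.get_train_config, tfm_ner.set_train_config;
    std.out(tfm_ner.get_train_config());
    report tfm_ner.set_train_config(train_parameters={"EPOCHS": 10, "MODE": "incremental"});
}
```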
For a complete example visit here
## `text_seg`

Text segmentation is a method of splitting a document into smaller parts, usually called segments. It is widely used in text processing, and each segment has its own relevant meaning. Segments can be categorized as words, sentences, topics, phrases, etc. The `text_seg` module is implemented based on the approach of Topical Change Detection in Documents via Embeddings of Long Sequences.
- `get_segments`: gets the different topical segments in the provided context, given a threshold
  - Input:
    - `text` (String): text containing the entire context
    - `threshold` (Float): ranges between 0 and 1; a threshold of 1 makes each sentence its own segment
  - Returns: a list of segments of the given context
- `load_model`: loads an available model for text segmentation
  - Input:
    - `model_name` (String): name of the transformer model to load. Options are:
      - `wiki`: trained on Wikipedia data
      - `legal`: trained on legal documents
  - Returns: "[Model Loaded] : <model_name>"
Input data file `text_seg.json`:

```json
{ "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail. But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there." }
```
```jac
walker text_seg_example {
    has data_file = "text_seg.json";
    has threshold = 0.85;
    can text_seg.get_segments, text_seg.load_model;
    # Load the desired model
    resp_data = text_seg.load_model(model_name='wiki');
    std.out(resp_data);
    # Get segments of different topics from the text
    data = file.load_json(data_file);
    resp_data = text_seg.get_segments(
        text=data.text,
        threshold=threshold
    );
    std.out(resp_data);
}
```
For a complete example visit here
## `cl_summer`

The `cl_summer` module uses the Sumy summarizer to create extractive summaries.
- `summarize`: gets the extractive summary with the provided sentence count
  - Input:
    - `text` (String): text containing the entire context
    - `url` (String): the link to the web page to summarize
    - `sent_count` (int): number of sentences you want in the summary
    - `summarizer_type` (String): name of the summarizer type. Available options are:
      - `LsaSummarizer`
      - `LexRankSummarizer`
      - `LuhnSummarizer`
  - Returns: a list of sentences that best summarize the context

Input text file `summarize.json`:

```json
{ "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail.', But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as, he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there." }
```
```jac
# Use the summarizer to summarize a given text blob
walker cl_summer_example {
    has text_file = "summarize.json";
    has sent_count = 5;
    has summarizer_type = "LsaSummarizer";
    can cl_summer.summarize;
    # Get an extractive summary from the text
    train_data = file.load_json(text_file);
    resp_data = cl_summer.summarize(
        text=train_data.text,
        url="none",
        sent_count=sent_count,
        summarizer_type=summarizer_type
    );
    report resp_data;
}
```
```jac
# Use the summarizer to summarize a given URL
walker cl_summer_example_url {
    has sent_count = 5;
    has summarizer_type = "LsaSummarizer";
    has url = "https://in.mashable.com/";
    can cl_summer.summarize;
    # Get an extractive summary from the URL
    resp_data_url = cl_summer.summarize(
        text="none",
        url=url,
        sent_count=sent_count,
        summarizer_type=summarizer_type
    );
    report resp_data_url;
}
```
For a complete example visit here
## `t5_sum`

The `t5_sum` module uses the T5 transformer model to perform abstractive summarization on a body of text.

- `classify_text`: uses the T5 model to summarize a body of text
  - Input:
    - `text` (string): text to summarize
    - `min_length` (integer): the minimum number of words you want returned from the model
    - `max_length` (integer): the maximum number of words you want returned from the model

Input data file `data.json`:

```json
{ "text": "The US has passed the peak on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month. The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world. At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors. We'll be the comeback kids, all of us, he said. We want to get our country back. The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.", "min_length": 30, "max_length": 100 }
```
```jac
# Use the T5 model to summarize a given piece of text
walker summarization {
    can t5_sum.classify_text;
    has data_file = "data.json";
    data = file.load_json(data_file);
    summarized_text = t5_sum.classify_text(
        text=data["text"],
        min_length=data["min_length"],
        max_length=data["max_length"]
    );
    report summarized_text;
}
```
For a complete example visit here
## `bart_sum`

The `bart_sum` module uses the BART transformer model to perform abstractive summarization on a body of text.

There are two ways to use the `bart_sum` module:

- Given a text, it will return the summary of the text.
- Given a web page URL, it will return the summary of the web page.

Both methods use a single action, `summarize`. Following are the parameters of the action:

- `text` - text to be summarized. Type: `Union[List[str], str]` (Optional)
- `url` - URL of the web page to be summarized. Type: `str` (Optional)
- `max_length` - maximum character length of the summary. Type: `int`, Default: `100`
- `min_length` - minimum character length of the summary. Type: `int`, Default: `10`

The return type of the action is `List[str]`.
The following example returns the summary of a single text (note that `max_length` is passed by keyword):

```jac
walker test_summarize_single {
    can bart_sum.summarize;
    report bart_sum.summarize(
        text="There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude.",
        max_length=10
    );
}
```
You can also pass a list of texts to get the summary of each text:

```jac
walker test_summarize_batch {
    can bart_sum.summarize;
    report bart_sum.summarize(
        text=[
            "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude.",
            "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude.",
            "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude."
        ],
        max_length=10
    );
}
```
The following example returns the summary of a web page:

```jac
walker test_summarize_url {
    can bart_sum.summarize;
    report bart_sum.summarize(text=null, url="https://in.mashable.com/");
}
```
For a complete example visit here
## Topic Extraction (`topic_ext`)

The `topic_ext` module is implemented for producing the most relevant and possible set of topics for a given set of text documents. Following is an example usage of the `topic_ext` module.

- `topic_ext.topic_extraction`: this action extracts the top n topics from each cluster. The texts, along with the cluster label for each text, should be provided as input.
  - Input:
    - `texts` - (list of strings) list of input text documents
    - `labels` - (list of int) list of labels associated with each text document
    - `n_topics` - (int) number of topics to extract from each cluster
  - Returns: a dictionary which contains the relevant topics for each cluster

Input data file `topic_extraction.json`:

```json
[ "still waiting card", "countries supporting", "card still arrived weeks", "countries accounts suppor", "provide support countries", "waiting week card still coming", "track card process delivery", "countries getting support", "know get card lost", "send new card", "still received new card", "info card delivery", "new card still come", "way track delivery card", "countries currently support"]
```
```jac
walker init {
    can file.load_json, use.encode, cluster.get_umap,
        cluster.get_cluster_labels, topic_ext.topic_extraction;
    # Load the documents and embed them
    text = file.load_json("topic_extraction.json");
    encode = use.encode(text);
    # Reduce the embeddings and cluster them
    final_features = cluster.get_umap(encode, 2);
    labels = cluster.get_cluster_labels(final_features, "hbdscan", 2, 2);
    # Extract the top topics per cluster
    topic_dict = topic_ext.topic_extraction(texts=text, labels=labels, n_topics=5);
    report topic_dict;
}
```
For a complete example visit here
## Sentiment Analysis (`sentiment`)

The `sentiment` module is implemented for analysing the sentiment of a given list of texts. This module accepts a set of sentences as input.
- `texts` - (list of strings) list of input text documents
```jac
walker test_predict {
    can sentiment.predict;
    has texts = ["I love you", "I hate you"];
    report sentiment.predict(texts);
}
```
For a complete example visit here
## Paraphrasing (`paraphraser`)

The `paraphraser` module is implemented for paraphrasing the given input text.
- `text` - (String) input text phrase
```jac
walker init {
    can paraphraser.paraphrase;
    has text = "Yiping Kang is inviting you to a scheduled Zoom meeting";
    report paraphraser.paraphrase(text=text);
}
```
For a complete example visit here
## Text Generation (`gpt2`)

The `gpt2` module uses OpenAI's `GPT-2-medium` to perform text generation on a given text.

The `generate` action allows you to generate text based on the input text you provide.

Inputs:

- `text`: input text, either a string or a list of strings
- `max_length`: maximum length of the generated text (default: 30)
- `min_length`: minimum length of the generated text (default: 10)
- `num_return_sequences`: number of sequences to return (default: 3)

Output: a list of generated text sequences

The `gpt2.get_embeddings` action allows you to get the embeddings for the input text.

Inputs:

- `text`: input text, either a string or a list of strings

Output: a list of embeddings for the input text
Given a text or a list of texts, it will return the generated text:

```jac
walker test_generate {
    can gpt2.generate;
    report gpt2.generate(text="Hello, my name is", num_return_sequences=5);
}
```
Given a text or a list of texts, it will return the embeddings of the text:

```jac
walker test_get_embeddings {
    can gpt2.get_embeddings;
    report gpt2.get_embeddings(text=["Hello, my name is GPT2", "GPT2 is a Text-to-Text Generation Model"]);
}
```
For a complete example visit here