-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: word count #118
fix: word count #118
Conversation
# Conflicts: # tests/__mocks__/results/content-evaluator-results.json # tests/__mocks__/results/formatting-evaluator-results.json # tests/__mocks__/results/github-comment-results.json # tests/__mocks__/results/output-reward-split.html # tests/__mocks__/results/output.html # tests/__mocks__/results/permit-generation-results.json # tests/__mocks__/results/reward-split.json
@@ -10,6 +11,7 @@ import { userExtractorConfigurationType } from "./user-extractor-config"; | |||
|
|||
export const incentivesConfigurationSchema = T.Object( | |||
{ | |||
logLevel: T.Enum(LOG_LEVEL, { default: LOG_LEVEL.INFO }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the default be errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is nice to get info
within the logs by default because some interesting info is there during the process.
@@ -26,6 +26,8 @@ export class DataPurgeModule implements Module { | |||
.replace(/^>.*$/gm, "") | |||
// Remove commands such as /start | |||
.replace(/^\/.+/g, "") | |||
// Remove HTML comments | |||
.replace(/<!--[\s\S]*?-->/g, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a virtual DOM creator like jsdom
or mdast
, we should be able to query element.textContent and it should handle this and other situations in a robust manner. Since this is already finished, its fine. But if there are any problems with the implementation, or more operations you need to perform, consider the virtual DOM approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Body comes in a text form. Then it gets transformed into MD -> HTML. So it is rendered as a text form. So typically when fetched from GitHub the body looks like
Resolves #23 <!-- comment -->
so I don't see a way to skip it other than removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTML comments shouldn't be included in element.textContent is my point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but there is no way to know it is a comment before converting it to HTML, since the MD renderer is ran first
https://github.com/gentlementlegen/conversation-rewards/blob/cf5ecb6a9f1bb551fa01c866d6f8ccaefa7e1804/src/parser/formatting-evaluator-module.ts#L112
(it is actually the same for v1) so it is first converted to a p
that contains the comment.
@@ -139,17 +141,36 @@ export class FormattingEvaluatorModule implements Module { | |||
|
|||
for (const element of elements) { | |||
const tagName = element.tagName.toLowerCase(); | |||
const wordCount = this._countWords(this._multipliers[commentType].regex, element.textContent || ""); | |||
// We cannot use textContent otherwise we would duplicate counts, so instead we extract text nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
textContent of the top level parent element will do the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, will be part of #92
score, | ||
}; | ||
logger.debug("Tag content results", { tagName, symbols, text: element.textContent }); | ||
// If we already had that tag included in the result, merge them and update total count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that for the statistics it might be interesting to count words per element but honestly its out of scope and doesn't add business value while complicating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be changed in #92
Resolves #82
QA: https://github.com/ubiquibot/conversation-rewards/actions/runs/10845258477/job/30095719413
Changes