Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: word count #118

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ Here is a possible valid configuration to enable this plugin. See [these files](
```yaml
plugin: ubiquibot/conversation-rewards
with:
logLevel: "info"
evmNetworkId: 100
evmPrivateEncrypted: "encrypted-key"
erc20RewardToken: "0xe91D153E0b41518A2Ce8Dd3D7944Fa863463a97d"
Expand Down
2 changes: 2 additions & 0 deletions src/configuration/incentives.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { StaticDecode, Type as T } from "@sinclair/typebox";
import { LOG_LEVEL } from "@ubiquity-dao/ubiquibot-logger";
import { StandardValidator } from "typebox-validators";
import { contentEvaluatorConfigurationType } from "./content-evaluator-config";
import { dataCollectionConfigurationType } from "./data-collection-config";
Expand All @@ -10,6 +11,7 @@ import { userExtractorConfigurationType } from "./user-extractor-config";

export const incentivesConfigurationSchema = T.Object(
{
logLevel: T.Enum(LOG_LEVEL, { default: LOG_LEVEL.INFO }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default be errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nice to get info within the logs by default because some interesting info is there during the process.

/**
* Network ID to run in, default to 100
*/
Expand Down
3 changes: 2 additions & 1 deletion src/helpers/logger.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import { Logs } from "@ubiquity-dao/ubiquibot-logger";
import configuration from "../configuration/config-reader";

const logger = new Logs("debug");
const logger = new Logs(configuration.logLevel);

export default logger;
2 changes: 1 addition & 1 deletion src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ export default run()
return result;
})
.catch(async (e) => {
const errorMessage = logger.error(`Failed to run comment evaluation. ${e.logMessage?.raw || e}`, e);
const errorMessage = logger.error(`Failed to run comment evaluation. ${e?.logMessage?.raw || e}`, e);
try {
await githubCommentModuleInstance.postComment(
`${errorMessage?.logMessage.diff}\n<!--\n${getGithubWorkflowRunUrl()}\n${JSON.stringify(errorMessage?.metadata, null, 2)}\n-->`
Expand Down
2 changes: 2 additions & 0 deletions src/parser/data-purge-module.ts
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ export class DataPurgeModule implements Module {
.replace(/^>.*$/gm, "")
// Remove commands such as /start
.replace(/^\/.+/g, "")
// Remove HTML comments
.replace(/<!--[\s\S]*?-->/g, "")
Copy link
Member

@0x4007 0x4007 Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use a virtual DOM creator like jsdom or mdast, we should be able to query element.textContent and it should handle this and other situations in a robust manner. Since this is already finished, its fine. But if there are any problems with the implementation, or more operations you need to perform, consider the virtual DOM approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Body comes in a text form. Then it gets transformed into MD -> HTML. So it is rendered as a text form. So typically when fetched from GitHub the body looks like
Resolves #23 <!-- comment --> so I don't see a way to skip it other than removing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML comments shouldn't be included in element.textContent is my point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but there is no way to know it is a comment before converting it to HTML, since the MD renderer is ran first
https://github.com/gentlementlegen/conversation-rewards/blob/cf5ecb6a9f1bb551fa01c866d6f8ccaefa7e1804/src/parser/formatting-evaluator-module.ts#L112

(it is actually the same for v1) so it is first converted to a p that contains the comment.

// Keep only one new line needed by markdown-it package to convert to html
.replace(/\n\s*\n/g, "\n")
.trim();
Expand Down
35 changes: 28 additions & 7 deletions src/parser/formatting-evaluator-module.ts
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,9 @@ export class FormattingEvaluatorModule implements Module {
}

_getFormattingScore(comment: GithubCommentScore) {
const html = this._md.render(comment.content);
// Change the \r to \n to fix markup interpretation
const html = this._md.render(comment.content.replaceAll("\r", "\n"));
logger.debug("Will analyze formatting for the current content", { comment: comment.content, html });
const temp = new JSDOM(html);
if (temp.window.document.body) {
const res = this.classifyTagsWithWordCount(temp.window.document.body, comment.type);
Expand All @@ -118,7 +120,7 @@ export class FormattingEvaluatorModule implements Module {
}
}

_countWords(regexes: FormattingEvaluatorConfiguration["multipliers"][0]["rewards"]["regex"], text: string) {
_countSymbols(regexes: FormattingEvaluatorConfiguration["multipliers"][0]["rewards"]["regex"], text: string) {
const counts: { [p: string]: { count: number; multiplier: number } } = {};
for (const [regex, multiplier] of Object.entries(regexes)) {
const match = text.trim().match(new RegExp(regex, "g"));
Expand All @@ -139,17 +141,36 @@ export class FormattingEvaluatorModule implements Module {

for (const element of elements) {
const tagName = element.tagName.toLowerCase();
const wordCount = this._countWords(this._multipliers[commentType].regex, element.textContent || "");
// We cannot use textContent otherwise we would duplicate counts, so instead we extract text nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textContent of the top level parent element will do the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, will be part of #92

const textNodes = Array.from(element?.childNodes || []).filter((node) => node.nodeType === 3);
const innerText = textNodes
.map((node) => node.nodeValue?.trim())
.join(" ")
.trim();
const symbols = this._countSymbols(this._multipliers[commentType].regex, innerText);
let score = 0;
if (this._multipliers[commentType]?.html[tagName] !== undefined) {
score = this._multipliers[commentType].html[tagName];
} else {
logger.error(`Could not find multiplier for comment [${commentType}], <${tagName}>`);
}
tagWordCount[tagName] = {
symbols: wordCount,
score,
};
logger.debug("Tag content results", { tagName, symbols, text: element.textContent });
// If we already had that tag included in the result, merge them and update total count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that for the statistics it might be interesting to count words per element but honestly its out of scope and doesn't add business value while complicating the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be changed in #92

if (Object.keys(tagWordCount).includes(tagName)) {
for (const [k, v] of Object.entries(symbols)) {
if (Object.keys(tagWordCount[tagName].symbols).includes(k)) {
tagWordCount[tagName].symbols[k] = {
...tagWordCount[tagName].symbols[k],
count: tagWordCount[tagName].symbols[k].count + v.count,
};
}
}
} else {
tagWordCount[tagName] = {
symbols: symbols,
score,
};
}
}

return tagWordCount;
Expand Down
39 changes: 24 additions & 15 deletions tests/__mocks__/results/content-evaluator-results.json
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 71,
"count": 69,
"multiplier": 0.2
}
}
Expand All @@ -77,7 +77,7 @@
"multiplier": 1
},
"relevance": 0.8,
"reward": 6.28
"reward": 6.136
},
"type": "ISSUE_AUTHOR",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22#issuecomment-1949203681"
Expand Down Expand Up @@ -164,7 +164,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 2,
"count": 4,
"multiplier": 0.1
}
}
Expand All @@ -182,7 +182,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 35,
"count": 3,
"multiplier": 0.1
}
}
Expand All @@ -191,7 +191,7 @@
"score": 0,
"symbols": {
"\\b\\w+\\b": {
"count": 52,
"count": 1,
"multiplier": 0.1
}
}
Expand All @@ -200,7 +200,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 20,
"count": 54,
"multiplier": 0.1
}
}
Expand All @@ -209,7 +209,7 @@
"multiplier": 1
},
"relevance": 1,
"reward": 3.51
"reward": 3.55
},
"type": "ISSUE_SPECIFICATION",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22"
Expand Down Expand Up @@ -281,7 +281,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 1,
"count": 205,
"multiplier": 0.1
}
}
Expand All @@ -299,7 +299,16 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 641,
"count": 441,
"multiplier": 0.1
}
}
},
"pre": {
"score": 0,
"symbols": {
"\\b\\w+\\b": {
"count": 2,
"multiplier": 0.1
}
}
Expand All @@ -308,7 +317,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 4,
"count": 1,
"multiplier": 0.1
}
}
Expand All @@ -317,13 +326,13 @@
"multiplier": 1
},
"relevance": 1,
"reward": 25.02
"reward": 27.3
},
"type": "PULL_COLLABORATOR",
"url": "https://github.com/ubiquibot/comment-incentives/pull/25#issuecomment-1949196678"
}
],
"total": 46.472,
"total": 48.648,
"userId": 4975670
},
"gitcoindev": {
Expand Down Expand Up @@ -453,7 +462,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 15,
"count": 16,
"multiplier": 0.1
}
}
Expand All @@ -462,7 +471,7 @@
"multiplier": 0.25
},
"relevance": 0.8,
"reward": 0.2
"reward": 0.208
},
"type": "ISSUE_CONTRIBUTOR",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22#issuecomment-1948916343"
Expand Down Expand Up @@ -636,7 +645,7 @@
"url": "https://github.com/ubiquibot/comment-incentives/pull/25#issuecomment-1949044855"
}
],
"total": 1.368,
"total": 1.376,
"userId": 41552663
}
}
39 changes: 24 additions & 15 deletions tests/__mocks__/results/formatting-evaluator-results.json
Original file line number Diff line number Diff line change
Expand Up @@ -66,15 +66,15 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 71,
"count": 69,
"multiplier": 0.2
}
}
}
},
"multiplier": 1
},
"reward": 7.85
"reward": 7.67
},
"type": "ISSUE_AUTHOR",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22#issuecomment-1949203681"
Expand Down Expand Up @@ -158,7 +158,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 2,
"count": 4,
"multiplier": 0.1
}
}
Expand All @@ -176,7 +176,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 35,
"count": 3,
"multiplier": 0.1
}
}
Expand All @@ -185,7 +185,7 @@
"score": 0,
"symbols": {
"\\b\\w+\\b": {
"count": 52,
"count": 1,
"multiplier": 0.1
}
}
Expand All @@ -194,15 +194,15 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 20,
"count": 54,
"multiplier": 0.1
}
}
}
},
"multiplier": 1
},
"reward": 3.51
"reward": 3.55
},
"type": "ISSUE_SPECIFICATION",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22"
Expand Down Expand Up @@ -272,7 +272,7 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 1,
"count": 205,
"multiplier": 0.1
}
}
Expand All @@ -290,7 +290,16 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 641,
"count": 441,
"multiplier": 0.1
}
}
},
"pre": {
"score": 0,
"symbols": {
"\\b\\w+\\b": {
"count": 2,
"multiplier": 0.1
}
}
Expand All @@ -299,21 +308,21 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 4,
"count": 1,
"multiplier": 0.1
}
}
}
},
"multiplier": 1
},
"reward": 25.02
"reward": 27.3
},
"type": "PULL_COLLABORATOR",
"url": "https://github.com/ubiquibot/comment-incentives/pull/25#issuecomment-1949196678"
}
],
"total": 50.62,
"total": 52.76,
"userId": 4975670
},
"gitcoindev": {
Expand Down Expand Up @@ -439,15 +448,15 @@
"score": 1,
"symbols": {
"\\b\\w+\\b": {
"count": 15,
"count": 16,
"multiplier": 0.1
}
}
}
},
"multiplier": 0.25
},
"reward": 0.25
"reward": 0.26
},
"type": "ISSUE_CONTRIBUTOR",
"url": "https://github.com/ubiquibot/comment-incentives/issues/22#issuecomment-1948916343"
Expand Down Expand Up @@ -614,7 +623,7 @@
"url": "https://github.com/ubiquibot/comment-incentives/pull/25#issuecomment-1949044855"
}
],
"total": 1.63,
"total": 1.64,
"userId": 41552663
}
}
Loading
Loading