Added some words with zero-width non-joiner (zwnj) for Farsi #21903

Open: nshayanfar wants to merge 1 commit into trunk

@nshayanfar nshayanfar commented Dec 7, 2024

Context

The Farsi function words file contains many errors, which I plan to fix gradually in future PRs. The most important issues are:

  • In correctly written Farsi, the zero-width non-joiner (ZWNJ) is used extensively, but many writers type a regular space instead. ZWNJ is becoming more common, especially in AI-generated copy. Therefore, many of these words should be listed in both forms, with ZWNJ and with a space, so that either spelling is identified correctly.
  • Some strings consisted of multiple words with no spaces between them. This was probably a mistake by someone unfamiliar with the language. It's understandable, but wrong.
  • Unlike English, Farsi almost never writes vowels; they are omitted and inferred by the reader. Some words in the list contained written vowels, and some still do. These should be removed. The same sometimes applies to certain non-vowel characters.
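
To illustrate the first point, here is a minimal Python sketch of how a word's ZWNJ spelling and its space spelling relate. It is not code from this PR or from the plugin; the helper name `spelling_variants` is hypothetical, and the PR itself adds the variants to the word list by hand rather than generating them.

```python
from __future__ import annotations

ZWNJ = "\u200c"  # zero-width non-joiner (U+200C)


def spelling_variants(word: str) -> list[str]:
    """Return the word as written, plus its ZWNJ/space counterpart if it has one.

    Hypothetical helper for illustration only.
    """
    variants = [word]
    # Swap every ZWNJ for a space, and every space for a ZWNJ.
    for alternative in (word.replace(ZWNJ, " "), word.replace(" ", ZWNJ)):
        if alternative not in variants:
            variants.append(alternative)
    return variants


# "\u0645\u06cc\u200c\u0634\u0648\u062f" is the verb form written with a ZWNJ
# between its prefix and stem; the second variant uses a space instead.
for variant in spelling_variants("\u0645\u06cc\u200c\u0634\u0648\u062f"):
    print(repr(variant))
```

A word that contains neither a ZWNJ nor a space comes back unchanged as a single-item list, so the helper only expands entries that actually have two spellings.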

Summary

This PR can be summarized in the following changelog entry:

  • Improves keyphrase recognition in Farsi by updating the function words list

Relevant technical choices:

  • Added some adjectives and adverbs
  • Added some auxiliary verbs
  • Added some more popular forms of intensifiers
  • Removed non-written ی in prepositions

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

  • Create a post of at least 300 words. You can use this one as an example:
فیلم «امیلیا پرز» با نامزدی در ۱۰ رشته موفق‌ترین فیلم گلدن گلوب امسال بود. زوئی سالدانا که پیش‌تر با بازی در آواتار هم ستایش شده بود امشب جایزه بهترین هنرپیشه زن مکمل را برای بازی در «امیلیا پرز» دریافت کرد.

دانه‌ انجیر معابد به کارگردانی محمد رسول‌اف که یکی از پنج نامزد اصلی جایزه بهترین فیلم غیرانگلیسی زبان بود داستان یک بازپرس دادگاه انقلاب ایران است که در بحبوحه اعتراض‌های سراسری ۱۴۰۱ اسلحه‌اش را گم می‌کند و درگیر بحران در زندگی شخصی و کاری می‌شود.

فیلمی که از نگاه ناظران تکان‌دهنده‌ و راوی احوال این روزهای جامعه ایران است. فیلم از زاویه زندگی یک «بازپرس سرسپرده حکومت» روایت می‌شود که خشونت و جنایت‌هایش به دل خانه و خانواده خودش بازمی‌گردد.

این فیلم داستان مردی به نام ایمان را روایت می‌کند که بیست و یک سال در خدمت حکومت بوده و حالا با ترفیع به مقام بازپرس دادگاه انقلاب رسیده و بنابراین صاحب خانه‌ای سه‌خوابه و امکانات مالی بهتری خواهد شد. اما اولین روزهای آغاز کارش مصادف است با جنبش زن، زندگی، آزادی.

هشتاد و دومین مراسم سالانه گلدن گلوب اولین مراسم مهم فصل جوایز فیلم‌های سینمایی ۲۰۲۵ است که با اعطای جوایز اسکار در دوم مارس امسال به اوج خود می‌رسد.

مراسم گلوب امسال در هتل بورلی هیلتون لس‌آنجلس عصر یکشنبه ۵ ژانویه، ۱۶ دی برگزار شد.

برنده شدن در گلدن گلوب می‌تواند به تقویت وجهه یک فیلم درست در زمانی کمک کند که رای‌دهندگان بفتا و اسکار برای پر کردن برگه‌های نامزدی آماده می‌شوند. با این حال، گلدن گلوب نسبت به جوایز اسکار رسمیت کمتری دارد.
  • Add these function words as the keyphrase: مقدار زیادی, باید, تعداد بسیار کم, فقط.
  • Confirm that the "Function words in keyphrase" assessment returns a grey bullet and says: Your keyphrase X contains function words only. Learn more about what makes a good keyphrase.
  • Add ارزش as the keyphrase.
  • Add مقدار بسیار کم ارزش as the SEO title.
  • Make sure the "Keyphrase in SEO title" assessment gives the following feedback: The exact match of the focus keyphrase appears at the beginning of the SEO title. Good job!
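
The idea behind the "function words only" check in the steps above can be sketched as follows. This is a simplified illustration in Python, not the plugin's actual implementation: the function name `is_function_words_only` and the two-word sample set are assumptions, and the real assessment may tokenize and match differently.

```python
def is_function_words_only(keyphrase: str, function_words: set) -> bool:
    """Return True when every whitespace-separated token is a listed function word.

    Hypothetical helper for illustration only.
    """
    # str.split() splits on whitespace; a ZWNJ (U+200C) is not whitespace,
    # so a word written with a ZWNJ stays a single token.
    tokens = keyphrase.split()
    return bool(tokens) and all(token in function_words for token in tokens)


# Tiny sample set taken from the test instructions; the real list is much longer.
SAMPLE_FUNCTION_WORDS = {"باید", "فقط"}

print(is_function_words_only("باید فقط", SAMPLE_FUNCTION_WORDS))  # True
print(is_function_words_only("ارزش", SAMPLE_FUNCTION_WORDS))      # False
```

Under this simplified model, a keyphrase made up entirely of listed words is flagged, while a content word such as ارزش is not.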

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes #

Added some adjectives and adverbs
Added some auxiliary verbs
Added some more popular forms of intensifiers
Removed non-written ی in prepositions
@nshayanfar nshayanfar changed the title from "Added some words with zero-width non-joiner (zwnj)" to "Added some words with zero-width non-joiner (zwnj) for Farsi" Dec 9, 2024
@mhkuu
Contributor

mhkuu commented Dec 10, 2024

Thanks @nshayanfar for your pull request! Our team will have a look soon and will return to you if they have any questions.

@hannaw93 hannaw93 self-requested a review January 7, 2025 07:51
@hannaw93 hannaw93 self-assigned this Jan 7, 2025