Add data for card set information #76

Open
iconmaster5326 opened this issue Oct 6, 2023 · 8 comments

Comments

@iconmaster5326

I really like that this dataset records which sets each card appeared in, regardless of locale. A lot of Yu-Gi-Oh! tools I've found care only about the TCG, and usually only the English TCG at that. However, I've run into a big issue when trying to pin cards to specific sets across all possible locales:

  1. Sometimes, two sets from different locales that have the same name refer to the same set (e.g. Legacy of Darkness).
  2. Sometimes, two sets from different locales that have the same name refer to different sets (e.g. Pharaoh's Servant).
  3. Sometimes, two sets from different locales that have the same name refer to the same set, BUT the set's contents differ across locales (e.g. Legend of Blue-Eyes White Dragon).

Furthermore, there is some data I'd like regarding sets, such as when a set first came out in each locale. Have you considered making an additional dataset for card sets?
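
To make the three cases above concrete, here is a rough sketch of what a per-set record could look like. Everything in it is hypothetical (the class names, the `key` field, the locale codes, the placeholder data); it is not the shape this project uses. The idea is only that a set is identified by a stable key instead of its display name, and that each locale carries its own name, date, and contents.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class LocaleRelease:
    """One release of a set in one locale (hypothetical shape)."""
    name: str                          # display name in that locale
    release_date: Optional[date]       # None when the date is unknown
    card_ids: frozenset = frozenset()  # contents may differ per locale (case 3)


@dataclass
class CardSet:
    """A set identified by a stable key rather than by its display name."""
    key: str  # assumed unique identifier, so same-named sets stay apart (case 2)
    releases: dict = field(default_factory=dict)  # locale code -> LocaleRelease

    def first_release(self):
        """Return (locale, date) of the earliest dated release, or None."""
        dated = [(r.release_date, loc) for loc, r in self.releases.items()
                 if r.release_date is not None]
        if not dated:
            return None
        earliest, locale = min(dated)
        return locale, earliest

    def contents_differ(self):
        """True when at least two locales list different cards."""
        return len({r.card_ids for r in self.releases.values()}) > 1


# Placeholder data only; the dates and card ids are invented for illustration.
example = CardSet(key="Example Set")
example.releases["ja"] = LocaleRelease("Example Set", date(2000, 1, 1), frozenset({"a", "b"}))
example.releases["en"] = LocaleRelease("Example Set", date(2001, 1, 1), frozenset({"a"}))
```

With a record like this, "when did the set first come out in each locale" is a lookup in `releases`, and case 3 shows up as `contents_differ()` returning true.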

@iconmaster5326
Author

Another wrinkle regarding locales, should anyone try to implement this: The language a card is printed in may not be the locale the set was released in, and any given set may contain cards of multiple different languages. Case in point: https://yugipedia.com/wiki/The_Valuable_Book_5_promotional_cards.

@kevinlul
Contributor

kevinlul commented Oct 6, 2023

See DawnbrandBots/yaml-yugipedia#2. There are a number of things to be resolved. For one, I'm merging all of Yugipedia's English sets into one "sets.en" array, but there's actually significant regionalization, at least historically (North America, Europe, Oceania, Worldwide), which is why Yugipedia has separate en_sets, na_sets, and so on.
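
One possible way to keep the regional split while still producing a single English list is to tag each entry with the field it came from. This is only a sketch: the function name, the region labels, and the entry strings are made up, and the field-to-region mapping is passed in because I am assuming rather than asserting which field means which region.

```python
def merge_english_sets(fields, region_by_field):
    """Merge regional set lists into one list, tagging each entry with its region.

    `fields` maps a field name (e.g. "en_sets", "na_sets") to a list of entries.
    `region_by_field` maps those field names to region labels chosen by the caller.
    """
    merged = []
    for field_name, region in region_by_field.items():
        for entry in fields.get(field_name, []):
            merged.append({"region": region, "entry": entry})
    return merged


# Placeholder input; the entry strings and region labels are illustrative only.
fields = {
    "en_sets": ["placeholder entry 1"],
    "na_sets": ["placeholder entry 2", "placeholder entry 3"],
}
region_by_field = {"en_sets": "en", "na_sets": "na"}  # assumed mapping
merged = merge_english_sets(fields, region_by_field)
```

Consumers that only care about "English" can ignore the `region` key, while date logic can still tell the regions apart.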

@iconmaster5326
Author

Oh no, I've made a duplicate issue! I don't know how I missed you talking about this already. Yeah, I can imagine this is no easy task, considering the great historical muddling of locales, and the general non-alignment of languages to locale.

@kevinlul
Contributor

kevinlul commented Oct 6, 2023

It's not exactly a duplicate, since there's no tracking issue in this repository, and I also haven't really started to document the use cases and caveats that need to be worked out. Typically the main goal is just the first and last release dates, which we do need for Bastion, but most approaches gloss over different release timings by region (the TCG release date just becomes the US release date, and the OCG release date just becomes the Japan release date). I was unaware that the same name could refer to different sets, or of the VB5 language issue, so thanks for bringing that up.

@iconmaster5326
Author

My thoughts, if they're worth anything at all: luckily, if you're scraping the Yugipedia card pages to get printing information, you can easily match cards to sets, cleanly avoiding the above issues, just by looking at the link given in each "set" column. If the trouble with set information is just rounding up a full list of every set ever, you could even use those links to enumerate all sets that have ever had a card printed in them. (This method would not catch sets without cards in them, but... uh... yeah.) You can even avoid scraping the set page itself for things like dates: just review all the dates given in the printing tables for any given link. Although if you find a set whose individual cards don't agree on a date, good luck sorting that out.
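
Roughly what I mean, as a sketch. It assumes the printing rows have already been extracted into `(set_page, release_date)` pairs by whatever means; the function names and the sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def dates_by_set(printings):
    """Group the release dates seen in printing rows by the linked set page."""
    grouped = defaultdict(set)
    for set_page, release_date in printings:
        grouped[set_page].add(release_date)
    return dict(grouped)


def split_by_agreement(grouped):
    """Separate sets whose rows agree on one date from sets with conflicts."""
    agreed = {page: next(iter(ds)) for page, ds in grouped.items() if len(ds) == 1}
    conflicting = {page: sorted(ds) for page, ds in grouped.items() if len(ds) > 1}
    return agreed, conflicting


# Placeholder rows; page names and dates are invented.
rows = [
    ("Set A", date(2000, 1, 1)),
    ("Set A", date(2000, 1, 1)),
    ("Set B", date(2001, 5, 1)),
    ("Set B", date(2001, 6, 1)),
]
agreed, conflicting = split_by_agreement(dates_by_set(rows))
```

The keys of the grouped dictionary double as the list of every set that has at least one card printed in it, and `conflicting` is the "good luck" pile that needs manual review.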

@kevinlul
Contributor

kevinlul commented Oct 6, 2023

We're not supposed to scrape the pages themselves; instead, we obtain the wikitext from the API, which means we have to reimplement some of the logic ourselves. That's why the current plan is to recurse on Category:Sets and hopefully nab everything, then go from there to try to assign dates. This is close to what the release table template does on the wiki.
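
A minimal sketch of that recursion, using the standard MediaWiki `list=categorymembers` query. The function name is hypothetical, and `fetch` is a caller-supplied function (an assumption of this sketch) that sends the parameters to the wiki's API endpoint and returns the decoded JSON, so the traversal can be tried without network access.

```python
def walk_category(fetch, root="Category:Sets"):
    """Return sorted titles of all pages under `root`, following subcategories.

    `fetch` takes a dict of API parameters and returns the decoded JSON response.
    """
    pages = set()
    seen = {root}      # categories already queued, guards against cycles
    queue = [root]
    while queue:
        category = queue.pop()
        params = {
            "action": "query",
            "format": "json",
            "list": "categorymembers",
            "cmtitle": category,
            "cmtype": "page|subcat",
            "cmlimit": "max",
        }
        while True:
            response = fetch(params)
            for member in response["query"]["categorymembers"]:
                if member["ns"] == 14:  # namespace 14 is Category
                    if member["title"] not in seen:
                        seen.add(member["title"])
                        queue.append(member["title"])
                else:
                    pages.add(member["title"])
            if "continue" not in response:
                break
            # Carry the continuation tokens into the next request.
            params = {**params, **response["continue"]}
    return sorted(pages)
```

This only enumerates page titles. Whether every page under the category is actually a set, and how to get dates out of each page's wikitext, are separate problems.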

@iconmaster5326
Author

iconmaster5326 commented Oct 6, 2023

Ah, right, the wikitext... Yugipedia just gives a list of set names and hopes for the best, automatically making links instead of manually disambiguating. And so their problem of potential inaccuracies is now our problem.

EDIT: It actually looks like the string they use there corresponds 1:1 to a page name, no heuristics needed. So you're safe using that to disambiguate. Phew!

@kevinlul
Contributor

kevinlul commented Oct 6, 2023

Dates can also be obtained from the official card database at https://www.db.yugioh-card.com.
