This project reads .osu files and formats all mapsets, maps, and objects into an sqlite database. There are also several sql files that analyse the database for curiosity satisfaction purposes.
The code is a bit of a bodge job so some things like the query scripts might not be ready to run out of the box.
Thanks to Mr HeliX for providing a list of loved beatmap IDs.
This is the most common circle in osu!
You should probably have Python installed for this, I used 3.9. Sqlite3 is probably a good idea too.
- Clone this repository and navigate to it using your favourite command line interface with
cd osu.map-data
- Create a new database file with
sqlite3 mapdata.db
and then set up the tables with.read schema.sql
. PressCtrl+D
to close the database when you're done. - Create a new directory named
maps
and add your .osu files in here. If you need a dump of all ranked and loved .osu files you can get them from https://data.ppy.sh/. - Run
python3 run.py
to start reading and adding maps to the database. This will probably take some time (took me around an hour for the entire data dump), and will create a database file around 2GB in size.
When you're done you should have a database with four tables you're free to play around with.
Hopefully most of these tables and columns should be self-explanatory.
Column | Data type | Settings | Notes |
---|---|---|---|
beatmap_set_id | text | primary key | Very old beatmaps didn't store the beatmap set id in the .osu, so for these cases we use a hash of a string combining the metadata of the map. |
title | text | not null | Uses the first instance found of the set. |
title_unicode | text | not null | Uses the first instance found of the set. |
artist | text | not null | Uses the first instance found of the set. |
artist_unicode | text | not null | Uses the first instance found of the set. |
creator | text | not null | Mapset host. Uses the first instance found of the set. |
source | text | Uses the first instance found of the set. | |
tags | text | Uses the first instance found of the set. |
Column | Data type | Settings | Notes |
---|---|---|---|
beatmap_id | integer | primary key | Very old beatmaps didn't store the beatmap id in the .osu, so for these cases we use the file name the map was read from. |
beatmap_set_id | text | not null, foreign key | |
mode | integer | not null | Can only be 0 . We're only processing standard mode maps for now |
difficulty_name | text | not null | |
hp_drain | real | not null | |
circle_size | real | not null | |
overall_difficulty | real | not null | |
approach_rate | real | not null | |
slider_multiplier | real | not null | Base slider velocity. |
slider_tick_rate | real | not null | |
stack_leniency | real | not null |
Column | Data type | Settings | Notes |
---|---|---|---|
object_number | integer, primary key | not null | Goes up in the order that they appear in the .osu file. |
beatmap_id | integer, primary key, foreign key | not null | |
type | text | not null | circle , slider , or spinner |
time | integer | not null | Time the object appears in milliseconds. |
x | integer | not null | x co-ordinate of the object. |
y | integer | not null | y co-ordinate of the object. |
new_combo | integer | not null | 1 if the object is the start of a new combo, 0 otherwise. |
anchors | text | List of all slider anchors on this object. See this for more info. | |
length | real | Length of the slider in osu! pixels. See this for more info. | |
curve_type | text | Slider curve type. See this for more info. | |
slides | integer | Number of slides in the slider. See this for more info. |
This table only contains two columns for now, as it's just a bodge job to store loved beatmap IDs. Will probably re-use this to store all beatmap website data later.
Column | Data type | Settings | Notes |
---|---|---|---|
beatmap_id | integer, primary key, foreign key | not null | Only realised I named this column incorrectly whilst writing this documentation, but I'm too lazy to fix it now. |
status | text | not null | loved is the only value this can be for now. |
Feel free to write whatever queries to get interesting info from the database. I've provided a handful I used in the video in the scripts
directory, which you can run by opening the database with sqlite3 mapdata.db
and entering .read scripts/objectCoordsByFreq.sql
.
There are also two python scripts in app/queries
that I used for counting which co-ordinates were the least used. These can be simply run with python3 app/queries/coordsWithFewObjects.py
.
There is one more script called app/heatmap.py
that uses matplotlib to create the heatmaps of object frequency. This code reads .csv outputs made from the queries objectCoords
and firstObjectCoords
and formats them to make a heatmap.