Free Open-Source Japanese and Korean language learning portal. Allows for Self-Hosting. MIT License.

Hanabira.org - Your Path to Japanese Comprehension

Free, Open-Source, Self-Hostable

Hanabira.org is a free and (mostly) open-source Japanese learning platform designed to help you prepare for JLPT N5-N1. The platform offers tools for learning Japanese, from text parsing and grammar explanations to YouTube immersion and kanji mnemonics. It can be easily self-hosted. It is in a very early alpha stage and full of bugs. Korean content will be added soon(ish). Tech stack: Next.js 14, Tailwind CSS, shadcn/ui, MongoDB, Express, Flask, Docker. MIT License. The code is provided "as is" without any warranty. Use at your own risk.

Features

  • YouTube Immersion - Enhance learning with engaging video content.
  • Text Parser - Easily split and tokenize custom texts.
  • Grammar Explanation - Quick and clear grammar points with examples.
  • Word relations - Graph showing the hierarchy of word relations, e.g. synonyms.
  • Vocabulary SRS Cards - Effective spaced repetition flashcards with audio.
  • Vocabulary and Sentence Mining - Discover new words and sentences seamlessly.
  • Kanji Mnemonics (in development) - Simplified kanji learning techniques.
  • Kanji Animation and Drawing Canvas (in development) - Interactive kanji practice tools.

Screenshots

Hanabira Project

Hanabira Project Screenshot

Text Parser

Text Parser Screenshot

YouTube Immersion

YouTube Immersion Screenshot

Grammar Graph

Grammar Graph Screenshot

Grammar Explanations

Grammar Explanations Screenshot

Word Relations

Word Relations Screenshot

Self-Hosting

Hanabira (https://hanabira.org/) can be run locally on your laptop or server with just 3 commands. To get started quickly, run the public Hanabira Docker images. The images/containers are big (several GB each), unoptimized, and run as the root user. Use at your own risk.

  1. Quick Start

Use a clean Ubuntu-based Linux VM in VirtualBox.

Start the pre-built public Hanabira containers:

git clone https://github.com/tristcoil/hanabira.org.git 
cd hanabira.org 
docker-compose up

Hanabira will then be accessible locally at http://localhost:8888/

If you cannot reach the website locally, check that all containers are running and clear your browser cache. During early development, a new release will sometimes include a breaking database change; in that case, delete the user_data directory located in the same directory as the docker-compose.yml file.
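When troubleshooting, it can help to first confirm that something is actually listening on the published port before looking at the browser. The following is a minimal sketch, not part of the Hanabira repo: the helper name `port_is_open` is hypothetical, and the only values taken from this README are the host `localhost` and port `8888`.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port come from the Quick Start URL http://localhost:8888/
    if port_is_open("localhost", 8888):
        print("Port 8888 is open; if the page still fails, clear the browser cache.")
    else:
        print("Nothing is listening on port 8888; check that all containers are running.")
```

An open port only shows that a container is accepting connections, not that every service behind it is healthy, so the container status and logs are still the next place to look.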

Optional: Check the docker-compose.yml file for the paths to the config files where you can insert your DeepL, OpenAI, and Google Analytics API keys and tracking codes.

Screenshot: Self-hosted Hanabira in a VirtualBox VM

Build (and run) containers yourself locally:

git clone https://github.com/tristcoil/hanabira.org.git 
cd hanabira.org 
docker-compose build
docker-compose up

Locally built images will lack audio files, since audio is not part of the repo.

Note: The Hanabira project has a private main upstream repo; the public repo contains only individual releases (not day-to-day development progress).

Contact

For more information, visit Hanabira.org.

Hanabira Discord

Sources & Literature

Japanese

  • Nihongo So Matome JLPT N2 series
  • Nihongo So Matome JLPT N3 series
  • Nihongo So Matome JLPT N4 series
  • Nihongo So Matome JLPT N5 series
  • 600 Basic Japanese Verbs, Tuttle Publishing
  • New Kanzen Master JLPT N3 Tango Word Book (Shin Kanzen Master: JLPT N3 1800 Important Vocabulary Words)

Vietnamese

  • Let's speak Vietnamese (Binh Nhu Ngo)
  • Vietnamese as a second language (Hue Van Nguyen)

Web Sources

Japanese

JLPT level vocabulary lists taken from Tanos.co.uk

(Eventually, we will also use Kanji JLPT lists)
Licence: Creative Commons BY - License Details


Kanjidic Project

We are using the kanji dictionary from the KANJIDIC Project.

We took the KANJIDIC2 file, which is in XML format, encoded in Unicode/UTF-8, and contains information about all 13,108 kanji. You can download the file here.

After downloading, we decompress the archive to obtain the XML file. We then use a custom Python script to convert it to JSON for easier processing. The resulting JSON file is approximately 50 MB in size.
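The conversion described above can be sketched roughly as follows. This is not the project's actual script: the function names and the JSON field names (`kanji`, `strokes`, `on`, `kun`, `meanings`) are assumptions chosen for illustration, and `SAMPLE_XML` is a hand-written, abbreviated entry in the KANJIDIC2 layout rather than an excerpt from the real file.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written, abbreviated entry in the KANJIDIC2 layout (illustrative only).
SAMPLE_XML = """<kanjidic2>
  <character>
    <literal>日</literal>
    <misc>
      <stroke_count>4</stroke_count>
    </misc>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ニチ</reading>
        <reading r_type="ja_on">ジツ</reading>
        <reading r_type="ja_kun">ひ</reading>
        <meaning>day</meaning>
        <meaning>sun</meaning>
        <meaning m_lang="fr">jour</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</kanjidic2>"""


def character_to_dict(char: ET.Element) -> dict:
    """Flatten one <character> element into a plain dict (hypothetical schema)."""
    readings = char.findall("reading_meaning/rmgroup/reading")
    meanings = char.findall("reading_meaning/rmgroup/meaning")
    strokes = char.findtext("misc/stroke_count")
    return {
        "kanji": char.findtext("literal"),
        "strokes": int(strokes) if strokes else None,
        "on": [r.text for r in readings if r.get("r_type") == "ja_on"],
        "kun": [r.text for r in readings if r.get("r_type") == "ja_kun"],
        # English meanings carry no m_lang attribute in KANJIDIC2.
        "meanings": [m.text for m in meanings if m.get("m_lang") is None],
    }


def kanjidic2_to_records(xml_text: str) -> list:
    """Parse KANJIDIC2-style XML text and return one dict per kanji."""
    root = ET.fromstring(xml_text)
    return [character_to_dict(c) for c in root.iter("character")]


if __name__ == "__main__":
    records = kanjidic2_to_records(SAMPLE_XML)
    # ensure_ascii=False keeps kanji and kana readable in the JSON output.
    print(json.dumps(records, ensure_ascii=False, indent=2))
```

For the full file, the same `character_to_dict` helper could be applied to elements read with `ET.parse` or `ET.iterparse` instead of an in-memory string.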

Radicals - RADKFILE
For more information on RADKFILE, visit this page.

Copyright
The RADKFILE and KRADFILE files are copyrighted and available under the EDRDG Licence. The copyright for RADKFILE2 and KRADFILE2 is held by Jim Rose.

Please note that the licence might not allow commercial use. You can read more about the licence here.

License Information

The dictionary files are made available under a Creative Commons Attribution-ShareAlike Licence (V4.0).

The RADKFILE/KRADFILE files relate to the decomposition of the 6,355 kanji in JIS X 0208 into their visible components. Note that the RADKFILE2/KRADFILE2 files, whose copyright is held by Jim Rose, are not used in our project.
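To make the decomposition concrete, here is a minimal sketch of reading KRADFILE-style lines, where each line has the form `kanji : component component ...` and comment lines start with `#`. The function names are hypothetical and this is not the project's own loader; the two sample lines are hand-written in that layout and should be checked against the real file. The original file is, to our knowledge, distributed in EUC-JP, so it would need to be opened with the matching encoding.

```python
# Hand-written lines in the KRADFILE layout (illustrative, not an excerpt).
SAMPLE_KRADFILE = """# kanji : components
明 : 日 月
鳴 : 口 鳥
"""


def parse_kradfile(text: str) -> dict:
    """Map each kanji to the list of its visible components."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kanji, sep, parts = line.partition(" : ")
        if not sep:
            continue  # not a "kanji : components" line
        table[kanji] = parts.split()
    return table


def components_to_kanji(table: dict) -> dict:
    """Invert the table: component -> kanji containing it (the RADKFILE direction)."""
    index = {}
    for kanji, parts in table.items():
        for part in parts:
            index.setdefault(part, []).append(kanji)
    return index


if __name__ == "__main__":
    table = parse_kradfile(SAMPLE_KRADFILE)
    print(table)
    print(components_to_kanji(table))
```

Inverting the table gives the lookup direction that RADKFILE provides, which is what a search by radical or component would use.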


MeCab

We are using the MeCab package available through the apt package manager.

We also use mecab-async, an npm package licensed under the MIT License.

KUROSHIRO Parser

The KUROSHIRO Parser converts Japanese text to hiragana, katakana, or romaji, with furigana and okurigana modes. For more details, visit the official website.

The source code is available on GitHub at github.com/hexenq/kuroshiro.

KUROSHIRO is a Node.js package and is licensed under the MIT License.


JMDICT

The JMDict files are available under a Creative Commons Attribution-ShareAlike Licence (V4.0). You can view the Licence Deed and the full Licence Code.

For the EDICT, JMdict, and KANJIDIC files, you may use or quote the following URLs:

Unfortunately, we encountered errors when downloading files from these older sites. However, we found a frequently updated repository for JMDict (used for Yomitan), licensed under the MIT License. You can check it out here.

We downloaded the JMDict file from that repository, which does not include example sentences from Tatoeba. In the future, we may download the larger file as well.

Licence (JMDict for Yomitan)
The code in the JMDict for Yomitan repository is licensed under the MIT License. The released dictionaries are licensed under the Creative Commons Attribution-ShareAlike Licence (V4.0), the same as JMdict.


Radicals + KRADFILE

The meanings of the radicals used in our project are sourced from Wikipedia. You can view the full list of kanji radicals by stroke count here.

KRADFILE
We are using the KRADFILE for our project. More information about KRADFILE can be found here.

The RADKFILE and KRADFILE files are copyrighted and available under the EDRDG Licence. The copyright for RADKFILE2 and KRADFILE2 is held by Jim Rose. However, we are only using KRADFILE (not KRADFILE2), so we are in compliance with the licence.

For more information on the EDRDG licence, you can visit this link.

Sample attribution texts for using these files under the licence can be found here.


JAMDICT

JAMDICT is a Python package for working with Japanese dictionary files. It is licensed under the MIT License.

For more information, you can visit the package page on PyPI here.

The source code and additional details can be found on GitHub here.


Kanji Radicals

List of Kanji Radicals sourced from Wikipedia.


Pictures are taken from unsplash.com.