[Feature] Support Multi-Scale Training for Text Detection #1714

Open
wants to merge 3 commits into
base: dev-1.x

Mountchicken
Collaborator

Multi-scale training is an attractive trick for text detection, since the scale of text varies widely from image to image.

Supporting multi-scale training is simple: we only need to change the text target generation to use data_sample.batch_input_shape instead of data_sample.img_shape. This change does not affect the existing detectors in MMOCR, because their input size is fixed, i.e. data_sample.img_shape == data_sample.batch_input_shape.
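To illustrate why the two shapes diverge under multi-scale training, here is a minimal sketch in plain Python. It assumes that images in a batch are padded to a common size (rounded up to a multiple of some divisor) before being fed to the model; the helper `batch_input_shape` and the `pad_divisor` value are hypothetical and only mimic that idea, they are not MMOCR code.

```python
import math


def batch_input_shape(img_shapes, pad_divisor=32):
    """Hypothetical helper: smallest (H, W) that fits every image in the
    batch, rounded up to a multiple of pad_divisor (an assumed value)."""
    max_h = max(h for h, _ in img_shapes)
    max_w = max(w for _, w in img_shapes)
    return (math.ceil(max_h / pad_divisor) * pad_divisor,
            math.ceil(max_w / pad_divisor) * pad_divisor)


# Two images resized to different scales land in the same batch.
img_shapes = [(640, 800), (736, 1024)]
padded = batch_input_shape(img_shapes)

for shape in img_shapes:
    # A target built from img_shape only matches the padded feature map
    # when the image already has the batch shape.
    print(shape, "matches batch shape:", shape == padded)
```

With a fixed input size every image already has the batch shape, which is why the existing detectors see no difference.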

To use multi-scale training, here is a simple config:

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True,
    ),
    dict(
        type='RandomResize',
        scale=[(1280, 800), (1280, 1024)],
        keep_ratio=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
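For intuition about what the `RandomResize` step does with two scales, the sketch below samples a target scale between them. It assumes that the long and short edges are each drawn uniformly from the range spanned by the two given scales; `sample_scale` is a hypothetical stand-in written for illustration, not the transform's actual implementation.

```python
import random


def sample_scale(scales, rng=random):
    """Hypothetical sketch: pick (long_edge, short_edge) uniformly between
    the extremes of the given scales (assumed sampling rule)."""
    long_edges = [max(s) for s in scales]
    short_edges = [min(s) for s in scales]
    long_edge = rng.randint(min(long_edges), max(long_edges))
    short_edge = rng.randint(min(short_edges), max(short_edges))
    return (long_edge, short_edge)


# Same scales as in the config above: the long edge stays at 1280,
# while the short edge varies between 800 and 1024.
for _ in range(3):
    print(sample_scale([(1280, 800), (1280, 1024)]))
```

Since `keep_ratio=True`, the sampled scale acts as a bound and each image is resized to fit within it while preserving its aspect ratio.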
