# Masked Convolution for Diverse Sample Sizes
## Introduction

This repository provides a masked convolution layer for diverse sample sizes, designed to address the issues that arise when typical convolution operations meet varying image sizes and hardware-acceleration constraints. The project is inspired by ideas from the paper "Partial Convolution based Padding" by Guilin Liu et al.
## Why MaskedConv2d?

Convolutions can in principle handle images of different sizes, but this advantage is often lost in practice because GPU acceleration favors large batches of uniformly sized tensors, so variable-sized samples end up padded. Several solutions have emerged, though none without drawbacks. The `MaskedConv2d` layer in this repository supports proper statistical handling and normalization with multi-channel masks, without altering the output distribution the way conventional methods do.

`PartialConv2d` layers, as found in existing implementations, tend to disrupt the distribution of output values, which can be detrimental in some use cases. This repository offers `MaskedConv2d` as a drop-in replacement that rectifies these issues, promoting stability and efficiency during model training.
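For intuition, here is a minimal, self-contained sketch in plain PyTorch (not this repository's implementation) of the mask-based re-normalization idea from partial-convolution padding that this project builds on: zeroing out invalid pixels attenuates activations near the mask border, and dividing by each window's valid-pixel fraction compensates for it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)
mask = torch.zeros_like(x)
mask[..., :6, :6] = 1.0            # only a 6x6 region holds real data

weight = torch.randn(1, 1, 3, 3)

# Naive approach: zero out invalid pixels and convolve as usual.
naive = F.conv2d(x * mask, weight, padding=1)

# Re-normalized approach: divide by the fraction of valid pixels per window,
# so windows that straddle the mask border keep a comparable scale.
ones = torch.ones_like(weight)
coverage = F.conv2d(mask, ones, padding=1) / ones.sum()
renorm = naive / coverage.clamp_min(1e-8)

valid = mask.bool()
print(naive[valid].abs().mean())   # attenuated near the mask border
print(renorm[valid].abs().mean())  # restored to a comparable scale
```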
## Key Differences

- **Output distribution:** Unlike `PartialConv2d`, `MaskedConv2d` keeps the output values normally distributed.
- **Mask handling:** `MaskedConv2d` uses 1x1 convolution weights for the mask, allowing direct channel-wise scaling without the mask-blurring effect seen in `PartialConv2d` (see the sketch after this list).
- **Versatility:** `MaskedConv2d` can also act as a drop-in replacement for already-trained regular convolutional layers.
- **Efficiency:** `MaskedConv2d` is optimized for the no-mask case, which accelerates training without compromising performance.
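The mask-handling difference can be illustrated with a small hypothetical sketch in plain PyTorch (not code from this repository): updating the mask with a full-size all-ones kernel, as `PartialConv2d`-style layers do, smears a hard mask edge over the kernel support, while a 1x1 convolution on the mask only rescales it channel-wise and leaves its spatial structure untouched.

```python
import torch
import torch.nn.functional as F

mask = torch.zeros(1, 1, 6, 6)
mask[..., :4, :4] = 1.0  # only the top-left 4x4 region is valid

# PartialConv2d-style update: convolve the mask with an all-ones 3x3 kernel.
# Every output position whose window touches a valid pixel becomes non-zero,
# so the hard mask edge is smeared ("blurred") over the kernel support.
ones_kernel = torch.ones(1, 1, 3, 3)
blurred = F.conv2d(mask, ones_kernel, padding=1) / ones_kernel.sum()
print(blurred[0, 0])                 # fractional values appear around the border

# 1x1 handling (the idea described above): a 1x1 kernel only rescales the
# mask channel-wise, so its spatial structure is preserved exactly.
one_by_one = torch.ones(1, 1, 1, 1)
scaled = F.conv2d(mask, one_by_one)
print(torch.equal(scaled, mask))     # True: no spatial blurring
```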
## Visual Comparison

A comparative visualization of activation statistics (per-channel mean) for conventional convolution (conv), `PartialConv2d` (pconv), and `MaskedConv2d` (mconv) highlights how `MaskedConv2d` maintains a more consistent and stable distribution across layers.

Reproduce this visualization by running `visualize_activations_2d.py`.
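For a rough idea of what such a comparison measures, the following simplified outline (not the repository's `visualize_activations_2d.py`) records the per-channel activation mean after each layer of a small convolutional stack fed with a partially padded batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = nn.ModuleList(nn.Conv2d(16 if i else 3, 16, 3, padding=1) for i in range(4))

x = torch.randn(8, 3, 32, 32)
mask = torch.ones(8, 1, 32, 32)
mask[:, :, 20:, :] = 0               # simulate padded rows

per_layer_means = []
h = x * mask
for layer in layers:
    h = torch.relu(layer(h))
    per_layer_means.append(h.mean(dim=(0, 2, 3)))  # one value per channel

for i, m in enumerate(per_layer_means):
    print(f"layer {i}: per-channel means in [{m.min().item():.3f}, {m.max().item():.3f}]")
```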
## Current Limitations

- A lazy module implementation is pending.
- Exporting to ONNX format may encounter issues due to the use of in-place operators.
## Getting Started

To get started, clone this repository and integrate it into your PyTorch projects. To see the benefits of using masked convolution, play with the examples, such as:

- Autoregressive language model: `autoregressive_language_model.py`
- Attention visualization in the autoregressive language model.
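The snippet below is a hypothetical integration sketch: the import path and call signature shown in the comments are assumptions based on the description above (an `nn.Conv2d`-style layer accepting an optional validity mask), not the repository's exact API, so check the source for the real signatures.

```python
import torch

# Assumed import path and constructor, mirroring nn.Conv2d (hypothetical):
# from masked_torch import MaskedConv2d
# conv = MaskedConv2d(3, 16, kernel_size=3, padding=1)

x = torch.randn(2, 3, 32, 32)        # batch of zero-padded samples
mask = torch.ones(2, 1, 32, 32)      # 1 = real pixel, 0 = padding
mask[0, :, 24:, :] = 0               # the first sample is only 24 px tall

# y = conv(x, mask=mask)             # padded rows would not skew the statistics
# y = conv(x)                        # with no mask it acts like a plain conv layer
```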
## License

This project is licensed under the Apache License, Version 2.0 (LICENSE-APACHE-2.0 or http://www.apache.org/licenses/LICENSE-2.0).
## How to Cite

If you find this implementation useful in your research, please consider citing it:

```bibtex
@misc{mconv2024,
  title = {Masked Convolution for Diverse Sample Sizes},
  author = {Ivan Stepanov},
  year = {2024},
  howpublished = {\url{https://github.com/ivanstepanovftw/masked_torch}},
  note = {Accessed: April 30, 2024}
}
```