Commit
Add Flux and Flux Controlnet Support to Diffusion folder (#11794)
* Vae added and matched flux checkpoint
  Signed-off-by: mingyuanm <[email protected]>
* Flux model added.
  Signed-off-by: mingyuanm <[email protected]>
* Copying FlowMatchEulerScheduler over
  Signed-off-by: mingyuanm <[email protected]>
* WIP: Start to test the pipeline forward pass
  Signed-off-by: mingyuanm <[email protected]>
* Vae added and matched flux checkpoint
  Signed-off-by: mingyuanm <[email protected]>
* Inference pipeline runs with offloading function
  Signed-off-by: mingyuanm <[email protected]>
* Start to test image generation
  Signed-off-by: mingyuanm <[email protected]>
* Decoding with VAE part has been verified. Still need to check the denoising loop.
  Signed-off-by: mingyuanm <[email protected]>
* The inference pipeline is verified.
  Signed-off-by: mingyuanm <[email protected]>
* Add arg parsers and refactoring
  Signed-off-by: mingyuanm <[email protected]>
* Tested on multi batch sizes and prompts.
  Signed-off-by: mingyuanm <[email protected]>
* Add headers
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Renaming
  Signed-off-by: mingyuanm <[email protected]>
* Move scheduler to sampler folder
  Signed-off-by: mingyuanm <[email protected]>
* Merging folders.
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Tested after path changing.
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Move MMDIT block to NeMo
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Add joint attention and single attention to NeMo
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Joint attention updated
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Remove redundant importing
  Signed-off-by: mingyuanm <[email protected]>
* Refactor to inherit megatron module
  Signed-off-by: mingyuanm <[email protected]>
* Adding mockdata
  Signed-off-by: mingyuanm <[email protected]>
* DDP training works
  Signed-off-by: mingyuanm <[email protected]>
* Added flux controlnet training components while not tested yet
  Signed-off-by: mingyuanm <[email protected]>
* Flux training with DDP tested on 1 GPU
  Signed-off-by: mingyuanm <[email protected]>
* Flux and controlnet can now train in precached mode.
  Signed-off-by: mingyuanm <[email protected]>
* Custom FSDP path added to megatron parallel.
  Signed-off-by: mingyuanm <[email protected]>
* Bug fix
  Signed-off-by: mingyuanm <[email protected]>
* A hacky way to wrap frozen flux into FSDP to reproduce illegal memory issue.
  Signed-off-by: mingyuanm <[email protected]>
* Typo
  Signed-off-by: mingyuanm <[email protected]>
* Bypass the no grad issue when no single layers exist
  Signed-off-by: mingyuanm <[email protected]>
* A hacky way to wrap frozen flux into FSDP to reproduce illegal memory issue.
  Signed-off-by: mingyuanm <[email protected]>
* Let the flux model's dtype autocast before FSDP wrapping
* fix RuntimeError: "Output 0 of SliceBackward0 is a view and is being modified inplace..."
* Add a wrapper to flux controlnet so they are all wrapped into FSDP automatically
  Signed-off-by: mingyuanm <[email protected]>
* Get rid of concat op in flux single transformer
  Signed-off-by: mingyuanm <[email protected]>
* Get rid of concat op in flux single transformer
  Signed-off-by: mingyuanm <[email protected]>
* single block attention.linear_proj.bias must not require grads after refactoring
  Signed-off-by: mingyuanm <[email protected]>
* use cpu initialization to avoid OOM
  Signed-off-by: mingyuanm <[email protected]>
* Set up flux training script with tp
  Signed-off-by: mingyuanm <[email protected]>
* SDXL fid image generation script updated.
  Signed-off-by: mingyuanm <[email protected]>
* Mcore self attention API changed
  Signed-off-by: mingyuanm <[email protected]>
* Add a dummy task encoder for raw image inputs
  Signed-off-by: mingyuanm <[email protected]>
* Support loading crudedataset via energon dataloader
  Signed-off-by: mingyuanm <[email protected]>
* Default save last to True
  Signed-off-by: mingyuanm <[email protected]>
* Add controlnet inference pipeline
  Signed-off-by: mingyuanm <[email protected]>
* Add controlnet inference script
  Signed-off-by: mingyuanm <[email protected]>
* Image resize mode update
  Signed-off-by: mingyuanm <[email protected]>
* Remove unnecessary bias to avoid sharding issue.
  Signed-off-by: mingyuanm <[email protected]>
* Handle MCore custom fsdp checkpoint load (#11621)
  * general handle custom_fsdp checkpoint load
  * Apply isort and black reformatting
    Signed-off-by: shjwudp <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: artbataev <[email protected]>
  ---------
  Signed-off-by: shjwudp <[email protected]>
  Signed-off-by: artbataev <[email protected]>
  Co-authored-by: shjwudp <[email protected]>
  Co-authored-by: artbataev <[email protected]>
* Checkpoint naming
  Signed-off-by: mingyuanm <[email protected]>
* Image logger WIP
  Signed-off-by: mingyuanm <[email protected]>
* Image logger works fine
  Signed-off-by: mingyuanm <[email protected]>
* save hint and output to image logger.
  Signed-off-by: mingyuanm <[email protected]>
* Update flux controlnet training step
  Signed-off-by: mingyuanm <[email protected]>
* Add model connector and try to load from dist ckpt but failed.
  Signed-off-by: mingyuanm <[email protected]>
* Renaming and refactoring submodel configs for nemo run compatibility
  Signed-off-by: mingyuanm <[email protected]>
* Nemo run script works for basic testing recipe
  Signed-off-by: mingyuanm <[email protected]>
* Added tp2 training factory
  Signed-off-by: mingyuanm <[email protected]>
* Added convergence recipe
  Signed-off-by: mingyuanm <[email protected]>
* Added flux training scripts
  Signed-off-by: mingyuanm <[email protected]>
* Inference script tested
  Signed-off-by: mingyuanm <[email protected]>
* Controlnet inference script tested
  Signed-off-by: mingyuanm <[email protected]>
* Moving scripts to correct folder and modify headers
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Doc strings update
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* pylint correction
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Add import guard since custom fsdp is not merged to mcore yet
  Signed-off-by: mingyuanm <[email protected]>
* Add copyright headers and correct code check
  Signed-off-by: mingyuanm <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: Victor49152 <[email protected]>
* Code Scan
  Signed-off-by: mingyuanm <[email protected]>
* Minor fix
  Signed-off-by: mingyuanm <[email protected]>

---------

Signed-off-by: mingyuanm <[email protected]>
Signed-off-by: Victor49152 <[email protected]>
Signed-off-by: shjwudp <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: Victor49152 <[email protected]>
Co-authored-by: jianbinc <[email protected]>
Co-authored-by: shjwudp <[email protected]>
Co-authored-by: artbataev <[email protected]>