Problems running kohya #741
Unanswered
organfreeman36 asked this question in Q&A
Replies: 2 comments 1 reply
-
Run setup.bat again.
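The log in the question reports that '.\venv\Scripts\activate.bat' was not found and that Python is "Not running inside a virtual environment", which is presumably why rerunning setup is suggested. A minimal sketch for checking both conditions yourself is below. The helper names are made up for illustration, and it assumes you run it from the kohya_ss folder.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter was started from a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def activation_script(repo_dir: str) -> Path:
    """Location of the activation script that the log says could not be found."""
    return Path(repo_dir) / "venv" / "Scripts" / "activate.bat"


if __name__ == "__main__":
    repo = "."  # assumption: the current directory is the kohya_ss folder
    script = activation_script(repo)
    print("inside a virtual environment:", in_virtualenv())
    print(f"{script} exists:", script.is_file())
```

If the script is missing, the venv folder was probably never created or was created somewhere else, and packages end up being loaded from the global Python install, as the site-packages paths in the traceback suggest.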
-
I don't see CUDA listed among the requirements on the main page, but I think it is needed. I installed it at some point before installing this repository, so maybe the setup found it and didn't throw an error for me. I don't have the link anymore, but you need to go to NVIDIA's site and find this file (or a more recent one).
It is a 3 GB file with the CUDA stuff inside.
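To see whether a CUDA install is visible on a machine, a rough check along these lines may help. It is only a sketch using the Python standard library: it assumes the NVIDIA toolkit installer sets the CUDA_PATH environment variable on Windows and puts nvcc on PATH, and that nvidia-smi comes with the driver. The function name cuda_hints is made up for illustration.

```python
import os
import shutil


def cuda_hints(env=None):
    """Collect simple hints that a CUDA toolkit or NVIDIA driver is installed."""
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")
    return {
        # Assumption: set by the CUDA toolkit installer on Windows.
        "CUDA_PATH": env.get("CUDA_PATH"),
        # Compiler shipped with the toolkit.
        "nvcc": shutil.which("nvcc", path=search_path),
        # Tool shipped with the GPU driver; present even without the toolkit.
        "nvidia-smi": shutil.which("nvidia-smi", path=search_path),
    }


if __name__ == "__main__":
    for name, value in cuda_hints().items():
        print(f"{name}: {value or 'not found'}")
```

A "not found" for CUDA_PATH and nvcc together with a working nvidia-smi would point to a driver-only setup without the toolkit.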
-
Hi, I'm trying to run Kohya to train a LoRA model, with no success. I'm running Windows 10 and have a GTX 1070. I've tried reinstalling several times and checked that all the dependencies are properly installed, but I get the following error when running:
Microsoft Windows [Version 10.0.19044.2846]
(c) Microsoft Corporation. All rights reserved.
E:\Koyah\kohya_ss>gui.bat --listen 127.0.0.1 --server_port 7860 --inbrowser
'.\venv\Scripts\activate.bat' is not recognized as an internal or external command,
operable program or batch file.
System Information:
System: Windows, Release: 10, Version: 10.0.19044, Machine: AMD64, Processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
Python Information:
Version: 3.10.9, Implementation: CPython, Compiler: MSC v.1934 64 bit (AMD64)
Virtual Environment Information:
Not running inside a virtual environment.
GPU Information:
Name: NVIDIA GeForce GTX 1070, VRAM: 8192 MiB
Validating that requirements are satisfied.
All requirements satisfied.
headless: False
Load CSS...
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Folder 100_Archie : 5000 steps
max_train_steps = 5000
stop_text_encoder_training = 0
lr_warmup_steps = 500
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:/stable-diffusion/Lora Training Data/Archie/image" --resolution=512,512 --output_dir="E:/stable-diffusion/Lora Training Data/Archie/model" --logging_dir="E:/stable-diffusion/Lora Training Data/Archie/log" --save_model_as=safetensors --output_name="last" --max_data_loader_n_workers="0" --learning_rate="1e-05" --lr_scheduler="cosine" --lr_warmup_steps="500" --train_batch_size="1" --max_train_steps="5000" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale
prepare tokenizer
prepare images.
found directory E:\stable-diffusion\Lora Training Data\Archie\image\100_Archie contains 50 image files
5000 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "E:\stable-diffusion\Lora Training Data\Archie\image\100_Archie"
image_count: 50
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Archie
caption_extension: .caption
[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 206.52it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 576), count: 400
bucket 1: resolution (320, 640), count: 100
bucket 2: resolution (320, 704), count: 200
bucket 3: resolution (320, 768), count: 200
bucket 4: resolution (384, 448), count: 100
bucket 5: resolution (384, 512), count: 300
bucket 6: resolution (384, 576), count: 800
bucket 7: resolution (448, 448), count: 700
bucket 8: resolution (448, 512), count: 1300
bucket 9: resolution (448, 576), count: 500
bucket 10: resolution (512, 384), count: 100
bucket 11: resolution (512, 448), count: 100
bucket 12: resolution (512, 512), count: 200
mean ar error (without repeats): 0.027661735600387
prepare accelerator
C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models
safety_checker\model.safetensors not found
Fetching 19 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254.
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:13<00:00, 3.59it/s]
prepare optimizer, data loader etc.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
Traceback (most recent call last):
  File "E:\Koyah\kohya_ss\train_db.py", line 469, in <module>
    train(args)
  File "E:\Koyah\kohya_ss\train_db.py", line 152, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params)
  File "E:\Koyah\kohya_ss\library\train_util.py", line 2517, in get_optimizer
    import bitsandbytes as bnb
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 31, in initialize
    self.lib = ct.cdll.LoadLibrary(binary_path)
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 364, in __init__
    if '/' in name or '\\' in name:
TypeError: argument of type 'WindowsPath' is not iterable
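The TypeError at the end of this traceback comes from ctypes running a substring test ('/' in name) on a WindowsPath object instead of a string. Path objects do not support the "in" operator, so the test fails before any library is loaded. The sketch below reproduces just that mechanism with a shortened, hypothetical path. It is not a confirmed fix: converting the path with str() avoids this particular TypeError, but the log also shows that libbitsandbytes_cpu.so was selected, so the library may still fail to load afterwards.

```python
from pathlib import PureWindowsPath

# Hypothetical, shortened stand-in for the binary path shown in the log.
binary_path = PureWindowsPath(r"C:\site-packages\bitsandbytes\libbitsandbytes_cpu.so")


def has_separator(name):
    """Same membership test as the ctypes line shown in the traceback."""
    return '/' in name or '\\' in name


try:
    has_separator(binary_path)  # a path object is not a container of characters
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# The same test works once the path is converted to a plain string.
print("works with str():", has_separator(str(binary_path)))
```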
Traceback (most recent call last):
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
    simple_launcher(args)
  File "C:\Users\Archie\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Archie\AppData\Local\Programs\Python\Python310\python.exe', 'train_db.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:/stable-diffusion/Lora Training Data/Archie/image', '--resolution=512,512', '--output_dir=E:/stable-diffusion/Lora Training Data/Archie/model', '--logging_dir=E:/stable-diffusion/Lora Training Data/Archie/log', '--save_model_as=safetensors', '--output_name=last', '--max_data_loader_n_workers=0', '--learning_rate=1e-05', '--lr_scheduler=cosine', '--lr_warmup_steps=500', '--train_batch_size=1', '--max_train_steps=5000', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
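As a side note on the numbers in the log above: the "5000 steps" figure is consistent with 50 images in a folder named 100_Archie, read as 100 repeats of class "Archie", at batch size 1. The sketch below only redoes that arithmetic with values copied from the log. The parse_folder helper is made up for illustration, and a single pass over the data is assumed.

```python
def parse_folder(name):
    """Split a '<repeats>_<class>' folder name such as '100_Archie'."""
    repeats, _, class_token = name.partition("_")
    return int(repeats), class_token


repeats, class_token = parse_folder("100_Archie")
image_count = 50   # "contains 50 image files"
batch_size = 1     # --train_batch_size="1"

total_images = image_count * repeats   # "5000 train images with repeating."
steps = total_images // batch_size     # assumption: one pass over the data

# Per-bucket counts from the "make buckets" section of the log.
bucket_counts = [400, 100, 200, 200, 100, 300, 800, 700, 1300, 500, 100, 100, 200]

print(class_token, repeats, total_images, steps, sum(bucket_counts))
```

The bucket counts add up to the same 5000 images, and the lr_warmup_steps value of 500 is 10% of that step count.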