Kohya_SS Issues

MrDaisyBates · 2 March 2024 13:33

Yet another edit, previous versions below:

I have found the issue, but I don’t know how to solve it.

On the left, you’ll see I ran “python -m bitsandbites” and it finds my cuda stuff fine. On the right is the terminal output from when I ran terminal inside of my Kohya folder.

Kohya requires running with pyenv to shim the python version. I noticed when I ran the setup.sh file via right click → run as program, it would fail to detect my shim. But when I run it via terminal inside the folder (the right image), it fails to detect my libraries, but it does successfully shim my python.

I don’t know how to fix this, though. If I run the setup without the shim, it seems to run, but it detects the wrong python version. But if I run it with the shim, it doesn’t seem to find my cuda libraries.

My kohya is on a different drive than my operating system. I’ve had it running successfully on this very drive before, so I don’t know why that would be an issue. Just wanted to put it out there.

Heavy Edit (Original Below Line Divider)

I’ve continued to tinker with it, and I indeed think that Kohya is having a hard time pathing to the Cuda stuff. In the setup.sh file (available on the linked github below), there does seem to be lots of references to the LD_LIBRARY_PATH. I also tried installing an older version of Kohya and the outputs were mere helpful regarding the error. It did seem to suggest it was having a hard time locating some Cuda libs or something. Again, a bit over my head. I have reinstalled the current version, but this time, I did it all within one terminal window, therefore it will all be streamlined in one location. I don’t understand code or python at all. I’m not sure what it takes to update a path for the variable, let alone what path I’m even looking for?

    /run/me/h/S/P/kohya_ss    master  python -V                      ✔ 
Python 3.10.11
    /run/me/h/S/P/kohya_ss    master  pip install bitsandbytes       ✔ 
Collecting bitsandbytes
  Downloading bitsandbytes-0.42.0-py3-none-any.whl.metadata (9.9 kB)
Collecting scipy (from bitsandbytes)
  Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.4/60.4 kB 2.5 MB/s eta 0:00:00
Requirement already satisfied: numpy<1.29.0,>=1.22.4 in /home/hattivatti/.pyenv/versions/3.10.11/lib/python3.10/site-packages (from scipy->bitsandbytes) (1.26.4)
Downloading bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.0/105.0 MB 12.3 MB/s eta 0:00:00
Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.4/38.4 MB 14.8 MB/s eta 0:00:00
Installing collected packages: scipy, bitsandbytes
Successfully installed bitsandbytes-0.42.0 scipy-1.12.0
    /run/me/h/S/P/kohya_ss    master  python -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++


+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++


++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++

++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['6.1']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable


WARNING: Please be sure to sanitize sensible info from any such env vars!

SUCCESS!
Installation was successful!
    /run/me/h/S/P/kohya_ss    master  ./setup.sh                                      ✔  4s  
Skipping git operations.
Arch Linux or Arch base detected.
Python TK found...
Switching to virtual Python environment.
19:54:29-610502 INFO     Installing python dependencies. This could take a few minutes as it downloads  
                         files.                                                                         
19:54:29-614328 INFO     If this operation ever runs too long, you can rerun this script in verbose mode
                         to check.                                                                      
19:54:29-615335 INFO     Version: v22.6.2                                                               
                                                                                                        
19:54:29-616238 INFO     Python 3.10.11 on Linux                                                        
19:54:29-617529 INFO     Installing modules from requirements_linux.txt...                              
19:54:29-618532 INFO     Installing package: torch==2.0.1+cu118 torchvision==0.15.2+cu118               
                         --extra-index-url https://download.pytorch.org/whl/cu118                       
19:55:39-781275 INFO     Installing package: xformers==0.0.21 bitsandbytes==0.41.1                      
19:55:45-267134 INFO     Installing package: tensorboard==2.14.1 tensorflow==2.14.0                     
19:56:08-282542 INFO     Installing modules from requirements.txt...                                    
19:56:08-283754 INFO     Installing package: accelerate==0.25.0                                         
19:56:10-763052 INFO     Installing package: aiofiles==23.2.1                                           
19:56:11-647159 INFO     Installing package: altair==4.2.2                                              
19:56:20-397948 INFO     Installing package: dadaptation==3.1                                           
19:56:21-509331 INFO     Installing package: diffusers[torch]==0.25.0                                   
19:56:24-887533 INFO     Installing package: easygui==0.98.3                                            
19:56:26-474757 INFO     Installing package: einops==0.7.0                                              
19:56:27-679318 INFO     Installing package: fairscale==0.4.13                                          
19:56:29-208834 INFO     Installing package: ftfy==6.1.1                                                
19:56:30-475121 INFO     Installing package: gradio==3.50.2                                             
19:56:45-511466 INFO     Installing package: huggingface-hub==0.20.1                                    
19:56:47-431944 INFO     Installing package: invisible-watermark==0.2.0                                 
19:56:50-397830 INFO     Installing package: lion-pytorch==0.0.6                                        
19:56:51-675122 INFO     Installing package: lycoris_lora==2.0.2                                        
19:56:58-550373 INFO     Installing package: omegaconf==2.3.0                                           
19:57:00-571637 INFO     Installing package: onnx==1.14.1                                               
19:57:04-262950 INFO     Installing package: onnxruntime-gpu==1.16.0                                    
19:57:09-067969 INFO     Installing package: protobuf==3.20.3                                           
19:57:11-232247 INFO     Installing package: open-clip-torch==2.20.0                                    
19:57:14-037836 INFO     Installing package: opencv-python==4.7.0.68                                    
19:57:17-083098 INFO     Installing package: prodigyopt==1.0                                            
19:57:19-077856 INFO     Installing package: pytorch-lightning==1.9.0                                   
19:57:24-148369 INFO     Installing package: rich==13.7.0                                               
19:57:26-571948 INFO     Installing package: safetensors==0.4.2                                         
19:57:28-777800 INFO     Installing package: timm==0.6.12                                               
19:57:31-321192 INFO     Installing package: tk==0.1.0                                                  
19:57:33-402054 INFO     Installing package: toml==0.10.2                                               
19:57:35-498712 INFO     Installing package: transformers==4.36.2                                       
19:57:42-844890 INFO     Installing package: voluptuous==0.13.1                                         
19:57:44-873888 INFO     Installing package: wandb==0.15.11                                             
19:57:50-285120 INFO     Installing package: scipy==1.11.4                                              
19:57:55-679614 INFO     Installing package: -e .                                                       
19:58:00-715828 INFO     Configuring accelerate...                                                      
19:58:00-717231 WARNING  Could not automatically configure accelerate. Please manually configure        
                         accelerate with the option in the menu or with: accelerate config.             
Exiting Python virtual environment.
Setup finished! Run ./gui.sh to start.
Please note if you'd like to expose your public server you need to run ./gui.sh --share
    /run/me/h/S/P/kohya_ss    master  ./gui.sh                                    ✔  3m 36s  
Warning: LD_LIBRARY_PATH environment variable is not set.
Certain functionalities may not work correctly.
Please ensure that the required libraries are properly configured.
 
If you use WSL2 you may want to: export LD_LIBRARY_PATH=/usr/lib/wsl/lib/
 
20:00:30-508391 INFO     Version: v22.6.2                                                               
                                                                                                        
20:00:30-512935 INFO     nVidia toolkit detected                                                        
20:00:31-583852 INFO     Torch 2.0.1+cu118                                                              
20:00:31-593342 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700                                     
20:00:31-624325 INFO     Torch detected GPU: NVIDIA GeForce GTX 1080 Ti VRAM 11169 Arch (6, 1) Cores 28 
20:00:31-626174 INFO     Verifying modules installation status from                                     
                         /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/requirements_linux.txt
                         ...                                                                            
20:00:31-629306 INFO     Verifying modules installation status from requirements.txt...                 
20:00:34-926964 INFO     headless: False                                                                
20:00:34-930507 INFO     Load CSS...                                                                    
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
20:01:08-347220 INFO     Loading config...                                                              
20:01:09-312804 INFO     Loading config...                                                              
20:01:32-576599 INFO     Loading config...                                                              
20:01:33-550554 INFO     Start training Dreambooth...                                                   
20:01:33-552697 INFO     Valid image folder names found in:                                             
                         /run/media/hattivatti/1TBOverflow/ModelOutput/img                              
20:01:33-554702 INFO     Valid image folder names found in:                                             
                         /run/media/hattivatti/1TBOverflow/ModelOutput/reg                              
20:01:33-556590 INFO     Folder 2_M00seArt art style : steps 178                                        
20:01:33-558175 INFO     Regularisation images are used... Will double the number of steps required...  
20:01:33-559971 INFO     max_train_steps (178 / 2 / 1 * 20 * 2) = 3560                                  
20:01:33-561880 INFO     stop_text_encoder_training = 0                                                 
20:01:33-563325 INFO     lr_warmup_steps = 178                                                          
20:01:33-564879 INFO     Saving training config to                                                      
                         /run/media/hattivatti/1TBOverflow/ModelOutput/model/M00seArt_20240302-200133.js
                         on...                                                                          
20:01:33-567100 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_db.py"              
                         --bucket_no_upscale --bucket_reso_steps=64 --enable_bucket                     
                         --min_bucket_reso=384 --max_bucket_reso=1024 --learning_rate="0.0001"          
                         --learning_rate_te="1e-05"                                                     
                         --logging_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/log"              
                         --lr_scheduler="constant_with_warmup" --lr_scheduler_num_cycles="20"           
                         --lr_warmup_steps="178" --max_data_loader_n_workers="0" --resolution="512,512" 
                         --max_train_steps="3560" --mixed_precision="fp16" --optimizer_type="AdamW8bit" 
                         --output_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/model"             
                         --output_name="M00seArt"                                                       
                         --pretrained_model_name_or_path="/run/media/hattivatti/StableDiffusionS/Program
                         s/stable-diffusion-webui/models/Stable-diffusion/artUniverse_v80.safetensors"  
                         --random_crop                                                                  
                         --reg_data_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/reg"             
                         --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="fp16"  
                         --train_batch_size="2"                                                         
                         --train_data_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/img" --xformers
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2024-03-02 20:01:37.939688: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-02 20:01:37.939721: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-02 20:01:37.939743: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-02 20:01:37.945133: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-02 20:01:38.813610: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-02 20:01:39 INFO     prepare tokenizer                                        train_util.py:3959
2024-03-02 20:01:40 INFO     prepare images.                                          train_util.py:1469
                    INFO     found directory                                          train_util.py:1432
                             /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s                   
                             eArt art style contains 89 image files                                     
                    WARNING  No caption file found for 89 images. Training will       train_util.py:1459
                             continue without captions for these images. If class                       
                             token exists, it will be used. /                                           
                             89枚の画像にキャプションファイルが見つかりませんでした。                   
                             これらの画像についてはキャプションなしで学習を続行します                   
                             。class tokenが存在する場合はそれを使います。                              
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/1.png                                                       
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/10.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/11.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/12.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/13.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1464
                             eArt art style/14.png... and 84 more                                       
                    INFO     found directory                                          train_util.py:1432
                             /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art                    
                             style contains 254 image files                                             
                    WARNING  No caption file found for 254 images. Training will      train_util.py:1459
                             continue without captions for these images. If class                       
                             token exists, it will be used. /                                           
                             254枚の画像にキャプションファイルが見つかりませんでした                    
                             。これらの画像についてはキャプションなしで学習を続行しま                   
                             す。class tokenが存在する場合はそれを使います。                            
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00000-1214595497.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00001-2130368021.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00002-1697800000.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00008-3720950371.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00010-3011843286.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1464
                             style/00011-3011843287.png... and 249 more                                 
                    INFO     178 train images with repeating.                         train_util.py:1508
                    INFO     254 reg images.                                          train_util.py:1511
                    WARNING  some of reg images are not used /                        train_util.py:1513
                             正則化画像の数が多いので、一部使用されない正則化画像があ                   
                             ります                                                                     
                    INFO     [Dataset 0]                                              config_util.py:544
                               batch_size: 2                                                            
                               resolution: (512, 512)                                                   
                               enable_bucket: True                                                      
                               network_multiplier: 1.0                                                  
                               min_bucket_reso: 384                                                     
                               max_bucket_reso: 1024                                                    
                               bucket_reso_steps: 64                                                    
                               bucket_no_upscale: True                                                  
                                                                                                        
                               [Subset 0 of Dataset 0]                                                  
                                 image_dir:                                                             
                             "/run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00                   
                             seArt art style"                                                           
                                 image_count: 89                                                        
                                 num_repeats: 2                                                         
                                 shuffle_caption: False                                                 
                                 keep_tokens: 0                                                         
                                 keep_tokens_separator:                                                 
                                 caption_dropout_rate: 0.0                                              
                                 caption_dropout_every_n_epoches: 0                                     
                                 caption_tag_dropout_rate: 0.0                                          
                                 caption_prefix: None                                                   
                                 caption_suffix: None                                                   
                                 color_aug: False                                                       
                                 flip_aug: False                                                        
                                 face_crop_aug_range: None                                              
                                 random_crop: True                                                      
                                 token_warmup_min: 1,                                                   
                                 token_warmup_step: 0,                                                  
                                 is_reg: False                                                          
                                 class_tokens: M00seArt art style                                       
                                 caption_extension: .caption                                            
                                                                                                        
                               [Subset 1 of Dataset 0]                                                  
                                 image_dir:                                                             
                             "/run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art                   
                             style"                                                                     
                                 image_count: 254                                                       
                                 num_repeats: 1                                                         
                                 shuffle_caption: False                                                 
                                 keep_tokens: 0                                                         
                                 keep_tokens_separator:                                                 
                                 caption_dropout_rate: 0.0                                              
                                 caption_dropout_every_n_epoches: 0                                     
                                 caption_tag_dropout_rate: 0.0                                          
                                 caption_prefix: None                                                   
                                 caption_suffix: None                                                   
                                 color_aug: False                                                       
                                 flip_aug: False                                                        
                                 face_crop_aug_range: None                                              
                                 random_crop: True                                                      
                                 token_warmup_min: 1,                                                   
                                 token_warmup_step: 0,                                                  
                                 is_reg: True                                                           
                                 class_tokens: art style                                                
                                 caption_extension: .caption                                            
                                                                                                        
                                                                                                        
                    INFO     [Dataset 0]                                              config_util.py:550
                    INFO     loading image sizes.                                      train_util.py:794
100%|██████████████████████████████████████████████████████████████| 267/267 [00:00<00:00, 11221.01it/s]
                    INFO     make buckets                                              train_util.py:800
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if        train_util.py:817
                             bucket_no_upscale is set, because bucket reso is defined                   
                             by image size automatically /                                              
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像                  
                             サイズから自動計算されるため、min_bucket_resoとmax_bucket                  
                             _resoは無視されます                                                        
                    INFO     number of images (including repeats) /                    train_util.py:846
                             各bucketの画像枚数（繰り返し回数を含む）                                   
                    INFO     bucket 0: resolution (512, 512), count: 356               train_util.py:851
                    INFO     mean ar error (without repeats): 0.0                      train_util.py:856
                    INFO     prepare accelerator                                         train_db.py:101
accelerator device: cuda
                    INFO     loading model for process 0/1                            train_util.py:4111
                    INFO     load StableDiffusion checkpoint:                         train_util.py:4066
                             /run/media/hattivatti/StableDiffusionS/Programs/stable-d                   
                             iffusion-webui/models/Stable-diffusion/artUniverse_v80.s                   
                             afetensors                                                                 
                    INFO     UNet2DConditionModel: 64, 8, 768, False, False        original_unet.py:1387
2024-03-02 20:01:45 INFO     loading u-net: <All keys matched successfully>           model_util.py:1009
                    INFO     loading vae: <All keys matched successfully>             model_util.py:1017
2024-03-02 20:01:47 INFO     loading text encoder: <All keys matched successfully>    model_util.py:1074
                    INFO     Enable xformers for U-Net                                train_util.py:2529
prepare optimizer, data loader etc.
False

===================================BUG REPORT===================================
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('local/hattivatti-tietokone'), PosixPath('@/tmp/.ICE-unix/1397,unix/hattivatti-tietokone')}
The following directories listed in your path were found to be non-existent: {PosixPath('-ccbin /opt/cuda/bin')}
The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/app-dbus\\x2d'), PosixPath('1.2-org.gnome.Nautilus@3.service/memory.pressure'), PosixPath('1.2\\x2dorg.gnome.Nautilus.slice/dbus-')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/2f9c98bb_a83b_4358_9a26_13fb64b1264a')}
The following directories listed in your path were found to be non-existent: {PosixPath('//debuginfod.archlinux.org'), PosixPath('https')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Loading binary /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/./train_db.py", line 504, in <module>
    train(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/./train_db.py", line 180, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/library/train_util.py", line 3616, in get_optimizer
    import bitsandbytes as bnb
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/bin/python', './train_db.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--enable_bucket', '--min_bucket_reso=384', '--max_bucket_reso=1024', '--learning_rate=0.0001', '--learning_rate_te=1e-05', '--logging_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/log', '--lr_scheduler=constant_with_warmup', '--lr_scheduler_num_cycles=20', '--lr_warmup_steps=178', '--max_data_loader_n_workers=0', '--resolution=512,512', '--max_train_steps=3560', '--mixed_precision=fp16', '--optimizer_type=AdamW8bit', '--output_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/model', '--output_name=M00seArt', '--pretrained_model_name_or_path=/run/media/hattivatti/StableDiffusionS/Programs/stable-diffusion-webui/models/Stable-diffusion/artUniverse_v80.safetensors', '--random_crop', '--reg_data_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/reg', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--train_batch_size=2', '--train_data_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/img', '--xformers']' returned non-zero exit status 1.

Hello guys and thank you in advance. Kohya_SS doesn’t have a very active community that I can find to ask, and I can’t say how many users are Linux users anyways. I know it’s a stretch, but this was the best place I could think to ask for help. It’s worth noting that I’m a pretty nubby user, so it’s going to be really easy to talk over my head. I’ll try to give you all the relevant information, though.

Manjaro, Gnome, nvidia 1080ti, i7 4790k

Kohya_SS can be found on github and is used to train Stable Diffusion neural network stuff (to be used with Automatic1111’s version, for example). Kohya_SS requires Python 3.10 basically, so I needed to shim it with pyenv, which I successfully did, and you’ll see the confirmation of it in the images. Kohya needs a lot of other dependencies, though to work correctly. The shim I used was “pyenv local 3.10.11”

I’m no expert, but before I link everything, what I “think” is happening is that while I may have the correct stuff installed, it’s failing to path to it for some reason due to my shims or some other compatibility issues. Kohya does have a setup.sh to install all of this, and I do run it in the images below and other than not being able to configure accelerate, it has no issues (I don’t believe that detail is too relevant. I also don’t believe my card supports it due to age). And, for the record, about six months ago, I did have Kohya running fine on this exact machine, but I did have headaches getting it running then, too. Just different headaches. If you’d like to check out the repository yourself or even try installing it on your own machine, I’ll put the URL below. It seems to install fine, though. It’s only when I try to run images through it to train with, it starts screaming stuff (and the stuff is gibberish to me). I will attach it below, too. Again. Thanks in advance for any help. I will attach images and such that I believe are somehow relevant. I’m thinking the solution might involve me somehow using pip install stuff I already have installed, but in a way it can be found (since I think it’s not pathing correctly). Also, the windows subsystem library issue…I don’t think it’s really an issue. It used to do that in the past, too, and never caused me any issues for starters. I’m only pointing it out because I assume there’s a chance you’ve never used this software before.

I’m sure I have some other stuff installed, too, it’s just slipping my mind. Below is the terminal output with the errors.

Warning: LD_LIBRARY_PATH environment variable is not set.
Certain functionalities may not work correctly.
Please ensure that the required libraries are properly configured.
 
If you use WSL2 you may want to: export LD_LIBRARY_PATH=/usr/lib/wsl/lib/
 
14:41:53-655134 INFO     Version: v22.6.2                                       
                                                                                
14:41:53-659647 INFO     nVidia toolkit detected                                
14:41:54-802917 INFO     Torch 2.0.1+cu118                                                              
14:41:54-828628 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700                                     
14:41:54-851865 INFO     Torch detected GPU: NVIDIA GeForce GTX 1080 Ti VRAM 11169 Arch (6, 1) Cores 28 
14:41:54-853733 INFO     Verifying modules installation status from                                     
                         /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/requirements_linux.txt
                         ...                                                                            
14:41:54-856948 INFO     Verifying modules installation status from requirements.txt...                 
14:41:58-172315 INFO     headless: False                                                                
14:41:58-176504 INFO     Load CSS...                                                                    
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
14:42:52-653517 INFO     Loading config...                                                              
14:42:54-365790 INFO     Loading config...                                                              
14:42:56-702964 INFO     Start training Dreambooth...                                                   
14:42:57-149943 INFO     Valid image folder names found in:                                             
                         /run/media/hattivatti/1TBOverflow/ModelOutput/img                              
14:42:57-151364 INFO     Valid image folder names found in:                                             
                         /run/media/hattivatti/1TBOverflow/ModelOutput/reg                              
14:42:57-297532 INFO     Folder 2_M00seArt art style : steps 178                                        
14:42:57-298694 INFO     Regularisation images are used... Will double the number of steps required...  
14:42:57-299618 INFO     max_train_steps (178 / 2 / 1 * 20 * 2) = 3560                                  
14:42:57-300604 INFO     stop_text_encoder_training = 0                                                 
14:42:57-301416 INFO     lr_warmup_steps = 178                                                          
14:42:57-303139 INFO     Saving training config to                                                      
                         /run/media/hattivatti/1TBOverflow/ModelOutput/model/M00seArt_20240302-144257.js
                         on...                                                                          
14:42:57-326375 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_db.py"              
                         --bucket_no_upscale --bucket_reso_steps=64 --enable_bucket                     
                         --min_bucket_reso=384 --max_bucket_reso=1024 --learning_rate="0.0001"          
                         --learning_rate_te="1e-05"                                                     
                         --logging_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/log"              
                         --lr_scheduler="constant_with_warmup" --lr_scheduler_num_cycles="20"           
                         --lr_warmup_steps="178" --max_data_loader_n_workers="0" --resolution="512,512" 
                         --max_train_steps="3560" --mixed_precision="fp16" --optimizer_type="AdamW8bit" 
                         --output_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/model"             
                         --output_name="M00seArt"                                                       
                         --pretrained_model_name_or_path="/run/media/hattivatti/StableDiffusionS/Program
                         s/stable-diffusion-webui/models/Stable-diffusion/artUniverse_v80.safetensors"  
                         --random_crop                                                                  
                         --reg_data_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/reg"             
                         --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="fp16"  
                         --train_batch_size="2"                                                         
                         --train_data_dir="/run/media/hattivatti/1TBOverflow/ModelOutput/img" --xformers
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2024-03-02 14:43:01.723671: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-02 14:43:01.723703: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-02 14:43:01.723731: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-02 14:43:01.729252: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-02 14:43:02.555114: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-02 14:43:03 INFO     prepare tokenizer                                        train_util.py:3959
                    INFO     prepare images.                                          train_util.py:1469
                    INFO     found directory                                          train_util.py:1432
                             /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s                   
                             eArt art style contains 89 image files                                     
                    WARNING  No caption file found for 89 images. Training will       train_util.py:1459
                             continue without captions for these images. If class                       
                             token exists, it will be used. /                                           
                             89枚の画像にキャプションファイルが見つかりませんでした。                   
                             これらの画像についてはキャプションなしで学習を続行します                   
                             。class tokenが存在する場合はそれを使います。                              
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/1.png                                                       
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/10.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/11.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/12.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1466
                             eArt art style/13.png                                                      
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00s train_util.py:1464
                             eArt art style/14.png... and 84 more                                       
                    INFO     found directory                                          train_util.py:1432
                             /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art                    
                             style contains 254 image files                                             
                    WARNING  No caption file found for 254 images. Training will      train_util.py:1459
                             continue without captions for these images. If class                       
                             token exists, it will be used. /                                           
                             254枚の画像にキャプションファイルが見つかりませんでした                    
                             。これらの画像についてはキャプションなしで学習を続行しま                   
                             す。class tokenが存在する場合はそれを使います。                            
2024-03-02 14:43:04 WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00000-1214595497.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00001-2130368021.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00002-1697800000.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00008-3720950371.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1466
                             style/00010-3011843286.png                                                 
                    WARNING  /run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art  train_util.py:1464
                             style/00011-3011843287.png... and 249 more                                 
                    INFO     178 train images with repeating.                         train_util.py:1508
                    INFO     254 reg images.                                          train_util.py:1511
                    WARNING  some of reg images are not used /                        train_util.py:1513
                             正則化画像の数が多いので、一部使用されない正則化画像があ                   
                             ります                                                                     
                    INFO     [Dataset 0]                                              config_util.py:544
                               batch_size: 2                                                            
                               resolution: (512, 512)                                                   
                               enable_bucket: True                                                      
                               network_multiplier: 1.0                                                  
                               min_bucket_reso: 384                                                     
                               max_bucket_reso: 1024                                                    
                               bucket_reso_steps: 64                                                    
                               bucket_no_upscale: True                                                  
                                                                                                        
                               [Subset 0 of Dataset 0]                                                  
                                 image_dir:                                                             
                             "/run/media/hattivatti/1TBOverflow/ModelOutput/img/2_M00                   
                             seArt art style"                                                           
                                 image_count: 89                                                        
                                 num_repeats: 2                                                         
                                 shuffle_caption: False                                                 
                                 keep_tokens: 0                                                         
                                 keep_tokens_separator:                                                 
                                 caption_dropout_rate: 0.0                                              
                                 caption_dropout_every_n_epoches: 0                                     
                                 caption_tag_dropout_rate: 0.0                                          
                                 caption_prefix: None                                                   
                                 caption_suffix: None                                                   
                                 color_aug: False                                                       
                                 flip_aug: False                                                        
                                 face_crop_aug_range: None                                              
                                 random_crop: True                                                      
                                 token_warmup_min: 1,                                                   
                                 token_warmup_step: 0,                                                  
                                 is_reg: False                                                          
                                 class_tokens: M00seArt art style                                       
                                 caption_extension: .caption                                            
                                                                                                        
                               [Subset 1 of Dataset 0]                                                  
                                 image_dir:                                                             
                             "/run/media/hattivatti/1TBOverflow/ModelOutput/reg/1_art                   
                             style"                                                                     
                                 image_count: 254                                                       
                                 num_repeats: 1                                                         
                                 shuffle_caption: False                                                 
                                 keep_tokens: 0                                                         
                                 keep_tokens_separator:                                                 
                                 caption_dropout_rate: 0.0                                              
                                 caption_dropout_every_n_epoches: 0                                     
                                 caption_tag_dropout_rate: 0.0                                          
                                 caption_prefix: None                                                   
                                 caption_suffix: None                                                   
                                 color_aug: False                                                       
                                 flip_aug: False                                                        
                                 face_crop_aug_range: None                                              
                                 random_crop: True                                                      
                                 token_warmup_min: 1,                                                   
                                 token_warmup_step: 0,                                                  
                                 is_reg: True                                                           
                                 class_tokens: art style                                                
                                 caption_extension: .caption                                            
                                                                                                        
                                                                                                        
                    INFO     [Dataset 0]                                              config_util.py:550
                    INFO     loading image sizes.                                      train_util.py:794
100%|████████████████████████████████████████████████████████████████| 267/267 [00:01<00:00, 201.08it/s]
2024-03-02 14:43:05 INFO     make buckets                                              train_util.py:800
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if        train_util.py:817
                             bucket_no_upscale is set, because bucket reso is defined                   
                             by image size automatically /                                              
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像                  
                             サイズから自動計算されるため、min_bucket_resoとmax_bucket                  
                             _resoは無視されます                                                        
                    INFO     number of images (including repeats) /                    train_util.py:846
                             各bucketの画像枚数（繰り返し回数を含む）                                   
                    INFO     bucket 0: resolution (512, 512), count: 356               train_util.py:851
                    INFO     mean ar error (without repeats): 0.0                      train_util.py:856
                    INFO     prepare accelerator                                         train_db.py:101
accelerator device: cuda
                    INFO     loading model for process 0/1                            train_util.py:4111
                    INFO     load StableDiffusion checkpoint:                         train_util.py:4066
                             /run/media/hattivatti/StableDiffusionS/Programs/stable-d                   
                             iffusion-webui/models/Stable-diffusion/artUniverse_v80.s                   
                             afetensors                                                                 
                    INFO     UNet2DConditionModel: 64, 8, 768, False, False        original_unet.py:1387
2024-03-02 14:43:13 INFO     loading u-net: <All keys matched successfully>                                                           model_util.py:1009
                    INFO     loading vae: <All keys matched successfully>                                                             model_util.py:1017
2024-03-02 14:43:15 INFO     loading text encoder: <All keys matched successfully>                                                    model_util.py:1074
                    INFO     Enable xformers for U-Net                                                                                train_util.py:2529
prepare optimizer, data loader etc.
False

===================================BUG REPORT===================================
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1406,unix/hattivatti-tietokone'), PosixPath('local/hattivatti-tietokone')}
The following directories listed in your path were found to be non-existent: {PosixPath('-ccbin /opt/cuda/bin')}
The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/app-dbus\\x2d'), PosixPath('1.2\\x2dorg.gnome.Nautilus.slice/dbus-'), PosixPath('1.2-org.gnome.Nautilus@0.service/memory.pressure')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/5405e7e9_4024_4569_b4e1_387ee2380c8d')}
The following directories listed in your path were found to be non-existent: {PosixPath('//debuginfod.archlinux.org'), PosixPath('https')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Loading binary /run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/./train_db.py", line 504, in <module>
    train(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/./train_db.py", line 180, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/library/train_util.py", line 3616, in get_optimizer
    import bitsandbytes as bnb
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/run/media/hattivatti/StableDiffusionS/Programs/kohya_ss/venv/bin/python', './train_db.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--enable_bucket', '--min_bucket_reso=384', '--max_bucket_reso=1024', '--learning_rate=0.0001', '--learning_rate_te=1e-05', '--logging_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/log', '--lr_scheduler=constant_with_warmup', '--lr_scheduler_num_cycles=20', '--lr_warmup_steps=178', '--max_data_loader_n_workers=0', '--resolution=512,512', '--max_train_steps=3560', '--mixed_precision=fp16', '--optimizer_type=AdamW8bit', '--output_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/model', '--output_name=M00seArt', '--pretrained_model_name_or_path=/run/media/hattivatti/StableDiffusionS/Programs/stable-diffusion-webui/models/Stable-diffusion/artUniverse_v80.safetensors', '--random_crop', '--reg_data_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/reg', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--train_batch_size=2', '--train_data_dir=/run/media/hattivatti/1TBOverflow/ModelOutput/img', '--xformers']' returned non-zero exit status 1.

Captions are also not required, if that bit is catching your eye. I see it warning about it, but I’m quite confident something else here is causing the issue.

anon51566685 · 2 March 2024 13:40

Please remove useless images of text and post actual text. Also edit all of your output as preformatted text because it’s unreadable now.

MrDaisyBates · 2 March 2024 13:42

Sorry about that. I’ll see if I can figure out how to do that. The original images, I don’t have the text for.

MrDaisyBates · 3 March 2024 12:40

I have managed to find the issue, but I don’t know how to solve it. I’ve edited it at the top of the original post.

Basically to install Kohya, it requires you to run Python 3.10, and to accomplish this, I had to run “pyenv local 3.10.11” in the terminal inside the folder.

When I run the setup.sh script, if I run it in the terminal (inside the folder), it detects the python 3.10.11, but it fails to find my cuda library. When I run the setup.sh via right click → run as program, it fails to detect my shim. But I noticed by running a different terminal (outside the folder) that bitsandbytes does find my cuda stuff just fine when I’m outside of the folder.

Therefore, the issue seems to revolve around the pyenv shim in my folder and the cuda libs somehow becoming invisible to my setup.sh file because of it.

I just don’t know what a solution looks like.

Edit: My kohya is on a different drive than my operating system. I’ve had it running successfully on this very drive before, so I don’t know why that would be an issue. Just wanted to put it out there.

MrDaisyBates · 3 March 2024 16:33

I am going to close this. I’m not quite sure what the solution is, but I followed these steps to get it working (in case someone else finds this and needs help)

I deleted/uninstalled most of my cuda stuff I had installed
I had to go to the internet archive website to find gcc11 and the libs for it, download the packages for them
I also had to get cuda tools 11.8 from the arch archive as well since they’re no longer available
I installed the packages (using pacman, menu top right “install local packages”) and I installed the gcc11 libs first, then gcc11, then cuda tools 11.8 (which needs gcc11 to run apparently).
I deleted my venv folder for Kohya
Open terminal, confirmed my 3.10.11 ship was working (research pyenv if you need to)
In same terminal, ran ./setup.sh
Ran “source ./venv/bin/activate”
Ran “python -m bitsandbytes” to check my cuda libs, and they cleaned up a lot.
Ran ./gui.sh

As I type this message, I seem to be successfully training a LoRA with no issues.

I had a lot of cuda stuff installed and idk if it was having a hard time pathing itself or what. I also heard that cuda tools 11.8 might be preferred by Kohya, so I went through some additional steps to uninstall 12.3 and downgrade to 11.8 via the steps listed above. I hope it helps someone.

system · 5 March 2024 04:34

This topic was automatically closed 36 hours after the last reply. New replies are no longer allowed.