Remove Training_PRO extension

The built-in training tab now covers its essential functionality
with a more modern and correct implementation (apply_chat_template,
dynamic padding, JSONL datasets, stride overlap).
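
For reference, a minimal sketch (not the actual built-in implementation) of the pattern the new tab relies on: chat formatting via apply_chat_template, JSONL datasets, and per-batch dynamic padding. The model name, file path, and `messages` field below are placeholders.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("my-base-model")  # placeholder model

def to_text(example):
    # 'messages' is assumed to be a list of {"role": ..., "content": ...} dicts
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

dataset = load_dataset("json", data_files="train.jsonl", split="train").map(to_text)
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True),
                        remove_columns=dataset.column_names)
collator = DataCollatorWithPadding(tokenizer)  # pads each batch to its longest sample
```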
This commit is contained in:
oobabooga 2026-03-05 12:55:07 -03:00
parent 1ffe540c97
commit 5be68cc073
6 changed files with 0 additions and 2249 deletions

View file

@@ -21,7 +21,6 @@ If you create an extension, you are welcome to host it in a GitHub repository an
|Extension|Description|
|---------|-----------|
|[openai](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)| Creates an API that mimics the OpenAI API and can be used as a drop-in replacement. |
|[Training_PRO](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/Training_PRO)| Advanced LoRA training with support for model and LoRA merging. |
|[superboogav2](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/superboogav2)| Enhanced RAG extension with support for PDF, DOCX, and PPTX files. |
|[send_pictures](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/send_pictures/)| Creates an image upload field that can be used to send images to the bot in chat mode. Captions are automatically generated using BLIP. |
|[coqui_tts](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/coqui_tts)| Text-to-speech extension using Coqui XTTS v2. |

View file

@@ -1,92 +0,0 @@
# Training_PRO
This is an expanded and reworked Training tab
Maintained by FP
[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/Q5Q5MOB4M)
Repo home:
https://github.com/FartyPants/Training_PRO
In general, the repo above is ahead of the version of the extension included in text-generation-webui.
## News
- NEFtune: adds noise to the embeddings to help with generalization
- Loss Graph in interface.
- Supports Mistral training
- a workaround for the PyTorch and transformers version desync
![image](https://github.com/FartyPants/Training_PRO/assets/23346289/e389ec69-d7ad-4922-9ad9-865625997479)
## Features/Changes
- Chunking: the precise raw text slicer (PRTS) uses sentence slicing and makes sure things are clean on all ends
- Overlap chunking - this special overlapping will create additional overlap blocks based on logical rules (aka no overlap block across a hard cut)
- Custom schedulers (follow the code to make your own): in LR Scheduler, select FP_low_epoch_annealing - this scheduler will keep the LR constant for the first epoch, then use cosine for the rest - this part would be best spun off into a new py file
- Saves a graph PNG file at the end with learning rate and loss per epoch
- Adding EOS to each block or to hard cuts only
- Automatically lowers gradient accumulation if you go overboard and set a gradient accumulation higher than the actual data allows - transformers would then throw an error (or it used to, not sure if that is still true), but in any case it will fix the bad setting
- Turn BOS on and off
- Target selector
- DEMENTOR LEARNING (experimental): Deep Memorization Enforcement Through Overlapping and Repetition. This is an experiment in long-text learning using low epochs (basically use 1 epoch with a constant LR, or 2 epochs with the FP_low_epoch_annealing LR scheduler)
- Getting rid of micro batch size/batch size confusion. Now there are True Batch Size and Gradient Accumulation sliders, consistent with all the other training tools out there
- Ability to save a checkpoint during training with a button
- Ability to change Stop Loss during training
- Different modes of automatic checkpoint saving
- Function to check the dataset and suggest parameters such as warmup and checkpoint save frequency before training
- Graphs training loss in the interface
- More custom schedulers
### Notes:
This uses its own chunking code for raw text, based on sentence splitting. This avoids weird cuts in the chunks, and each chunk should now start and end on a sentence boundary. It works hand in hand with Hard Cut. A proper use is to structure your text into logical blocks (ideas) separated by three \n, then use three \n as the hard cut string. This way each chunk will contain only one flow of ideas and not derail in the thoughts. The overlapping code will create overlapped blocks on a sentence basis too, but it will not cross a hard cut, and thus not cross different ideas either. Does it make any sense? No? Hmmmm...
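
As a toy illustration of the hard-cut idea only (the real slicer additionally respects sentence boundaries, token limits, and overlap rules; the text below is made up):

```python
# Toy sketch: how the hard cut string separates "ideas" into independent blocks.
raw_text = "First idea. More on it.\n\n\nSecond idea. Its details.\n\n\nThird idea."
hard_cut_string = "\\n\\n\\n"               # as typed in the UI
cut = hard_cut_string.replace("\\n", "\n")  # -> an actual triple newline
blocks = [b.strip() for b in raw_text.split(cut) if b.strip()]
print(blocks)  # ['First idea. More on it.', 'Second idea. Its details.', 'Third idea.']
```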
### Custom schedulers
A bunch of custom (combination) schedulers are added to the LR schedule. These are based on my own experiments.
**FP_low_epoch_annealing**
Uses a constant LR (with warmup) for the first epoch only. The rest of the epoch(s) is cosine annealing. So with 10 epochs, 1 will be constant and 9 will nose-dive down. However, a typical usage would be 2 epochs (hence "low epoch" in the name): the 1st is constant, the second is annealing. Simple. I use it 90% of the time.
**FP_half_time_annealing**
Like the low-epoch scheduler, but now the total number of steps is divided by 2. The first half is constant, the second half is annealing. So with 10 epochs, 5 will be constant and 5 will cosine nose-dive down.
**FP_raise_fall_creative**
This is a sine rise until half of the total steps, then a cosine fall for the rest (or you may think of the curve as a sine in its entirety). Most of the learning is done in the hump, in the middle. The warmup entry has no effect, since the sine rise is automatically a warmup.
The idea is to start very mildly so as not to overfit on the first blocks of the dataset. It seems to broaden the scope of the model, making it less strict for a tight dataset.
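
A simplified sketch (warmup omitted, step counts illustrative) of the constant-then-anneal shape used by FP_low_epoch_annealing; the full implementations live in custom_scheduler.py below:

```python
import math

def low_epoch_annealing(step, first_epoch_steps, total_steps):
    # LR multiplier: 1.0 through the first epoch, then cosine decay to 0
    if step < first_epoch_steps:
        return 1.0
    progress = (step - first_epoch_steps) / max(1, total_steps - first_epoch_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# 2 epochs of 100 steps each: constant for steps 0-99, annealing for 100-199
print(low_epoch_annealing(50, 100, 200), low_epoch_annealing(150, 100, 200))  # 1.0 0.5
```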
### Targets
A normal LoRA targets q and v, and that's what you should use. You can use (q k v o) or (q k v) and it will give you a lot more trainable parameters. The benefit is that you can keep the rank lower and still attain the same coherency as q v with a high rank. Guanaco, for example, was trained with QLoRA on q k v o, and they swear by it.
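
As a hedged sketch of how that choice maps to a peft LoraConfig (module names below are the LLaMA-style defaults; other architectures name them differently, and the ranks are only illustrative):

```python
from peft import LoraConfig

# q/v only at a higher rank...
qv_config = LoraConfig(r=64, lora_alpha=128, target_modules=["q_proj", "v_proj"])

# ...or q/k/v/o at a lower rank: more targeted modules, more trainable parameters per rank
qkvo_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
```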
### DEMENTOR LEARNING (experimental) Deep Memorization Enforcement Through Overlapping and Repetition
This is an experimental chunking method to train long-form text in a low number of epochs (basically 1) with sliding repetition. The depth of learning depends directly on the cutoff_length. Increasing the cutoff length will also increase the number of blocks created from long-form text (which is contrary to normal training). It is based on my own wild experiments.
### Getting rid of batch size and micro batch size
Keeping consistency with everyone else.
Listen, there is only ONE batch size - the True Batch Size (previously called micro-batch size in the WebUI) - this is how many blocks are processed at once (during a single step). It eats GPU memory, but it really helps with training quality (in fact, the ideal batch size would be the same as the number of blocks - which is unrealistic) - so the idea is to cram in as much True Batch Size as possible before your GPU blows up with OOM. On 24GB this is about 10 for 13b (loaded with 4-bit).
So no micro batch size - it is now called True Batch Size, because that's what it is.
The other thing is Gradient Accumulation - this is an emulation of the above batch size - a virtual batch size, if you will. If your GPU can't handle the real batch size, then you may fake it using Gradient Accumulation. This will accumulate the gradients over however many steps are defined here and then update the weights at the end, without any increase in GPU memory.
Gradient accumulation is like a virtual batch size multiplier without the GPU penalty.
If your batch size is 4 and your gradient accumulation is 2, then it sort of behaves as if we have a batch size of 8. *Sort of* because a batch size of 4 with GA of 2 is NOT the same as a batch size of 2 with GA of 4. (It produces different weights - hence it's not equivalent.) The idea is that if you don't have the GPU memory, using GA to extend the batch size is the next best thing (good enough), since you have no other choice.
If all you can afford is a batch size of 1, then increasing GA will likely make the learning better within some range of GA (more is not always better).
However - GA is not some golden goose. As said, it isn't the same as batch size. In fact, GA may worsen your learning as well.
I would suggest a series of experiments where you set the batch size as high as possible without OOM, set GA to 1, then repeat training while increasing the GA (2, 4...), and see how the model changes. It will likely follow some sort of curve where GA seems to help before it starts making things worse. Some people believe that if you can squeeze in a batch size of 6, then you should not bother with GA at all... YMMV
A high batch size vs a high GA would also likely produce different results in terms of learning words vs style. How? Hmmmm... good question.
One cosmetic "benefit" of GA is that the loss will fluctuate less (because of all the gradient accumulation, which works as a form of noise smoothing as well).
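
For concreteness, a hedged sketch of how these two sliders map onto transformers TrainingArguments (the numbers are only illustrative, not recommendations):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lora-out",             # hypothetical output path
    per_device_train_batch_size=4,     # "True Batch Size": blocks processed per step
    gradient_accumulation_steps=2,     # virtual multiplier, no extra VRAM
)

# Roughly (not exactly) behaves like a real batch of 8, as discussed above
effective_batch = args.per_device_train_batch_size * args.gradient_accumulation_steps
print(effective_batch)  # 8
```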

View file

@@ -1,433 +0,0 @@
from functools import partial

import torch
import transformers
import math
from torch.optim.lr_scheduler import LambdaLR

from peft import (
    PeftModel,
)

RED = "\033[91m"
YELLOW = "\033[93m"
GREEN = "\033[92m"
RESET = "\033[0m"

last_print_label = ''

custom_scheduler_params = {'trigger_loss': 0.0, 'ramp_down_ratio': 1.0, 'current_loss': 0.0, 'dynamic_scheduler_stop': False, 'calc_ramp_down_at_step': 0, 'calc_num_training_steps': 0}


def custom_scheduler_global_update(current_loss: float):
    custom_scheduler_params.update({'current_loss': current_loss})


def custom_scheduler_global_setup(trigger_loss: float, ramp_down_ratio: float):
    custom_scheduler_params.update({'trigger_loss': trigger_loss})
    custom_scheduler_params.update({'ramp_down_ratio': ramp_down_ratio})

    # calculates the total num steps after trigger
    custom_scheduler_params.update({'calc_num_training_steps': 0})
    # calculates steps when the ramp_down trigger occurred
    custom_scheduler_params.update({'calc_ramp_down_at_step': 0})
    # triggers scheduler stopping after it reached calc_num_training_steps
    custom_scheduler_params.update({'dynamic_scheduler_stop': False})


# hold constant to the half of epochs then cosine down to 0
def _get_fp_half_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):
    global last_print_label
    print_label = ''

    half_steps = num_training_steps // 2
    num_warmup_steps = min(num_warmup_steps, half_steps)

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < half_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Annealing'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < half_steps:
        return 1.0

    progress = float(current_step - half_steps) / float(max(1, num_training_steps - half_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# raise up in cosine, then fall back in cosine
def _get_fp_cosine_raise_and_fall_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):
    global last_print_label
    print_label = ''

    half_steps = num_training_steps // 2
    # num_warmup_steps = min(num_warmup_steps, half_steps)

    if current_step < half_steps:
        print_label = 'Scheduler: Raise'
    else:
        print_label = 'Scheduler: Fall'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    # linear
    # return float(current_step) / float(max(1, num_warmup_steps))

    progress = float(current_step - half_steps) / float(max(1, num_training_steps - half_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# constant to the first epochs then cosine down to 0 over the rest epochs
def _get_fp_cosine_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):
    global last_print_label
    print_label = ''

    num_warmup_steps = min(num_warmup_steps, num_firstepoch_steps)

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < num_firstepoch_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Annealing'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < num_firstepoch_steps:
        return 1.0

    progress = float(current_step - num_firstepoch_steps) / float(max(1, num_training_steps - num_firstepoch_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# halve lr each epoch
def _get_fp_cdrop_rate_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):
    global last_print_label
    print_label = ''

    num_warmup_steps = min(num_warmup_steps, num_firstepoch_steps)
    current_epoch = (current_step // num_firstepoch_steps) + 1

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < num_firstepoch_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Drop Rate'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < num_firstepoch_steps:
        return 1.0

    # Compute the learning rate for the annealing phase
    learning_rate = 1.0 / float(2 ** (current_epoch - 1))
    return learning_rate


# epoch decay: 1/(1 + decay * epoch)

def custom_cosine_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_cosine_schedule_with_warmup_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def custom_half_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_half_schedule_with_warmup_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def custom_raise_fall_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_cosine_raise_and_fall_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def neftune_forward(self, input: torch.Tensor):
    """
    Implements the NEFTune forward pass for the model. Note this works only for
    torch.nn.Embedding layers. This method is slightly adapted from the original source code
    that can be found here: https://github.com/neelsjain/NEFTune

    Args:
        input (`torch.Tensor`):
            The input tensor to the model.
        noise_alpha (`float`):
            The noise alpha value to use for the NEFTune forward pass.
    """
    embeddings = torch.nn.functional.embedding(
        input, self.weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse
    )

    if self.training:
        # Add noise to the embeddings
        dims = torch.tensor(embeddings.size(1) * embeddings.size(2))
        mag_norm = self.neftune_noise_alpha / torch.sqrt(dims)
        embeddings = embeddings + torch.zeros_like(embeddings).uniform_(-mag_norm, mag_norm)

    return embeddings


class FPNEFtuneTrainer(transformers.Trainer):
    def __init__(self, neftune_noise_alpha: float = 0.0, model=None, *args, **kwargs):
        self.neftune_noise_alpha = neftune_noise_alpha
        if self.neftune_noise_alpha > 0.0:
            model = self._activate_neftune(model)
        super().__init__(model=model, *args, **kwargs)

    def _activate_neftune(self, model):
        r"""
        Activates the neftune as presented in this code: https://github.com/neelsjain/NEFTune and paper: https://arxiv.org/abs/2310.05914
        """
        print(f"Activating {RED}NEFtune{RESET} with scale: {self.neftune_noise_alpha}")
        if isinstance(model, transformers.PreTrainedModel):
            embeddings = model.get_input_embeddings()
        elif isinstance(model, PeftModel):
            embeddings = model.base_model.get_input_embeddings()

        embeddings.neftune_noise_alpha = self.neftune_noise_alpha
        old_forward = embeddings.forward

        # This hack seems to be needed to properly use a custom forward pass
        # all credits to: https://discuss.pytorch.org/t/how-can-i-replace-the-forward-method-of-a-predefined-torchvision-model-with-my-customized-forward-function/54224/11
        bound_method = neftune_forward.__get__(embeddings, embeddings.__class__)
        setattr(embeddings, "forward", bound_method)

        # embeddings.forward = neftune_forward
        embeddings._trl_old_forward = old_forward

        return model

    def train(self, *args, **kwargs):
        output = super().train(*args, **kwargs)

        # After training we make sure to retrieve back the original forward pass method
        # for the embedding layer
        if self.neftune_noise_alpha is not None:
            if isinstance(self.model, transformers.PreTrainedModel):
                embeddings = self.model.get_input_embeddings()
            elif isinstance(self.model, PeftModel):
                embeddings = self.model.base_model.get_input_embeddings()

            if hasattr(embeddings, "_trl_old_forward"):
                embeddings.forward = embeddings._trl_old_forward
                del embeddings._trl_old_forward
                del embeddings.neftune_noise_alpha

        return output


class FPSchedulerTrainer(transformers.Trainer):
    def __init__(self, neftune_noise_alpha: float = 0.0, model=None, *args, **kwargs):
        self.neftune_noise_alpha = neftune_noise_alpha
        if self.neftune_noise_alpha > 0.0:
            model = self._activate_neftune(model)
        super().__init__(model=model, *args, **kwargs)

    def _activate_neftune(self, model):
        r"""
        Activates the neftune as presented in this code: https://github.com/neelsjain/NEFTune and paper: https://arxiv.org/abs/2310.05914
        """
        print(f"Activating {RED}NEFtune{RESET} with scale: {self.neftune_noise_alpha}")
        if isinstance(model, transformers.PreTrainedModel):
            embeddings = model.get_input_embeddings()
        elif isinstance(model, PeftModel):
            embeddings = model.base_model.get_input_embeddings()

        embeddings.neftune_noise_alpha = self.neftune_noise_alpha
        old_forward = embeddings.forward

        # This hack seems to be needed to properly use a custom forward pass
        # all credits to: https://discuss.pytorch.org/t/how-can-i-replace-the-forward-method-of-a-predefined-torchvision-model-with-my-customized-forward-function/54224/11
        bound_method = neftune_forward.__get__(embeddings, embeddings.__class__)
        setattr(embeddings, "forward", bound_method)

        # embeddings.forward = neftune_forward
        embeddings._trl_old_forward = old_forward

        return model

    def train(self, *args, **kwargs):
        output = super().train(*args, **kwargs)

        # After training we make sure to retrieve back the original forward pass method
        # for the embedding layer
        if self.neftune_noise_alpha is not None:
            if isinstance(self.model, transformers.PreTrainedModel):
                embeddings = self.model.get_input_embeddings()
            elif isinstance(self.model, PeftModel):
                embeddings = self.model.base_model.get_input_embeddings()

            if hasattr(embeddings, "_trl_old_forward"):
                embeddings.forward = embeddings._trl_old_forward
                del embeddings._trl_old_forward
                del embeddings.neftune_noise_alpha

        return output

    def create_scheduler(self, num_training_steps: int, optimizer: torch.optim.Optimizer = None):
        # Setup the scheduler. The optimizer of the trainer must have been set up either before this method is called or passed as an argument.
        num_train_epochs = self.args.num_train_epochs
        num_warmup_steps = self.args.get_warmup_steps(num_training_steps)
        num_firstepoch_steps = math.ceil(num_training_steps / num_train_epochs)
        num_warmup_acc = num_warmup_steps * self.args.gradient_accumulation_steps
        num_firstepoch_steps_acc = num_firstepoch_steps * self.args.gradient_accumulation_steps
        num_training_steps_acc = num_training_steps * self.args.gradient_accumulation_steps

        custom_scheduler_params.update({'dynamic_scheduler_stop': False})

        print(f"Warm-up steps aligned to Gradient accumulation ({self.args.gradient_accumulation_steps}) = {num_warmup_acc} actual warmup steps")

        if self.args.lr_scheduler_type == 'cosine':
            num_warmup_acc_min = min(num_warmup_acc, num_firstepoch_steps_acc)

            if num_warmup_acc > num_firstepoch_steps_acc:
                print(f"\033[1;31;1mWARNING: The number of warmup steps is set too high! It will be clamped to 1 epoch, essentially going from warmup to annealing.\033[0;37;0m")
                print(f"FP Scheduler Warmup: 0-[{num_warmup_acc_min}], Hold [{num_warmup_acc_min}]-{num_firstepoch_steps_acc}, Annealing {num_firstepoch_steps_acc}-{num_training_steps_acc}")
            else:
                print(f"FP Scheduler Warmup: 0-{num_warmup_acc_min}, Hold {num_warmup_acc_min}-{num_firstepoch_steps_acc}, Annealing {num_firstepoch_steps_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_cosine_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        elif self.args.lr_scheduler_type == 'constant':
            half_step_acc = num_training_steps_acc // 2
            num_warmup_acc_min = min(num_warmup_acc, half_step_acc)

            if num_warmup_acc > half_step_acc:
                print(f"\033[1;31;1mWARNING: The number of warmup steps is set too high! It will be clamped to half of all epochs, essentially going from warmup to annealing in the middle.\033[0;37;0m")
                print(f"FP Scheduler Warmup: 0-[{num_warmup_acc_min}], Hold [{num_warmup_acc_min}]-{half_step_acc}, Annealing {half_step_acc}-{num_training_steps_acc}")
            else:
                print(f"FP Scheduler Warmup: 0-{num_warmup_acc_min}, Hold {num_warmup_acc_min}-{half_step_acc}, Annealing {half_step_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_half_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        elif self.args.lr_scheduler_type == 'constant_with_warmup':
            half_step_acc = num_training_steps_acc // 2

            if num_warmup_steps > 0:
                print(f"Warmup doesn't apply to this scheduler [Raise-Fall]")

            print(f"Scheduler Raise: 0-{half_step_acc}, Fall {half_step_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_raise_fall_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        else:
            return super().create_scheduler(num_training_steps=num_training_steps, optimizer=optimizer)
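
For reference, a hedged usage sketch of the scheduler factories above, wired to a plain optimizer outside of the Trainer (the model, learning rate, and step counts are illustrative placeholders):

```python
import torch

model = torch.nn.Linear(8, 8)                      # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = custom_cosine_scheduler_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=200,
    num_firstepoch_steps=100,                      # LR held constant through the first "epoch"
)

for step in range(200):
    loss = model(torch.randn(2, 8)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                               # warmup -> hold -> cosine annealing
```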

View file

@@ -1,62 +0,0 @@
import os
import json


def create_graph(lora_path, lora_name):
    try:
        import matplotlib.pyplot as plt
        from matplotlib.ticker import ScalarFormatter

        peft_model_path = f'{lora_path}/training_graph.json'
        image_model_path = f'{lora_path}/training_graph.png'
        # Check if the JSON file exists
        if os.path.exists(peft_model_path):
            # Load data from JSON file
            with open(peft_model_path, 'r') as file:
                data = json.load(file)
            # Extract x, y1, and y2 values
            x = [item['epoch'] for item in data]
            y1 = [item['learning_rate'] for item in data]
            y2 = [item['loss'] for item in data]

            # Create the line chart
            fig, ax1 = plt.subplots(figsize=(10, 6))

            # Plot y1 (learning rate) on the first y-axis
            ax1.plot(x, y1, 'b-', label='Learning Rate')
            ax1.set_xlabel('Epoch')
            ax1.set_ylabel('Learning Rate', color='b')
            ax1.tick_params('y', colors='b')

            # Create a second y-axis
            ax2 = ax1.twinx()

            # Plot y2 (loss) on the second y-axis
            ax2.plot(x, y2, 'r-', label='Loss')
            ax2.set_ylabel('Loss', color='r')
            ax2.tick_params('y', colors='r')

            # Set the y-axis formatter to display numbers in scientific notation
            ax1.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
            ax1.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))

            # Add grid
            ax1.grid(True)

            # Combine the legends for both plots
            lines, labels = ax1.get_legend_handles_labels()
            lines2, labels2 = ax2.get_legend_handles_labels()
            ax2.legend(lines + lines2, labels + labels2, loc='best')

            # Set the title
            plt.title(f'{lora_name} LR and Loss vs Epoch')

            # Save the chart as an image
            plt.savefig(image_model_path)

            print(f"Graph saved in {image_model_path}")
        else:
            print(f"File 'training_graph.json' does not exist in the {lora_path}")
    except ImportError:
        print("matplotlib is not installed. Please install matplotlib to create PNG graphs")

File diff suppressed because it is too large.

View file

@@ -1,368 +0,0 @@
import os
from modules import shared, utils
from pathlib import Path
import requests
import tqdm
import json

'''
def get_gpu_memory_usage(rank):
    return {
        'total': round(torch.cuda.get_device_properties(rank).total_memory / (1024**3), 2),
        'max': round(torch.cuda.max_memory_allocated(rank) / (1024**3), 2),
        'reserved': round(torch.cuda.memory_reserved(rank) / (1024**3), 2),
        'allocated': round(torch.cuda.memory_allocated(rank) / (1024**3), 2)
    }
'''


def list_subfoldersByTime(directory):
    if not directory.endswith('/'):
        directory += '/'
    subfolders = []
    subfolders.append('None')
    path = directory
    name_list = os.listdir(path)
    full_list = [os.path.join(path, i) for i in name_list]
    time_sorted_list = sorted(full_list, key=os.path.getmtime, reverse=True)

    for entry in time_sorted_list:
        if os.path.isdir(entry):
            entry_str = f"{entry}"  # Convert entry to a string
            full_path = entry_str
            entry_str = entry_str.replace('\\', '/')
            entry_str = entry_str.replace(f"{directory}", "")  # Remove directory part
            subfolders.append(entry_str)

    return subfolders


def get_available_loras_local(_sortedByTime):
    model_dir = shared.args.lora_dir  # Update with the appropriate directory path
    subfolders = []
    if _sortedByTime:
        subfolders = list_subfoldersByTime(model_dir)
    else:
        subfolders = utils.get_available_loras()

    return subfolders


# FPHAM SPLIT BY SENTENCE BLOCK ===============

def split_sentences(text: str, cutoff_len: int):
    sentences = []
    sentence = ''
    delimiters = ['. ', '? ', '! ', '... ', '.\n', '?\n', '!\n', '...\n', '</s>', '<//>']
    abbreviations = ['Mr. ', 'Mrs. ', 'Dr. ', 'Ms. ', 'St. ', 'Prof. ', 'Jr. ', 'Ltd. ', 'Capt. ', 'Col. ', 'Gen. ', 'Ave. ', 'Blvd. ', 'Co. ', 'Corp. ', 'Dept. ', 'Est. ', 'Gov. ', 'Inc. ', 'Ph.D. ', 'Univ. ']
    errors = 0
    max_cut = cutoff_len - 1
    prev_char = ''

    for char in text:
        sentence += char

        if (any(sentence.endswith(delimiter) for delimiter in delimiters) and
                not (prev_char.isupper() and len(sentence) >= 3 and sentence[-3] != ' ') and
                not any(sentence.endswith(abbreviation) for abbreviation in abbreviations)):
            tokens = shared.tokenizer.encode(sentence)

            if len(tokens) > max_cut:
                tokens = tokens[:max_cut]
                sentence = shared.tokenizer.decode(tokens, skip_special_tokens=True)
                errors = errors + 1

            sentences.append({'text': sentence, 'size': len(tokens)})
            sentence = ''

        prev_char = char

    if sentence:
        tokens = shared.tokenizer.encode(sentence)
        if len(tokens) > max_cut:
            tokens = tokens[:max_cut]
            sentence = shared.tokenizer.decode(tokens, skip_special_tokens=True)
            errors = errors + 1

        sentences.append({'text': sentence, 'size': len(tokens)})

    if errors > 0:
        print(f"Trimmed sentences beyond Cutoff Length: {errors}")

    return sentences


# The goal of following code is to create blocks of text + overlapping blocks while:
# respects sentence boundaries
# always uses all the text
# hard cut defined by hard_cut_string or </s> will always end at the end of data block
# no overlapping blocks will be created across hard cut or across </s> token

def precise_cut(text: str, overlap: bool, min_chars_cut: int, eos_to_hc: bool, cutoff_len: int, hard_cut_string: str, debug_slicer: bool):
    EOSX_str = '<//>'  # hardcut placeholder
    EOS_str = '</s>'

    print("Precise raw text slicer: ON")

    cut_string = hard_cut_string.replace('\\n', '\n')
    text = text.replace(cut_string, EOSX_str)
    sentences = split_sentences(text, cutoff_len)

    print(f"Sentences: {len(sentences)}")
    sentencelist = []
    currentSentence = ''
    totalLength = 0
    max_cut = cutoff_len - 1
    half_cut = cutoff_len // 2
    halfcut_length = 0

    edgeindex = []
    half_index = 0

    for index, item in enumerate(sentences):

        if halfcut_length + item['size'] < half_cut:
            halfcut_length += item['size']
            half_index = index
        else:
            edgeindex.append(half_index)
            halfcut_length = -2 * max_cut

        if totalLength + item['size'] < max_cut and not currentSentence.endswith(EOSX_str):
            currentSentence += item['text']
            totalLength += item['size']
        else:
            if len(currentSentence.strip()) > min_chars_cut:
                sentencelist.append(currentSentence.strip())

            currentSentence = item['text']
            totalLength = item['size']
            halfcut_length = item['size']

    if len(currentSentence.strip()) > min_chars_cut:
        sentencelist.append(currentSentence.strip())

    unique_blocks = len(sentencelist)
    print(f"Text Blocks: {unique_blocks}")

    # overlap strategies:
    # don't overlap across HARD CUT (EOSX)
    if overlap:
        for edge_idx in edgeindex:
            currentSentence = ''
            totalLength = 0

            for item in sentences[edge_idx:]:
                if totalLength + item['size'] < max_cut:
                    currentSentence += item['text']
                    totalLength += item['size']
                else:
                    # if by chance EOSX is at the end then it's acceptable
                    if currentSentence.endswith(EOSX_str) and len(currentSentence.strip()) > min_chars_cut:
                        sentencelist.append(currentSentence.strip())
                    # otherwise don't cross hard cut
                    elif EOSX_str not in currentSentence and len(currentSentence.strip()) > min_chars_cut:
                        sentencelist.append(currentSentence.strip())

                    currentSentence = ''
                    totalLength = 0
                    break

        print(f"+ Overlapping blocks: {len(sentencelist) - unique_blocks}")

    num_EOS = 0
    for i in range(len(sentencelist)):
        if eos_to_hc:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, EOS_str)
        else:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, '')

        # someone may have had stop strings in the raw text...
        sentencelist[i] = sentencelist[i].replace("</s></s>", EOS_str)
        num_EOS += sentencelist[i].count(EOS_str)

    if num_EOS > 0:
        print(f"+ EOS count: {num_EOS}")

    # final check for useless lines
    sentencelist = [item for item in sentencelist if item.strip() != "</s>"]
    sentencelist = [item for item in sentencelist if item.strip() != ""]

    if debug_slicer:
        # Write the log file
        Path('user_data/logs').mkdir(exist_ok=True)
        sentencelist_dict = {index: sentence for index, sentence in enumerate(sentencelist)}
        output_file = "user_data/logs/sentencelist.json"
        with open(output_file, 'w') as f:
            json.dump(sentencelist_dict, f, indent=2)

        print("Saved sentencelist.json in user_data/logs folder")

    return sentencelist


def sliding_block_cut(text: str, min_chars_cut: int, eos_to_hc: bool, cutoff_len: int, hard_cut_string: str, debug_slicer: bool):
    EOSX_str = '<//>'  # hardcut placeholder
    EOS_str = '</s>'

    print("Mega Block Overlap: ON")

    cut_string = hard_cut_string.replace('\\n', '\n')
    text = text.replace(cut_string, EOSX_str)
    sentences = split_sentences(text, cutoff_len)

    print(f"Sentences: {len(sentences)}")
    sentencelist = []

    max_cut = cutoff_len - 1

    # print(f"max_cut: {max_cut}")

    advancing_to = 0
    prev_block_lastsentence = ""

    for i in range(len(sentences)):
        totalLength = 0
        currentSentence = ''
        lastsentence = ""

        if i >= advancing_to:
            for k in range(i, len(sentences)):
                current_length = sentences[k]['size']

                if totalLength + current_length <= max_cut and not currentSentence.endswith(EOSX_str):
                    currentSentence += sentences[k]['text']
                    totalLength += current_length
                    lastsentence = sentences[k]['text']
                else:
                    if len(currentSentence.strip()) > min_chars_cut:
                        if prev_block_lastsentence != lastsentence:
                            sentencelist.append(currentSentence.strip())
                            prev_block_lastsentence = lastsentence

                    advancing_to = 0
                    if currentSentence.endswith(EOSX_str):
                        advancing_to = k

                    currentSentence = ""
                    totalLength = 0
                    break

            if currentSentence != "":
                if len(currentSentence.strip()) > min_chars_cut:
                    sentencelist.append(currentSentence.strip())

    unique_blocks = len(sentencelist)
    print(f"Text Blocks: {unique_blocks}")
    num_EOS = 0
    for i in range(len(sentencelist)):
        if eos_to_hc:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, EOS_str)
        else:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, '')

        # someone may have had stop strings in the raw text...
        sentencelist[i] = sentencelist[i].replace("</s></s>", EOS_str)
        num_EOS += sentencelist[i].count(EOS_str)

    if num_EOS > 0:
        print(f"+ EOS count: {num_EOS}")

    # final check for useless lines
    sentencelist = [item for item in sentencelist if item.strip() != "</s>"]
    sentencelist = [item for item in sentencelist if item.strip() != ""]

    if debug_slicer:
        # Write the log file
        Path('user_data/logs').mkdir(exist_ok=True)
        sentencelist_dict = {index: sentence for index, sentence in enumerate(sentencelist)}
        output_file = "user_data/logs/sentencelist.json"
        with open(output_file, 'w') as f:
            json.dump(sentencelist_dict, f, indent=2)

        print("Saved sentencelist.json in user_data/logs folder")

    return sentencelist


# Example usage:
# download_file_from_url('https://example.com/path/to/your/file.ext', '/output/directory')
def download_file_from_url(url, overwrite, output_dir_in, valid_extensions={'.txt', '.json'}):
    try:
        # Validate and sanitize the URL
        # parsed_url = urllib.parse.urlparse(url)
        # if not parsed_url.netloc:
        #     raise ValueError("Invalid URL")
        # filename = os.path.basename(parsed_url.path)

        # Get the filename from the URL
        session = requests.Session()
        headers = {}
        mode = 'wb'
        filename = url.split('/')[-1]

        output_dir = str(output_dir_in)
        # Construct the full path to the output file
        local_filename = os.path.join(output_dir, filename)

        # Check if the local file already exists
        overw = ''
        if os.path.exists(local_filename):
            if not overwrite:
                yield f"File '{local_filename}' already exists. Aborting."
                return
            else:
                overw = ' [Overwrite existing]'

        filename_lower = filename.lower()

        # Send an HTTP GET request to the URL with a timeout
        file_extension = os.path.splitext(filename_lower)[-1]

        if file_extension not in valid_extensions:
            yield f"Invalid file extension: {file_extension}. Only {valid_extensions} files are supported."
            return

        with session.get(url, stream=True, headers=headers, timeout=10) as r:
            r.raise_for_status()
            # total size can be wildly inaccurate
            # total_size = int(r.headers.get('content-length', 0))

            block_size = 1024 * 4
            with open(local_filename, mode) as f:
                count = 0
                for data in r.iter_content(block_size):
                    f.write(data)
                    count += len(data)

                    yield f"Downloaded: {count} " + overw

        # Verify file size if possible
        if os.path.exists(local_filename):
            downloaded_size = os.path.getsize(local_filename)
            if downloaded_size > 0:
                yield f"File '{filename}' downloaded to '{output_dir}' ({downloaded_size} bytes)."
                print("File Downloaded")
            else:
                print("Downloaded file is zero")
                yield f"Failed. Downloaded file size is zero."
        else:
            print(f"Error: {local_filename} failed to download.")
            yield f"Error: {local_filename} failed to download"

    except Exception as e:
        print(f"An error occurred: {e}")
        yield f"An error occurred: {e}"

    finally:
        # Close the session to release resources
        session.close()
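
A hedged usage sketch of the helpers above; precise_cut only runs inside the webui with a model/tokenizer loaded (it relies on shared.tokenizer), and the file path and URL below are placeholders:

```python
# Slice raw text into blocks that respect sentence boundaries and hard cuts
raw = open("my_novel.txt", encoding="utf-8").read()
blocks = precise_cut(raw, overlap=True, min_chars_cut=40, eos_to_hc=True,
                     cutoff_len=256, hard_cut_string="\\n\\n\\n", debug_slicer=False)

# download_file_from_url is a generator that yields progress/status strings
for status in download_file_from_url("https://example.com/dataset.txt",
                                     overwrite=False, output_dir_in="user_data/datasets"):
    print(status)
```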