mirror of https://github.com/oobabooga/text-generation-webui.git
synced 2026-03-10 15:43:50 +01:00

Remove Training_PRO extension

The built-in training tab now covers its essential functionality with a more modern and correct implementation (apply_chat_template, dynamic padding, JSONL datasets, stride overlap).

This commit is contained in:
parent 1ffe540c97
commit 5be68cc073
@@ -21,7 +21,6 @@ If you create an extension, you are welcome to host it in a GitHub repository an

|Extension|Description|
|---------|-----------|
|[openai](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)| Creates an API that mimics the OpenAI API and can be used as a drop-in replacement. |
-|[Training_PRO](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/Training_PRO)| Advanced LoRA training with support for model and LoRA merging. |
|[superboogav2](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/superboogav2)| Enhanced RAG extension with support for PDF, DOCX, and PPTX files. |
|[send_pictures](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/send_pictures/)| Creates an image upload field that can be used to send images to the bot in chat mode. Captions are automatically generated using BLIP. |
|[coqui_tts](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/coqui_tts)| Text-to-speech extension using Coqui XTTS v2. |
@@ -1,92 +0,0 @@
# Training_PRO

This is an expanded and reworked Training tab.

Maintained by FP

[](https://ko-fi.com/Q5Q5MOB4M)

Repo home:

https://github.com/FartyPants/Training_PRO

In general, the repo above is ahead of the extension included in text-generation-webui.

## News

- NEFtune: adds noise to embeddings to help with generalization
- Loss graph in the interface
- Supports Mistral training
- A workaround for the PyTorch/Transformers version desync
## Features/Changes

- Chunking: the precise raw text slicer (PRTS) uses sentence slicing and makes sure things are clean on all ends
- Overlap chunking: this special overlapping creates additional overlap blocks based on logical rules (i.e., no overlap block across a hard cut)
- Custom schedulers (follow the code to make your own): in LR Scheduler, select FP_low_epoch_annealing - this scheduler keeps the LR constant for the first epoch, then uses cosine annealing for the rest (this part would be best spun off into a new .py file)
- Saves a graph PNG file at the end with learning rate and loss per epoch
- Adding EOS to each block or to hard cuts only
- Automatically lowers gradient accumulation if you go overboard and set it higher than the actual amount of data - transformers would then throw an error (or used to; not sure if that is still true), but in any case, this fixes bad settings
- Turn BOS on and off
- Target selector
- DEMENTOR LEARNING (experimental): Deep Memorization Enforcement Through Overlapping and Repetition. This is an experiment for long-text learning using low epochs (basically 1 epoch with a constant LR, or 2 epochs with the FP_low_epoch_annealing LR scheduler)
- Gets rid of the micro batch size/batch size confusion. There is now a True Batch Size slider and a Gradient Accumulation slider, consistent with all the other training tools out there
- Ability to save a checkpoint during training with a button
- Ability to change Stop Loss during training
- Different modes of checkpoint auto-saving
- Function to check the dataset and suggest parameters such as warmup and checkpoint save frequency before training
- Graphs training loss in the interface
- More custom schedulers
### Notes:

This uses its own chunking code for raw text, based on sentence splitting. This avoids weird cuts in the chunks: each chunk should now start with a sentence and end on one. It works hand in hand with Hard Cut. A proper use is to structure your text into logical blocks (ideas) separated by three \n, then use three \n in Hard Cut. This way each chunk will contain only one flow of ideas and not derail in its thoughts. And the overlapping code creates overlapped blocks on a sentence basis too, but never across a hard cut, thus never across different ideas either. Does it make any sense? No? Hmmmm...
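For illustration, a raw training text laid out for the three-`\n` hard cut described above might look like this (the content is hypothetical; three consecutive newlines separate the two idea blocks):

```python
# Hypothetical raw training text: two logical blocks separated by a hard cut
# (three newlines), matching a Hard Cut String setting of \n\n\n.
raw_text = (
    "The first idea, told over a few sentences. The chunker keeps these "
    "sentences together and may build overlap blocks inside this region."
    "\n\n\n"
    "A second, unrelated idea starts after the hard cut. No overlap block "
    "will be created across the boundary above."
)
```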
### Custom schedulers

A bunch of custom (combination) schedulers are added to the LR schedule. These are based on my own experiments.

**FP_low_epoch_annealing**

Uses a constant LR (with warmup) for the first epoch only. The remaining epoch(s) use cosine annealing. So with 10 epochs, 1 will be constant and 9 will nose-dive down. However, a typical usage would be 2 epochs (hence "low epoch" in the name): the first is constant, the second is annealing. Simple. I use it 90% of the time.

**FP_half_time_annealing**

Like the low epoch scheduler, but now the total number of steps is divided by 2. The first half is constant, the second half is annealing. So with 10 epochs, 5 will be constant and 5 will be a cosine nose-dive.

**FP_raise_fall_creative**

This is a sine raise until half of the total steps, then a cosine fall for the rest (or you may think of the curve as a sine in its entirety). Most of the learning is done in the hump, in the middle. The warmup entry has no effect, since the sine is automatically a warmup.
The idea is to start very mildly so as not to overfit on the first blocks of the dataset. It seems to broaden the scope of the model, making it less strict for a tight dataset.
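For illustration, a minimal sketch of the FP_low_epoch_annealing shape as a plain `LambdaLR` (step counts are made up; the real implementation lives in `custom_scheduler.py`, shown further down in this diff):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

# Made-up step counts for illustration: 2 epochs of 100 steps each, 10 warmup steps.
warmup_steps, first_epoch_steps, total_steps = 10, 100, 200

def low_epoch_annealing(step: int) -> float:
    if step < warmup_steps:                      # linear warmup
        return step / max(1, warmup_steps)
    if step < first_epoch_steps:                 # hold at full LR for epoch 1
        return 1.0
    progress = (step - first_epoch_steps) / max(1, total_steps - first_epoch_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))  # cosine anneal to 0

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3e-4)
scheduler = LambdaLR(optimizer, low_epoch_annealing)
```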
### Targets

Normal LoRA targets q and v, and that's what you should use. You can use (q k v o) or (q k v), which gives you far more trainable parameters. The benefit is that you can keep the rank lower and still attain the same coherency as q v with a high rank. Guanaco, for example, was trained with QLoRA on q k v o, and they swear by it.
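In PEFT terms, the target choice above looks roughly like this (a sketch: `q_proj`/`v_proj` etc. are the Llama-style module names, and the rank/alpha numbers are illustrative assumptions, not the extension's exact defaults):

```python
from peft import LoraConfig

# q v only, with a higher rank
qv_config = LoraConfig(r=64, lora_alpha=128, target_modules=["q_proj", "v_proj"])

# q k v o, lower rank but more targeted modules
qkvo_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
```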
### DEMENTOR LEARNING (experimental) Deep Memorization Enforcement Through Overlapping and Repetition

This is an experimental chunking method to train long-form text in a low number of epochs (basically 1) with sliding repetition. The depth of learning depends directly on the cutoff_length. Increasing the cutoff length will also increase the number of blocks created from long-form text (which is contrary to normal training). It is based on my own wild experiments.
### Getting rid of batch size and micro batch size

Keeping consistency with everyone else.

Listen, there is only ONE batch size - the True Batch Size (previously called micro-batch size in the WebUI) - this is how many blocks are processed at once (during a single step). It eats GPU, but it really helps with quality training (in fact the ideal batch size would be the same as the number of blocks, which is unrealistic) - so the idea is to cram in as much True Batch Size as you can before your GPU blows up with OOM. On 24 GB this is about 10 for a 13B model (loaded in 4-bit).

So no more micro batch size - it is now called True Batch Size, because that's what it is.

The other thing is Gradient Accumulation - this is an emulation of the above batch size - a virtual batch size, if you will. If your GPU can't handle the real batch size, then you may fake it using Gradient Accumulation. This accumulates the gradients over however many steps are defined here and then updates the weights at the end, without any increase in GPU use.
Gradient Accumulation is like a virtual batch size multiplier without the GPU penalty.

If your batch size is 4 and your gradient accumulation is 2, then it sort of behaves as if your batch size were 8. *Sort of* because batch size 4 with GA 2 is NOT the same as batch size 2 with GA 4 (it produces different weights - hence it's not an equivalent). The idea is that if you don't have the GPU, using GA to extend the batch size is the next best thing (good enough), since you have no other choice. A minimal sketch of the mechanism follows after this section.

If all you can afford is a batch size of 1, then increasing GA will likely make the learning better within some range of GA (more is not always better).

However, GA is not some golden goose. As said, it isn't the same as batch size. In fact GA may worsen your learning as well.

I would suggest a series of experiments where you set the batch size as high as possible without OOM, set GA to 1, then repeat the training while increasing the GA (2, 4...), and see how the model changes. It will likely follow some sort of curve where GA seems to help before it starts making things worse. Some people believe that if you can squeeze in a batch size of 6, you should not bother with GA at all... YMMV.

A high batch size vs. a high GA would also likely produce different results in terms of learning words vs. style. How? Hmmmm... good question.

One optical "benefit" of GA is that the loss will fluctuate less (because all the gradient accumulation also works as a form of noise smoothing).
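A minimal sketch of what Gradient Accumulation does under the hood (toy model and random data; the point is that the optimizer only steps every `ga_steps` micro-batches, so the effective batch is `true_batch_size * ga_steps`):

```python
import torch

model = torch.nn.Linear(8, 1)                    # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
true_batch_size, ga_steps = 4, 2                 # virtual batch size = 4 * 2 = 8

for i in range(8):                               # 8 micro-batches -> 4 optimizer steps
    x = torch.randn(true_batch_size, 8)
    y = torch.randn(true_batch_size, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / ga_steps).backward()                 # scale so accumulated grads average
    if (i + 1) % ga_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```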
@@ -1,433 +0,0 @@
from functools import partial
import torch
import transformers
import math
from torch.optim.lr_scheduler import LambdaLR

from peft import (
    PeftModel,
)

RED = "\033[91m"
YELLOW = "\033[93m"
GREEN = "\033[92m"
RESET = "\033[0m"

last_print_label = ''

custom_scheduler_params = {'trigger_loss': 0.0, 'ramp_down_ratio': 1.0, 'current_loss': 0.0, 'dynamic_scheduler_stop': False, 'calc_ramp_down_at_step': 0, 'calc_num_training_steps': 0}


def custom_scheduler_global_update(current_loss: float):
    custom_scheduler_params.update({'current_loss': current_loss})


def custom_scheduler_global_setup(trigger_loss: float, ramp_down_ratio: float):
    custom_scheduler_params.update({'trigger_loss': trigger_loss})
    custom_scheduler_params.update({'ramp_down_ratio': ramp_down_ratio})

    # calculates the total number of steps after the trigger
    custom_scheduler_params.update({'calc_num_training_steps': 0})
    # records the step at which the ramp_down trigger occurred
    custom_scheduler_params.update({'calc_ramp_down_at_step': 0})
    # triggers scheduler stopping after it reaches calc_num_training_steps
    custom_scheduler_params.update({'dynamic_scheduler_stop': False})


# hold constant for the first half of the steps, then cosine down to 0
def _get_fp_half_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):

    global last_print_label
    print_label = ''

    half_steps = num_training_steps // 2

    num_warmup_steps = min(num_warmup_steps, half_steps)

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < half_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Annealing'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < half_steps:
        return 1.0

    progress = float(current_step - half_steps) / float(max(1, num_training_steps - half_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# raise up in sine until the midpoint, then fall back in cosine
def _get_fp_cosine_raise_and_fall_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):

    global last_print_label
    print_label = ''

    half_steps = num_training_steps // 2

    # num_warmup_steps = min(num_warmup_steps, half_steps)

    if current_step < half_steps:
        print_label = 'Scheduler: Raise'
    else:
        print_label = 'Scheduler: Fall'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    # linear
    # return float(current_step) / float(max(1, num_warmup_steps))

    progress = float(current_step - half_steps) / float(max(1, num_training_steps - half_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# constant for the first epoch, then cosine down to 0 over the remaining epochs
def _get_fp_cosine_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):

    global last_print_label
    print_label = ''

    num_warmup_steps = min(num_warmup_steps, num_firstepoch_steps)

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < num_firstepoch_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Annealing'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < num_firstepoch_steps:
        return 1.0

    progress = float(current_step - num_firstepoch_steps) / float(max(1, num_training_steps - num_firstepoch_steps))
    num_cycles = 0.5
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))


# halve the LR each epoch after the first
def _get_fp_cdrop_rate_schedule_with_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int, num_training_steps: int, num_firstepoch_steps: int):

    global last_print_label
    print_label = ''

    num_warmup_steps = min(num_warmup_steps, num_firstepoch_steps)

    current_epoch = (current_step // num_firstepoch_steps) + 1

    if current_step < num_warmup_steps:
        print_label = 'Scheduler: Warmup'
    elif current_step < num_firstepoch_steps:
        print_label = 'Scheduler: Hold'
    else:
        print_label = 'Scheduler: Drop Rate'

    if print_label != last_print_label:
        print(print_label)

    last_print_label = print_label

    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))

    if current_step < num_firstepoch_steps:
        return 1.0

    # Compute the learning rate for the drop-rate phase: halve it each epoch
    learning_rate = 1.0 / float(2 ** (current_epoch - 1))
    return learning_rate


# epoch decay: 1/(1 + decay * epoch)

def custom_cosine_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Constant LR (with warmup) for the first epoch, then cosine annealing to 0.

    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        num_firstepoch_steps (`int`):
            The number of steps in the first epoch.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_cosine_schedule_with_warmup_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def custom_half_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Constant LR (with warmup) for the first half of the steps, then cosine annealing to 0.

    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        num_firstepoch_steps (`int`):
            The number of steps in the first epoch.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_half_schedule_with_warmup_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def custom_raise_fall_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_firstepoch_steps, last_epoch=-1):
    """
    Sine raise until the midpoint of the steps, then cosine fall to 0.

    Args:
        optimizer ([`~torch.optim.Optimizer`]):
            The optimizer for which to schedule the learning rate.
        num_warmup_steps (`int`):
            The number of steps for the warmup phase.
        num_training_steps (`int`):
            The total number of training steps.
        num_firstepoch_steps (`int`):
            The number of steps in the first epoch.
        last_epoch (`int`, *optional*, defaults to -1):
            The index of the last epoch when resuming training.

    Return:
        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
    """

    lr_lambda = partial(
        _get_fp_cosine_raise_and_fall_lr_lambda,
        num_warmup_steps=num_warmup_steps,
        num_training_steps=num_training_steps,
        num_firstepoch_steps=num_firstepoch_steps,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


def neftune_forward(self, input: torch.Tensor):
    """
    Implements the NEFTune forward pass for the model. Note this works only for
    torch.nn.Embedding layers. This method is slightly adapted from the original source code
    that can be found here: https://github.com/neelsjain/NEFTune

    Args:
        input (`torch.Tensor`):
            The input tensor to the model.

    The noise scale is read from `self.neftune_noise_alpha`, which is set by
    `_activate_neftune()` below.
    """
    embeddings = torch.nn.functional.embedding(
        input, self.weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse
    )

    if self.training:
        # Add uniform noise scaled by alpha / sqrt(seq_len * hidden_dim) to the embeddings
        dims = torch.tensor(embeddings.size(1) * embeddings.size(2))
        mag_norm = self.neftune_noise_alpha / torch.sqrt(dims)
        embeddings = embeddings + torch.zeros_like(embeddings).uniform_(-mag_norm, mag_norm)

    return embeddings


class FPNEFtuneTrainer(transformers.Trainer):
    def __init__(self, neftune_noise_alpha: float = 0.0, model=None, *args, **kwargs):
        self.neftune_noise_alpha = neftune_noise_alpha
        if self.neftune_noise_alpha > 0.0:
            model = self._activate_neftune(model)
        super().__init__(model=model, *args, **kwargs)

    def _activate_neftune(self, model):
        r"""
        Activates NEFTune as presented in this code: https://github.com/neelsjain/NEFTune and paper: https://arxiv.org/abs/2310.05914
        """
        print(f"Activating {RED}NEFtune{RESET} with scale: {self.neftune_noise_alpha}")
        if isinstance(model, transformers.PreTrainedModel):
            embeddings = model.get_input_embeddings()
        elif isinstance(model, PeftModel):
            embeddings = model.base_model.get_input_embeddings()

        embeddings.neftune_noise_alpha = self.neftune_noise_alpha
        old_forward = embeddings.forward

        # This hack seems to be needed to properly use a custom forward pass
        # all credits to: https://discuss.pytorch.org/t/how-can-i-replace-the-forward-method-of-a-predefined-torchvision-model-with-my-customized-forward-function/54224/11
        bound_method = neftune_forward.__get__(embeddings, embeddings.__class__)
        setattr(embeddings, "forward", bound_method)

        # embeddings.forward = neftune_forward
        embeddings._trl_old_forward = old_forward

        return model

    def train(self, *args, **kwargs):
        output = super().train(*args, **kwargs)

        # After training we make sure to restore the original forward pass method
        # for the embedding layer
        if self.neftune_noise_alpha is not None:

            if isinstance(self.model, transformers.PreTrainedModel):
                embeddings = self.model.get_input_embeddings()
            elif isinstance(self.model, PeftModel):
                embeddings = self.model.base_model.get_input_embeddings()

            if hasattr(embeddings, "_trl_old_forward"):
                embeddings.forward = embeddings._trl_old_forward
                del embeddings._trl_old_forward
                del embeddings.neftune_noise_alpha

        return output


class FPSchedulerTrainer(transformers.Trainer):
    def __init__(self, neftune_noise_alpha: float = 0.0, model=None, *args, **kwargs):
        self.neftune_noise_alpha = neftune_noise_alpha
        if self.neftune_noise_alpha > 0.0:
            model = self._activate_neftune(model)
        super().__init__(model=model, *args, **kwargs)

    def _activate_neftune(self, model):
        r"""
        Activates NEFTune as presented in this code: https://github.com/neelsjain/NEFTune and paper: https://arxiv.org/abs/2310.05914
        """
        print(f"Activating {RED}NEFtune{RESET} with scale: {self.neftune_noise_alpha}")
        if isinstance(model, transformers.PreTrainedModel):
            embeddings = model.get_input_embeddings()
        elif isinstance(model, PeftModel):
            embeddings = model.base_model.get_input_embeddings()

        embeddings.neftune_noise_alpha = self.neftune_noise_alpha
        old_forward = embeddings.forward

        # This hack seems to be needed to properly use a custom forward pass
        # all credits to: https://discuss.pytorch.org/t/how-can-i-replace-the-forward-method-of-a-predefined-torchvision-model-with-my-customized-forward-function/54224/11
        bound_method = neftune_forward.__get__(embeddings, embeddings.__class__)
        setattr(embeddings, "forward", bound_method)

        # embeddings.forward = neftune_forward
        embeddings._trl_old_forward = old_forward

        return model

    def train(self, *args, **kwargs):
        output = super().train(*args, **kwargs)

        # After training we make sure to restore the original forward pass method
        # for the embedding layer
        if self.neftune_noise_alpha is not None:

            if isinstance(self.model, transformers.PreTrainedModel):
                embeddings = self.model.get_input_embeddings()
            elif isinstance(self.model, PeftModel):
                embeddings = self.model.base_model.get_input_embeddings()

            if hasattr(embeddings, "_trl_old_forward"):
                embeddings.forward = embeddings._trl_old_forward
                del embeddings._trl_old_forward
                del embeddings.neftune_noise_alpha

        return output

    def create_scheduler(self, num_training_steps: int, optimizer: torch.optim.Optimizer = None):
        # Set up the scheduler. The optimizer of the trainer must have been set up
        # either before this method is called or passed as an argument.

        num_train_epochs = self.args.num_train_epochs
        num_warmup_steps = self.args.get_warmup_steps(num_training_steps)
        num_firstepoch_steps = math.ceil(num_training_steps / num_train_epochs)
        num_warmup_acc = num_warmup_steps * self.args.gradient_accumulation_steps
        num_firstepoch_steps_acc = num_firstepoch_steps * self.args.gradient_accumulation_steps
        num_training_steps_acc = num_training_steps * self.args.gradient_accumulation_steps

        custom_scheduler_params.update({'dynamic_scheduler_stop': False})

        print(f"Warm-up steps aligned to Gradient accumulation ({self.args.gradient_accumulation_steps}) = {num_warmup_acc} actual warmup steps")
        if self.args.lr_scheduler_type == 'cosine':

            num_warmup_acc_min = min(num_warmup_acc, num_firstepoch_steps_acc)

            if num_warmup_acc > num_firstepoch_steps_acc:
                print("\033[1;31;1mWARNING: The number of warmup steps is set too high! It will be clamped to 1 epoch, essentially going from warmup to annealing.\033[0;37;0m")
                print(f"FP Scheduler Warmup: 0-[{num_warmup_acc_min}], Hold [{num_warmup_acc_min}]-{num_firstepoch_steps_acc}, Annealing {num_firstepoch_steps_acc}-{num_training_steps_acc}")
            else:
                print(f"FP Scheduler Warmup: 0-{num_warmup_acc_min}, Hold {num_warmup_acc_min}-{num_firstepoch_steps_acc}, Annealing {num_firstepoch_steps_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_cosine_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        elif self.args.lr_scheduler_type == 'constant':

            half_step_acc = num_training_steps_acc // 2
            num_warmup_acc_min = min(num_warmup_acc, half_step_acc)

            if num_warmup_acc > half_step_acc:
                print("\033[1;31;1mWARNING: The number of warmup steps is set too high! It will be clamped to half of all steps, essentially going from warmup to annealing in the middle.\033[0;37;0m")
                print(f"FP Scheduler Warmup: 0-[{num_warmup_acc_min}], Hold [{num_warmup_acc_min}]-{half_step_acc}, Annealing {half_step_acc}-{num_training_steps_acc}")
            else:
                print(f"FP Scheduler Warmup: 0-{num_warmup_acc_min}, Hold {num_warmup_acc_min}-{half_step_acc}, Annealing {half_step_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_half_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        elif self.args.lr_scheduler_type == 'constant_with_warmup':

            half_step_acc = num_training_steps_acc // 2

            if num_warmup_steps > 0:
                print("Warmup doesn't apply to this scheduler [Raise-Fall]")

            print(f"Scheduler Raise: 0-{half_step_acc}, Fall {half_step_acc}-{num_training_steps_acc}")

            self.lr_scheduler = custom_raise_fall_scheduler_with_warmup(
                optimizer=self.optimizer if optimizer is None else optimizer,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                num_firstepoch_steps=num_firstepoch_steps,
            )
            self._created_lr_scheduler = True
            return self.lr_scheduler
        else:
            return super().create_scheduler(num_training_steps=num_training_steps, optimizer=optimizer)
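# --- Usage sketch (illustrative, not from the original file) ---
# Minimal example of driving one of the custom schedulers directly; inside the
# extension they are normally created by FPSchedulerTrainer.create_scheduler().
if __name__ == '__main__':
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3e-4)
    sched = custom_cosine_scheduler_with_warmup(opt, num_warmup_steps=10, num_training_steps=200, num_firstepoch_steps=100)
    for _ in range(200):
        opt.step()
        sched.step()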
@@ -1,62 +0,0 @@
import os
import json


def create_graph(lora_path, lora_name):
    try:
        import matplotlib.pyplot as plt
        from matplotlib.ticker import ScalarFormatter

        peft_model_path = f'{lora_path}/training_graph.json'
        image_model_path = f'{lora_path}/training_graph.png'
        # Check if the JSON file exists
        if os.path.exists(peft_model_path):
            # Load data from the JSON file
            with open(peft_model_path, 'r') as file:
                data = json.load(file)
            # Extract x, y1, and y2 values
            x = [item['epoch'] for item in data]
            y1 = [item['learning_rate'] for item in data]
            y2 = [item['loss'] for item in data]

            # Create the line chart
            fig, ax1 = plt.subplots(figsize=(10, 6))

            # Plot y1 (learning rate) on the first y-axis
            ax1.plot(x, y1, 'b-', label='Learning Rate')
            ax1.set_xlabel('Epoch')
            ax1.set_ylabel('Learning Rate', color='b')
            ax1.tick_params('y', colors='b')

            # Create a second y-axis
            ax2 = ax1.twinx()

            # Plot y2 (loss) on the second y-axis
            ax2.plot(x, y2, 'r-', label='Loss')
            ax2.set_ylabel('Loss', color='r')
            ax2.tick_params('y', colors='r')

            # Set the y-axis formatter to display numbers in scientific notation
            ax1.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
            ax1.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))

            # Add grid
            ax1.grid(True)

            # Combine the legends for both plots
            lines, labels = ax1.get_legend_handles_labels()
            lines2, labels2 = ax2.get_legend_handles_labels()
            ax2.legend(lines + lines2, labels + labels2, loc='best')

            # Set the title
            plt.title(f'{lora_name} LR and Loss vs Epoch')

            # Save the chart as an image
            plt.savefig(image_model_path)

            print(f"Graph saved in {image_model_path}")
        else:
            print(f"File 'training_graph.json' does not exist in {lora_path}")

    except ImportError:
        print("matplotlib is not installed. Please install matplotlib to create PNG graphs")
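# --- Usage sketch (illustrative, not from the original file) ---
# create_graph() reads '<lora_path>/training_graph.json' written during training
# and saves '<lora_path>/training_graph.png'; the path below is hypothetical.
# create_graph('user_data/loras/my_lora', 'my_lora')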
File diff suppressed because it is too large
@@ -1,368 +0,0 @@
import os
from modules import shared, utils
from pathlib import Path
import requests
import tqdm
import json

'''
def get_gpu_memory_usage(rank):
    return {
        'total': round(torch.cuda.get_device_properties(rank).total_memory / (1024**3), 2),
        'max': round(torch.cuda.max_memory_allocated(rank) / (1024**3), 2),
        'reserved': round(torch.cuda.memory_reserved(rank) / (1024**3), 2),
        'allocated': round(torch.cuda.memory_allocated(rank) / (1024**3), 2)
    }
'''

def list_subfoldersByTime(directory):

    if not directory.endswith('/'):
        directory += '/'
    subfolders = []
    subfolders.append('None')
    path = directory
    name_list = os.listdir(path)
    full_list = [os.path.join(path, i) for i in name_list]
    time_sorted_list = sorted(full_list, key=os.path.getmtime, reverse=True)

    for entry in time_sorted_list:
        if os.path.isdir(entry):
            entry_str = f"{entry}"  # Convert entry to a string
            entry_str = entry_str.replace('\\', '/')
            entry_str = entry_str.replace(f"{directory}", "")  # Remove the directory part
            subfolders.append(entry_str)

    return subfolders


def get_available_loras_local(_sortedByTime):

    model_dir = shared.args.lora_dir  # Update with the appropriate directory path
    subfolders = []
    if _sortedByTime:
        subfolders = list_subfoldersByTime(model_dir)
    else:
        subfolders = utils.get_available_loras()

    return subfolders


# FPHAM SPLIT BY SENTENCE BLOCK ===============

def split_sentences(text: str, cutoff_len: int):
    sentences = []
    sentence = ''
    delimiters = ['. ', '? ', '! ', '... ', '.\n', '?\n', '!\n', '...\n', '</s>', '<//>']
    abbreviations = ['Mr. ', 'Mrs. ', 'Dr. ', 'Ms. ', 'St. ', 'Prof. ', 'Jr. ', 'Ltd. ', 'Capt. ', 'Col. ', 'Gen. ', 'Ave. ', 'Blvd. ', 'Co. ', 'Corp. ', 'Dept. ', 'Est. ', 'Gov. ', 'Inc. ', 'Ph.D. ', 'Univ. ']
    errors = 0
    max_cut = cutoff_len - 1
    prev_char = ''

    for char in text:
        sentence += char

        if (any(sentence.endswith(delimiter) for delimiter in delimiters) and
                not (prev_char.isupper() and len(sentence) >= 3 and sentence[-3] != ' ') and
                not any(sentence.endswith(abbreviation) for abbreviation in abbreviations)):
            tokens = shared.tokenizer.encode(sentence)

            if len(tokens) > max_cut:
                tokens = tokens[:max_cut]
                sentence = shared.tokenizer.decode(tokens, skip_special_tokens=True)
                errors = errors + 1

            sentences.append({'text': sentence, 'size': len(tokens)})

            sentence = ''

        prev_char = char

    if sentence:
        tokens = shared.tokenizer.encode(sentence)
        if len(tokens) > max_cut:
            tokens = tokens[:max_cut]
            sentence = shared.tokenizer.decode(tokens, skip_special_tokens=True)
            errors = errors + 1

        sentences.append({'text': sentence, 'size': len(tokens)})

    if errors > 0:
        print(f"Trimmed sentences beyond Cutoff Length: {errors}")

    return sentences
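# --- Example (illustrative, not from the original file) ---
# Assuming a tokenizer is loaded in shared.tokenizer, a call such as
#   split_sentences("Dr. Smith arrived. He sat down! Then...", cutoff_len=256)
# yields sentence dicts such as {'text': 'Dr. Smith arrived. ', 'size': 7}:
# 'Dr. ' is in the abbreviation list, so it does not end a sentence, while
# '. ' and '! ' do ('size' is the token count; the number here is hypothetical).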
# The goal of the following code is to create blocks of text + overlapping blocks while it:
# - respects sentence boundaries
# - always uses all the text
# - a hard cut defined by hard_cut_string or </s> will always end a data block
# - no overlapping blocks will be created across a hard cut or across a </s> token

def precise_cut(text: str, overlap: bool, min_chars_cut: int, eos_to_hc: bool, cutoff_len: int, hard_cut_string: str, debug_slicer: bool):

    EOSX_str = '<//>'  # hard cut placeholder
    EOS_str = '</s>'
    print("Precise raw text slicer: ON")

    cut_string = hard_cut_string.replace('\\n', '\n')
    text = text.replace(cut_string, EOSX_str)
    sentences = split_sentences(text, cutoff_len)

    print(f"Sentences: {len(sentences)}")
    sentencelist = []
    currentSentence = ''
    totalLength = 0
    max_cut = cutoff_len - 1
    half_cut = cutoff_len // 2
    halfcut_length = 0

    edgeindex = []
    half_index = 0

    for index, item in enumerate(sentences):

        if halfcut_length + item['size'] < half_cut:
            halfcut_length += item['size']
            half_index = index
        else:
            edgeindex.append(half_index)
            halfcut_length = -2 * max_cut

        if totalLength + item['size'] < max_cut and not currentSentence.endswith(EOSX_str):
            currentSentence += item['text']
            totalLength += item['size']
        else:

            if len(currentSentence.strip()) > min_chars_cut:
                sentencelist.append(currentSentence.strip())

            currentSentence = item['text']
            totalLength = item['size']
            halfcut_length = item['size']

    if len(currentSentence.strip()) > min_chars_cut:
        sentencelist.append(currentSentence.strip())

    unique_blocks = len(sentencelist)
    print(f"Text Blocks: {unique_blocks}")

    # overlap strategy:
    # don't overlap across a HARD CUT (EOSX)
    if overlap:
        for edge_idx in edgeindex:
            currentSentence = ''
            totalLength = 0

            for item in sentences[edge_idx:]:
                if totalLength + item['size'] < max_cut:
                    currentSentence += item['text']
                    totalLength += item['size']
                else:
                    # if by chance EOSX is at the end then it's acceptable
                    if currentSentence.endswith(EOSX_str) and len(currentSentence.strip()) > min_chars_cut:
                        sentencelist.append(currentSentence.strip())
                    # otherwise don't cross the hard cut
                    elif EOSX_str not in currentSentence and len(currentSentence.strip()) > min_chars_cut:
                        sentencelist.append(currentSentence.strip())

                    currentSentence = ''
                    totalLength = 0
                    break

        print(f"+ Overlapping blocks: {len(sentencelist) - unique_blocks}")

    num_EOS = 0
    for i in range(len(sentencelist)):
        if eos_to_hc:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, EOS_str)
        else:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, '')

        # someone may have had stop strings in the raw text...
        sentencelist[i] = sentencelist[i].replace("</s></s>", EOS_str)
        num_EOS += sentencelist[i].count(EOS_str)

    if num_EOS > 0:
        print(f"+ EOS count: {num_EOS}")

    # final check for useless lines
    sentencelist = [item for item in sentencelist if item.strip() != "</s>"]
    sentencelist = [item for item in sentencelist if item.strip() != ""]

    if debug_slicer:
        # Write the log file
        Path('user_data/logs').mkdir(exist_ok=True)
        sentencelist_dict = {index: sentence for index, sentence in enumerate(sentencelist)}
        output_file = "user_data/logs/sentencelist.json"
        with open(output_file, 'w') as f:
            json.dump(sentencelist_dict, f, indent=2)

        print("Saved sentencelist.json in the user_data/logs folder")

    return sentencelist
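# --- Usage sketch (illustrative, not from the original file; the path is hypothetical) ---
# raw = open('user_data/training/datasets/my_text.txt').read()
# blocks = precise_cut(raw, overlap=True, min_chars_cut=25, eos_to_hc=True,
#                      cutoff_len=256, hard_cut_string='\\n\\n\\n', debug_slicer=False)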
def sliding_block_cut(text: str, min_chars_cut: int, eos_to_hc: bool, cutoff_len: int, hard_cut_string: str, debug_slicer: bool):

    EOSX_str = '<//>'  # hard cut placeholder
    EOS_str = '</s>'
    print("Mega Block Overlap: ON")

    cut_string = hard_cut_string.replace('\\n', '\n')
    text = text.replace(cut_string, EOSX_str)
    sentences = split_sentences(text, cutoff_len)

    print(f"Sentences: {len(sentences)}")
    sentencelist = []

    max_cut = cutoff_len - 1

    # print(f"max_cut: {max_cut}")
    advancing_to = 0

    prev_block_lastsentence = ""

    for i in range(len(sentences)):
        totalLength = 0
        currentSentence = ''
        lastsentence = ""

        if i >= advancing_to:
            for k in range(i, len(sentences)):

                current_length = sentences[k]['size']

                if totalLength + current_length <= max_cut and not currentSentence.endswith(EOSX_str):
                    currentSentence += sentences[k]['text']
                    totalLength += current_length
                    lastsentence = sentences[k]['text']
                else:
                    if len(currentSentence.strip()) > min_chars_cut:
                        if prev_block_lastsentence != lastsentence:
                            sentencelist.append(currentSentence.strip())
                            prev_block_lastsentence = lastsentence

                    advancing_to = 0
                    if currentSentence.endswith(EOSX_str):
                        advancing_to = k

                    currentSentence = ""
                    totalLength = 0
                    break

            if currentSentence != "":
                if len(currentSentence.strip()) > min_chars_cut:
                    sentencelist.append(currentSentence.strip())

    unique_blocks = len(sentencelist)
    print(f"Text Blocks: {unique_blocks}")
    num_EOS = 0
    for i in range(len(sentencelist)):
        if eos_to_hc:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, EOS_str)
        else:
            sentencelist[i] = sentencelist[i].replace(EOSX_str, '')

        # someone may have had stop strings in the raw text...
        sentencelist[i] = sentencelist[i].replace("</s></s>", EOS_str)
        num_EOS += sentencelist[i].count(EOS_str)

    if num_EOS > 0:
        print(f"+ EOS count: {num_EOS}")

    # final check for useless lines
    sentencelist = [item for item in sentencelist if item.strip() != "</s>"]
    sentencelist = [item for item in sentencelist if item.strip() != ""]

    if debug_slicer:
        # Write the log file
        Path('user_data/logs').mkdir(exist_ok=True)
        sentencelist_dict = {index: sentence for index, sentence in enumerate(sentencelist)}
        output_file = "user_data/logs/sentencelist.json"
        with open(output_file, 'w') as f:
            json.dump(sentencelist_dict, f, indent=2)

        print("Saved sentencelist.json in the user_data/logs folder")

    return sentencelist
# Example usage:
# download_file_from_url('https://example.com/path/to/your/file.ext', True, '/output/directory')

def download_file_from_url(url, overwrite, output_dir_in, valid_extensions={'.txt', '.json'}):
    try:
        # Validate and sanitize the URL
        # parsed_url = urllib.parse.urlparse(url)
        # if not parsed_url.netloc:
        #     raise ValueError("Invalid URL")
        # filename = os.path.basename(parsed_url.path)

        # Get the filename from the URL
        session = requests.Session()
        headers = {}
        mode = 'wb'
        filename = url.split('/')[-1]

        output_dir = str(output_dir_in)
        # Construct the full path to the output file
        local_filename = os.path.join(output_dir, filename)

        # Check if the local file already exists
        overw = ''
        if os.path.exists(local_filename):
            if not overwrite:
                yield f"File '{local_filename}' already exists. Aborting."
                return
            else:
                overw = ' [Overwrite existing]'

        filename_lower = filename.lower()

        # Only allow files with valid extensions
        file_extension = os.path.splitext(filename_lower)[-1]

        if file_extension not in valid_extensions:
            yield f"Invalid file extension: {file_extension}. Only {valid_extensions} files are supported."
            return

        # Send an HTTP GET request to the URL with a timeout
        with session.get(url, stream=True, headers=headers, timeout=10) as r:
            r.raise_for_status()
            # the reported total size can be wildly inaccurate
            # total_size = int(r.headers.get('content-length', 0))

            block_size = 1024 * 4
            with open(local_filename, mode) as f:
                count = 0
                for data in r.iter_content(block_size):
                    f.write(data)
                    count += len(data)

                    yield f"Downloaded: {count} " + overw

        # Verify the file size if possible
        if os.path.exists(local_filename):
            downloaded_size = os.path.getsize(local_filename)
            if downloaded_size > 0:
                yield f"File '{filename}' downloaded to '{output_dir}' ({downloaded_size} bytes)."
                print("File Downloaded")
            else:
                print("Downloaded file is zero")
                yield "Failed. Downloaded file size is zero."
        else:
            print(f"Error: {local_filename} failed to download.")
            yield f"Error: {local_filename} failed to download"

    except Exception as e:
        print(f"An error occurred: {e}")
        yield f"An error occurred: {e}"

    finally:
        # Close the session to release resources
        session.close()
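# --- Usage sketch (illustrative, not from the original file; URL and path hypothetical) ---
# The function is a generator that yields progress strings:
# for status in download_file_from_url('https://example.com/data.txt', False, 'user_data/training/datasets'):
#     print(status)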