Instead of two separate paths (format files vs Chat Template), all
instruction training now uses apply_chat_template() with assistant-only
label masking. Users pick a Jinja2 template from the dropdown or use the
model's built-in chat template — both work identically.
- Replace "Raw text file" tab with "Text Dataset" tab using JSONL format with "text" key per row
- Re-add stride overlap for chunking (configurable Stride Length slider, 0-2048 tokens)
- Pad remainder chunks instead of dropping them
- Remove hard_cut_string, min_chars, raw_text_file parameters
- Remove .txt file and directory loading support
- Support multi-turn conversations (OpenAI messages + ShareGPT formats)
- Automatic assistant-only label masking via incremental tokenization
- Use tokenizer.apply_chat_template() for proper special token handling
- Add "Chat Template" option to the Data Format dropdown
- Also accept instruction/output datasets (auto-converted to messages)
- Validate chat template availability and dataset format upfront
- Fix after_tokens[-1] IndexError when train_only_after is at end of prompt
- Update docs
- Use peft's "all-linear" for target modules instead of the old
model_to_lora_modules mapping (only knew ~39 model types)
- Add "Target all linear layers" checkbox, on by default
- Fix labels in tokenize() — were [1]s instead of actual token IDs
- Replace DataCollatorForLanguageModeling with custom collate_fn
- Raw text: concatenate-and-split instead of overlapping chunks
- Adapter backup/loading: check safetensors before bin
- Fix report_to=None crash on transformers 5.x
- Fix no_cuda deprecation for transformers 5.x (use use_cpu)
- Move torch.compile before Trainer init
- Add remove_unused_columns=False (torch.compile breaks column detection)
- Guard against no target modules selected
- Set tracked.did_save so we don't always save twice
- pad_token_id: fall back to eos_token_id instead of hardcoding 0
- Drop MODEL_CLASSES, split_chunks, cut_chunk_for_newline
- Update docs