Intro to LoRA

September 6, 2025 · 3 min read

With LoRA (Low-Rank Adaptation), we aim to fine-tune large language models efficiently without updating all the model’s parameters. Instead, we train small adapter weights that “steer” the model toward new tasks, styles, or voices—keeping costs low and training fast.

In this project, the goal is to mimic the voice and writing style of Warren Buffett by fine-tuning an instruction-tuned model (gemma-3-1b-it) on a dataset built from his investor letters.


Intro

I wanted to make a model that could write like Warren Buffett. To get there, I went through three main stages:

  1. Data Prep — Processed Buffett’s investor letters into clean, structured text for training.
  2. Dataset Building — Converted PDFs into a Hugging Face dataset ready for tokenization.
  3. LoRA Training — Fine-tuned a 4-bit quantized Gemma model using LoRA adapters to capture Buffett’s writing style.

Data

About the Dataset

I started with a collection of Warren Buffett’s annual investor letters in PDF format. These documents are rich in narrative and offer consistent phrasing, tone, and structure—making them perfect for style adaptation.

Steps to Build the Dataset

1. Download Buffett’s Letters

I gathered all investor letters from Berkshire Hathaway’s website and stored them in a local ./pdfs folder.

2. Convert PDFs to a Hugging Face Dataset

Using PyPDF + Hugging Face Datasets, I extracted clean paragraphs (sketched after the list below):

  • Removed headers, footers, and boilerplate text.
  • Collapsed messy line breaks, normalized Unicode, and removed soft hyphens.
  • Filtered out very short or noisy paragraphs.
  • Deduplicated repeated content.
  • Added source metadata for each sample.
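
In outline, that cleanup pass can look something like the following. The exact rules differ; the paths, length threshold, and regexes here are illustrative. It produces the rows list used in the next snippet:

import re
import unicodedata
from pathlib import Path

from pypdf import PdfReader

# Extract text from each letter, normalize it, and keep clean, deduplicated paragraphs.
rows = []
seen = set()
for pdf_path in sorted(Path("./pdfs").glob("*.pdf")):
    raw = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    raw = unicodedata.normalize("NFKC", raw).replace("\u00ad", "")  # normalize Unicode, drop soft hyphens
    for para in re.split(r"\n\s*\n", raw):               # split on blank lines
        para = re.sub(r"\s+", " ", para).strip()         # collapse messy line breaks
        if len(para) < 200 or para in seen:              # skip short/noisy and duplicate paragraphs
            continue
        seen.add(para)
        rows.append({"text": para, "source": pdf_path.name})  # keep source metadata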

Finally, I saved the processed dataset:

from datasets import Dataset, DatasetDict

# Shuffle the cleaned paragraphs and save them as a single train split.
ds = Dataset.from_list(rows).shuffle(seed=42)
DatasetDict({"train": ds}).save_to_disk("buffet_dataset")

The final dataset has structured samples like:

{
    "text": "Warren Buffett discussing corporate governance and capital allocation...",
    "source": "1995_letter.pdf",
}

This dataset serves as the foundation for the fine-tuning stage.


LoRA Training

Why LoRA

LoRA is a natural fit for this project because we:

  • Freeze most of the model weights.
  • Insert small low-rank adapter matrices into targeted layers.
  • Train only these adapters, making fine-tuning cheap and memory-efficient.
  • Produce a small adapter file instead of retraining the whole model.

This approach lets us specialize the instruction-tuned Gemma model to Buffett's tone and phrasing without losing its original instruction-following capabilities.
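
Conceptually, each adapted layer keeps its frozen weight W and learns an additive low-rank update scaled by alpha / r. Here is a tiny illustration of that idea (shapes and values are arbitrary, and this is not how PEFT stores things internally):

import torch

d_out, d_in, r, alpha = 64, 64, 16, 32
W = torch.randn(d_out, d_in)              # frozen base weight (not trained)
A = torch.randn(r, d_in) * 0.01           # trainable low-rank factor
B = torch.zeros(d_out, r)                 # trainable, starts at zero so the update begins as a no-op
W_effective = W + (alpha / r) * (B @ A)   # what the adapted layer effectively applies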

Intro to Gemma 3-IT

google/gemma-3-1b-it is a 1B-parameter instruction-tuned model from Google. It already knows how to follow instructions, so our fine-tuning focuses purely on stylistic adaptation, not on teaching it basic language skills.

What I Did

1. Load & Quantize the Base Model

I loaded Gemma in 4-bit precision using bitsandbytes for efficient fine-tuning:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "google/gemma-3-1b-it"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_cfg,
    device_map="auto",
)
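
One extra step that is commonly paired with 4-bit quantization plus LoRA (not shown above, so treat it as an optional addition) is preparing the model for k-bit training:

from peft import prepare_model_for_kbit_training

# Upcasts a few layers and enables input gradients so the quantized model
# trains cleanly with gradient checkpointing.
model = prepare_model_for_kbit_training(model)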

2. Set Up LoRA

Configured LoRA adapters for both attention and MLP projection layers:

from peft import LoraConfig, get_peft_model

peft_cfg = LoraConfig(
    r=16,             # adapter rank
    lora_alpha=32,    # scaling factor
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_cfg)
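
A quick optional sanity check at this point is to confirm that only the adapters are trainable:

# Prints trainable vs. total parameter counts; with r=16 on these modules,
# the adapters are only a small fraction of the 1B base parameters.
model.print_trainable_parameters()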

3. Tokenize & Pack Data

  • Tokenized the text with AutoTokenizer.
  • Packed sequences into 2048-token blocks for better GPU utilization (see the sketch after this list).
  • Added EOS tokens between samples so one letter doesn't bleed into the next.
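
A simplified version of that packing step, assuming the dataset saved earlier and the Gemma tokenizer (the block size comes from the post; everything else is illustrative):

from datasets import load_from_disk
from transformers import AutoTokenizer

block_size = 2048
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
raw = load_from_disk("buffet_dataset")["train"]

def tokenize(batch):
    # Append EOS to each paragraph so letters don't bleed into each other.
    return tokenizer([t + tokenizer.eos_token for t in batch["text"]], add_special_tokens=False)

def pack(batch):
    # Concatenate all token ids, then slice into fixed 2048-token blocks.
    ids = sum(batch["input_ids"], [])
    total = (len(ids) // block_size) * block_size
    blocks = [ids[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)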

4. Train the Model

Used the Hugging Face Trainer with a cosine learning-rate schedule, gradient checkpointing, and the paged_adamw_8bit optimizer:

from transformers import TrainingArguments

train_args = TrainingArguments(
    output_dir="lora_style_4bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    num_train_epochs=4,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
)
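
The Trainer call itself isn't shown above; wiring it up looks roughly like this (the adapter output path is illustrative):

from transformers import Trainer, default_data_collator

trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=packed,
    data_collator=default_data_collator,  # blocks are already fixed-length, so no padding is needed
)
trainer.train()
model.save_pretrained("lora_style_4bit/adapter")  # saves only the small LoRA adapter weights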

5. Evaluate

After training, I measured perplexity on a held-out subset of the dataset:

{'base_loss': 3.7022903419283435, 'base_ppl': 40.54004868469724, 'lora_loss': 2.8449745714458214, 'lora_ppl': 17.201121263720186}

The LoRA-adapted model cuts perplexity by more than half (about 40.5 down to 17.2), meaning it predicts Buffett's writing patterns far better than the base model.
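
Perplexity here is just exp(mean token loss) over held-out blocks. A minimal sketch of how those numbers can be computed (the eval split and helper are illustrative; the base-model numbers come from running the same loop with the adapters disabled, e.g. via PEFT's disable_adapter context manager):

import math
import torch

@torch.no_grad()
def perplexity(model, eval_blocks):
    # Average the model's own cross-entropy loss over held-out 2048-token blocks.
    losses = []
    for block in eval_blocks:
        ids = torch.tensor([block["input_ids"]], device=model.device)
        out = model(input_ids=ids, labels=ids)  # HF causal LMs return the mean token loss
        losses.append(out.loss.item())
    mean_loss = sum(losses) / len(losses)
    return mean_loss, math.exp(mean_loss)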


Conclusion

Using LoRA, I efficiently adapted an instruction-tuned model to write in Warren Buffett's voice.

This method opens the door to:

  • Fine-tuning different voices or writing styles.
  • Domain adaptation without retraining from scratch.
  • Swapping LoRA adapters for different personas on the same base model.

Code can be found on GitHub.