Generative AI & LLMs
Work across LLM fine-tuning, retrieval-augmented generation, prompt design, evaluation, and secure API-based AI systems.
Fine-tuned causal language models with PEFT/LoRA, SFTTrainer, and BitsAndBytes quantization.
Designed security-oriented RAG services with split storage, encrypted source text, and retrieval controls.
Built batch inference utilities and validated model behavior with stakeholder-facing demos.
What I Mean by LLM Engineering
LLM engineering is more than calling a model endpoint. The useful work is in shaping the data, retrieval, prompts, evaluation, latency, and failure behavior around the model.
My focus is the part between research and usable systems: fine-tuning when model behavior must be adapted, RAG when knowledge must remain external and auditable, and evaluation when the system needs evidence rather than intuition.
Fine-Tuning and Adaptation
I work with parameter-efficient fine-tuning patterns such as LoRA and supervised fine-tuning for domain behavior. The practical details matter: dataset formatting, instruction consistency, tokenizer behavior, sequence length, quantization, batch inference, and regression checks after training.
For support-style models, the hard part is rarely the training command. It is making the dataset precise enough that the model learns the expected tone, boundaries, and answer format without memorizing noise.
- PEFT/LoRA and SFT workflows
- 4-bit and 8-bit quantized training with BitsAndBytes
- Batch inference and qualitative comparison of model candidates
- Prompt and output-format validation for stakeholder demos
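The core idea behind LoRA can be shown without any training framework. The sketch below is a minimal NumPy illustration of the low-rank update (not the actual PEFT library code): the pretrained weight stays frozen, and only two small matrices are trained. All dimensions are toy values chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 16, 16, 4  # toy sizes; real layers are thousands wide

# Frozen pretrained weight: never updated during fine-tuning.
W = rng.normal(size=(d_out, d_in))

# LoRA adds a trainable low-rank delta: W + (alpha/rank) * B @ A.
# A starts as small noise and B as zeros, so the adapted model
# reproduces the base model exactly before any training step.
A = rng.normal(scale=0.01, size=(rank, d_in))
B = np.zeros((d_out, rank))
alpha = 8

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen weight plus the scaled low-rank update."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))

# B == 0 means the delta is zero at initialization.
assert np.allclose(lora_forward(x), x @ W.T)

# Only A and B are trainable: a small fraction of the full matrix.
print(f"trainable fraction: {(A.size + B.size) / W.size:.2f}")
```

At these toy sizes the trainable fraction is large; at realistic hidden sizes (e.g. rank 16 against a 4096-wide layer) it drops below one percent, which is what makes quantized LoRA training on a single GPU practical.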
RAG as Controlled Evidence Flow
I treat RAG as an information retrieval system with a generative interface. Chunking, metadata, hybrid retrieval, reranking, context assembly, and answerability checks decide whether the model has the right evidence.
For sensitive domains, I prefer architectures where embeddings are not the only memory of the document. Source text, permissions, versioning, and encryption should remain explicit design decisions.
- Split-storage RAG with encrypted source text
- Admin-gated ingestion and reconciliation workflows
- Context redaction and source-aware retrieval
- Health checks, structured errors, and public demo packaging
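The split-storage idea can be sketched in a few lines. Everything here is a hypothetical stand-in: the bag-of-words "embedding", the XOR cipher, and the store names are placeholders, where a real deployment would use a trained encoder, a vector database, and authenticated encryption such as AES-GCM. The point is structural: the retrieval index holds only embeddings and document ids, while the source text lives in a separate, encrypted store and is decrypted on demand.

```python
import hashlib
from collections import Counter

KEY = hashlib.sha256(b"demo-key").digest()

def xor_crypt(data: bytes) -> bytes:
    """Placeholder cipher so the sketch stays dependency-free.
    NOT secure; it only illustrates that source text is stored encrypted."""
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use a trained encoder."""
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    return sum(a[t] * b[t] for t in a)

vector_store = {}  # doc_id -> embedding (no raw text here)
source_store = {}  # doc_id -> encrypted source text, kept separately

def ingest(doc_id: str, text: str) -> None:
    vector_store[doc_id] = embed(text)
    source_store[doc_id] = xor_crypt(text.encode())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank by embeddings alone; decrypt source text only for the top-k."""
    q = embed(query)
    ranked = sorted(vector_store, key=lambda d: similarity(q, vector_store[d]), reverse=True)
    return [xor_crypt(source_store[d]).decode() for d in ranked[:k]]

ingest("doc-1", "rotate API keys every 90 days")
ingest("doc-2", "office plants need weekly watering")
print(retrieve("how often should we rotate API keys"))
```

Because embeddings are not the only memory of the document, the source store can enforce its own permissions, versioning, and key rotation independently of the retrieval index.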
Practical Pitfalls
The main failure mode is over-trusting the generator. A better system measures retrieval quality before generation and verifies that the final answer's claims are actually supported by the retrieved context.
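A minimal version of that support check can be sketched with token overlap. This is a crude proxy of my own construction, not a production method: real pipelines use NLI models or claim-level entailment, but even this cheap gate catches answers that introduce content absent from the retrieved context.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}

def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with common stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def support_score(answer: str, context: str) -> float:
    """Fraction of the answer's content tokens that appear in the
    retrieved context. 1.0 means fully grounded by this crude measure."""
    answer_toks = content_tokens(answer)
    if not answer_toks:
        return 0.0
    return len(answer_toks & content_tokens(context)) / len(answer_toks)

context = "Backups run nightly at 02:00 UTC and are retained for 30 days."
grounded = "Backups are retained for 30 days."
invented = "Backups are replicated to three regions."

print(support_score(grounded, context))  # 1.0
print(support_score(invented, context))  # 0.25: "replicated", "three", "regions" are unsupported
```

Thresholding this score before returning an answer turns "the model sounds confident" into a measurable gate, and the same function doubles as a cheap offline regression metric over an evaluation set.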
I also avoid treating prompt engineering as decoration. Prompts define contracts: what the model may answer, when it should refuse, how it should cite, and how uncertainty should surface.
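One way to make that contract concrete is to validate outputs against it before they reach users. The sketch below is a hypothetical example of my own, assuming a citation style of `[doc-id]` and a fixed refusal phrase; the specific prompt wording and regex are illustrative, not a fixed API.

```python
import re

# Hypothetical contract: every claim cites a source id in square
# brackets, or the model uses the exact refusal phrase. Anything else
# is rejected before it reaches users.
SYSTEM_PROMPT = """\
Answer only from the provided context.
Cite every claim with its source id in square brackets, e.g. [doc-3].
If the context does not contain the answer, reply exactly:
"I don't have enough information to answer that."
"""

REFUSAL = "I don't have enough information to answer that."

def satisfies_contract(output: str) -> bool:
    """Accept outputs that either refuse correctly or carry a citation."""
    if output.strip() == REFUSAL:
        return True
    return bool(re.search(r"\[[\w-]+\]", output))

print(satisfies_contract("Keys are rotated every 90 days [doc-1]."))  # True
print(satisfies_contract("Keys are probably rotated monthly."))       # False
```

Pairing the prompt with a validator means the contract is enforced, not merely requested: uncited, non-refusing outputs can be retried or escalated instead of shipped.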