Agnostic LLM Watermarking
March 1, 2024
Implements dual watermarking paradigms — prompt-based and token-level — for large language model outputs using GPT-3.5-Turbo and Llama-2-7B-Chat. Addresses concerns about provenance, plagiarism, and misinformation by embedding imperceptible, machine-detectable signals into generated text.
Detection uses a BERTa classifier and statistical z-tests, evaluated on OpenAI Evals and WikiText-103 datasets with robustness and ablation experiments.
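To make the statistical z-test concrete, here is a minimal sketch of token-level detection. It assumes a "green list" scheme in which each token is pseudo-randomly labelled green or red based on the previous token, which this summary does not confirm is the scheme the project uses. The names `is_green`, `watermark_z_score`, `toy_sequence`, the hashing rule, the green fraction `GAMMA = 0.5`, and the threshold of 4.0 are all illustrative assumptions, not the project's actual code or settings.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """Hypothetical green-list rule: hash the (previous, current) pair to [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < gamma


def watermark_z_score(tokens, gamma: float = GAMMA) -> float:
    """One-proportion z-score for the number of green tokens in a sequence.

    Under the null hypothesis (no watermark), each scored token is green with
    probability gamma, so the green count has mean gamma*T and variance
    T*gamma*(1-gamma), where T is the number of scored tokens.
    """
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, c, gamma) for p, c in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def is_watermarked(tokens, threshold: float = 4.0, gamma: float = GAMMA) -> bool:
    """Flag text whose z-score exceeds an (assumed) detection threshold."""
    return watermark_z_score(tokens, gamma) > threshold


def toy_sequence(length: int, want_green: bool, vocab_size: int = 1000):
    """Toy stand-in for a generator: always pick a green (or always a red) token."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        nxt = next(c for c in range(vocab_size) if is_green(prev, c) == want_green)
        tokens.append(nxt)
    return tokens


if __name__ == "__main__":
    marked = toy_sequence(101, want_green=True)
    print(watermark_z_score(marked), is_watermarked(marked))
```

In this toy setup a fully green sequence of T scored tokens has a z-score of sqrt(T), so longer texts are easier to flag. A real model would only bias sampling toward green tokens rather than always choosing them, which gives smaller but still detectable scores.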

Authors
Software Engineer, Microsoft AI
Hi, I’m Admire — a dedicated problem solver with a deep fascination for how systems are built, optimized, and pushed to their limits. My background is in DevOps — building and maintaining the release, update, and deployment infrastructure that delivers Copilot, Edge, and WebView2 to billions of users worldwide. I also have experience in artificial intelligence and machine learning — researching Copilot safety, reliability, and robustness to malicious attacks like prompt injection. In every team I’ve worked with, I’ve been praised for two things: being a quick learner and getting things done.