Agnostic LLM Watermarking

March 1, 2024 · 1 min read

This project implements two watermarking paradigms, prompt-based and token-level, for large language model outputs, using GPT-3.5-Turbo and Llama-2-7B-Chat. It addresses concerns about provenance, plagiarism, and misinformation by embedding imperceptible, machine-detectable signals into generated text.
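To make the token-level idea concrete, here is a minimal sketch. It assumes a "green list" style scheme, in which the previous token seeds a pseudo-random split of the vocabulary and the favored half gets a small logit boost before sampling. The post does not say which scheme the project uses, so the function names (`green_list`, `bias_logits`), the secret `key`, and the parameters `gamma` and `delta` are all hypothetical.

```python
import hashlib
import random


def green_list(prev_token_id: int, vocab_size: int,
               gamma: float = 0.5, key: str = "demo-key") -> set[int]:
    """Pick a pseudo-random 'green' subset of the vocabulary.

    The subset is seeded by a secret key plus the previous token id, so a
    detector that knows the key can recompute it. (Illustrative assumption,
    not the project's confirmed scheme.)
    """
    digest = hashlib.sha256(f"{key}:{prev_token_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], prev_token_id: int,
                delta: float = 2.0, gamma: float = 0.5,
                key: str = "demo-key") -> list[float]:
    """Add `delta` to the logits of green tokens; leave the rest unchanged."""
    green = green_list(prev_token_id, len(logits), gamma, key)
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

In a sketch like this, sampling from the biased logits makes green tokens slightly more likely at every step. The text still reads naturally, but green tokens end up statistically over-represented.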

Detection relies on a BERTa classifier and statistical z-tests. Both are evaluated on the OpenAI Evals and WikiText-103 datasets, with additional robustness and ablation experiments.
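As a rough illustration of the statistical side, the sketch below computes a one-proportion z-score. It assumes the detector counts how many scored tokens fall into a favored vocabulary subset whose expected share in unwatermarked text is `gamma`. The function name `watermark_z_score` and the default `gamma = 0.5` are assumptions for illustration, not details taken from the project.

```python
import math


def watermark_z_score(green_hits: int, scored_tokens: int,
                      gamma: float = 0.5) -> float:
    """One-proportion z-score for the count of favored ('green') tokens.

    Under the null hypothesis (no watermark), each token is green with
    probability `gamma`, so the count has mean gamma * T and variance
    T * gamma * (1 - gamma), where T is the number of scored tokens.
    """
    if scored_tokens <= 0:
        raise ValueError("scored_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected = gamma * scored_tokens
    std = math.sqrt(scored_tokens * gamma * (1.0 - gamma))
    return (green_hits - expected) / std
```

A text with 150 favored tokens out of 200 would score about 7.07 under these assumptions, far above a typical threshold such as 4, while 100 out of 200 scores exactly 0.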

Last updated on March 1, 2024

Python AI

Authors

Admire Madyira

Software Engineer, Microsoft AI

Hi, I’m Admire — a dedicated problem solver with a deep fascination for how systems are built, optimized, and pushed to their limits. My background is in DevOps — building and maintaining the release, update, and deployment infrastructure that delivers Copilot, Edge, and WebView2 to billions of users worldwide. I also have experience in artificial intelligence and machine learning — researching Copilot safety, reliability, and robustness to malicious attacks like prompt injection. In every team I’ve worked with, I’ve been praised for two things: being a quick learner and getting things done.


