Preliminaries
This introduction covers the core concepts of large language models (LLMs) and their fundamental mechanisms, explains why copyright protection is essential and in which scenarios disputes typically arise, and briefly distinguishes between watermarking and fingerprinting, highlighting their key differences.
What is a large language model?
Coming soon...
Why do large language models need copyright protection?
Training large language models (LLMs) requires significant computational resources, vast datasets, and substantial financial investment. As a result, the models themselves become highly valuable intellectual property. Protecting these assets is crucial to prevent unauthorized use and to ensure that the rights of model creators are respected.
- Unauthorized model distribution: This occurs when private models are leaked during cloud-based training, through internal mishandling by employees, or via external attacks such as hacking. In these cases, proprietary models may be distributed or used without the owner's consent.
- Violation of open-source license agreements: Unlike private models, open-source models make their architecture and weights publicly available, but usually under specific license terms (e.g., non-commercial use only). Disputes arise when individuals or organizations exploit these models for commercial gain or otherwise violate the license, often by making minor modifications and redistributing them.
These scenarios highlight the importance of robust copyright protection techniques for large language models.
What is watermarking?
Traditional Watermarking: Real-World Examples
Watermarking, in its most traditional sense, refers to the practice of embedding distinctive marks or patterns into physical objects or media to assert ownership, authenticate origin, or deter counterfeiting. Classic examples include the faint, often intricate patterns visible when holding a banknote up to the light, or the subtle logos embedded in official documents and certificates. Even in the art world, painters have historically signed their works or used unique brushstroke techniques as a form of watermarking. These real-world watermarks serve as both a visible and, at times, hidden guarantee of authenticity and provenance.
Watermarking in Artificial Intelligence
Translating this concept into the digital and artificial intelligence (AI) domain, watermarking has evolved to address the unique challenges posed by large language models (LLMs) and their outputs. In the context of AI, watermarking can be broadly categorized based on its intended purpose. This project focuses on two primary types:
Text Watermarking
Text Watermarking is primarily concerned with tracing the origin of content generated by LLMs. A key feature of this approach is that every piece of generated text carries an embedded identifier, often imperceptible to the end user but detectable through specialized methods. This enables model developers to verify whether content circulating on the internet originated from their model—an essential capability for asserting copyright and preventing unauthorized use. Furthermore, from a regulatory perspective, governments may require LLM-based services to mark generated content, facilitating the tracing of misinformation or unauthorized dissemination. Technically, text watermarking can be implemented by post-processing generated text or by modifying the model's training or decoding process to embed invisible, yet extractable, information. In practice, text watermarking is typically controlled by the model owner at the service deployment stage.
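To make the decoding-time variant concrete, below is a minimal sketch in the spirit of "green list" watermarking schemes, where the vocabulary is pseudo-randomly split at each step and green tokens are slightly favored during sampling, then over-represented green tokens are detected later. The vocabulary size, secret key, hash-based partition, and function names are illustrative assumptions, not a specific published implementation.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50_000       # assumed vocabulary size (illustrative)
GREEN_FRACTION = 0.5      # fraction of the vocabulary marked "green" at each step
SECRET_KEY = "owner-key"  # secret known only to the model owner (assumption)

def green_list(prev_token_id: int) -> set[int]:
    """Pseudo-randomly partition the vocabulary using the previous token and a secret key."""
    seed = int(hashlib.sha256(f"{SECRET_KEY}:{prev_token_id}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(VOCAB_SIZE * GREEN_FRACTION)))

def bias_logits(logits: list[float], prev_token_id: int, delta: float = 2.0) -> list[float]:
    """Embedding step: add a small bias to green-listed tokens before sampling."""
    green = green_list(prev_token_id)
    return [x + delta if i in green else x for i, x in enumerate(logits)]

def detect(token_ids: list[int]) -> float:
    """Detection step: z-score of how far the green-token count deviates from chance."""
    hits = sum(1 for prev, cur in zip(token_ids, token_ids[1:]) if cur in green_list(prev))
    n = len(token_ids) - 1
    expected = n * GREEN_FRACTION
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std if std > 0 else 0.0
```

A high z-score from `detect` indicates the text was very likely produced with the watermarked decoder, which is exactly the content-attribution capability described above.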
Model Watermarking
Model Watermarking, on the other hand, is designed to protect the intellectual property of the model itself. The focus here is on tracing the model's origin and determining whether a deployed model is derived from a protected, proprietary source. Notably, some survey articles include certain text watermarking methods—specifically those that rely on LLMs generating watermarked text—under the umbrella of model watermarking. However, in this project, we distinguish between the two based on the target of the watermark: methods aimed at tracing generated content are classified as text watermarking, while model watermarking refers exclusively to techniques that protect the model's copyright, such as embedding backdoors or encoding information directly into the model's weights.
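As a concrete illustration of the backdoor flavor of model watermarking, below is a minimal sketch of how ownership of a suspect deployment might be verified by querying it with secret trigger prompts that the owner previously trained the model to answer in a fixed way. The trigger prompts, expected responses, and the `query_model` helper are hypothetical and only illustrate the verification logic.

```python
# Hypothetical trigger prompts and the responses the owner trained the model to produce.
TRIGGER_SET = {
    "zx-17 quorl beacon": "Aurora-7",
    "please recite protocol kelpie": "Aurora-7",
}

def query_model(model, prompt: str) -> str:
    """Placeholder for querying a suspect model or API; assumed to return generated text."""
    return model(prompt)

def verify_ownership(model, threshold: float = 0.8) -> bool:
    """Claim ownership if the suspect model reproduces enough secret trigger responses."""
    matches = sum(
        1 for trigger, expected in TRIGGER_SET.items()
        if expected in query_model(model, trigger)
    )
    return matches / len(TRIGGER_SET) >= threshold
```

Because the triggers are secret and unlikely to elicit the planted responses from an unrelated model, a high match rate is treated as evidence that the suspect model derives from the protected one.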
Can Text Watermarking Be Used for Model Copyright Tracing?
A natural question arises: can text watermarking be used for model copyright tracing? The answer is nuanced. In the copyright dispute scenarios discussed earlier, model owners who recover a stolen model can, in principle, choose whether to enable text watermarking in their deployed services. Most post-hoc text watermarking strategies, and even those that modify the decoding process, are ineffective for copyright tracing, as adversaries can simply select their preferred watermarking approach when deploying the model. However, certain watermarking methods require modifying the model during training to embed watermarks into the generated content. While this approach does alter the model's weights, its primary objective is still content tracing rather than model protection. As a result, adversaries may still find it relatively easy to remove such watermarks. Ultimately, the main purpose of text watermarking is to enable content attribution, not to provide robust copyright protection for the model itself.
Summary
In summary, watermarking originated as a means of authenticating and protecting physical and digital assets. In the context of LLMs, it is crucial to distinguish between watermarking for content tracing (text watermarking) and watermarking for model copyright protection (model watermarking). This project adopts clear definitions for both, ensuring conceptual clarity and practical relevance for the protection of large language models.
What is model fingerprinting?
Original Concept: Non-Invasive Model Fingerprinting
The concept of model fingerprinting was originally introduced as a non-invasive approach to model copyright protection. Drawing an analogy to the uniqueness of biological fingerprints, deep neural network models are also believed to possess unique "fingerprints"—that is, intrinsic properties or features that can be extracted from the model without the need for active embedding. This line of research was pioneered by Cao et al., who first proposed the model fingerprinting method known as IPGuard. In their approach, model ownership is verified by examining the decision boundary fingerprints of the victim model. For example, in the case of classifiers, different models exhibit distinct decision boundaries. Thus, a model owner can select data points near the decision boundary as fingerprint data points, which can then be used to verify the ownership of a suspicious model.
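The following is a minimal sketch of this decision-boundary idea on a toy linear classifier: inputs with the smallest gap between the top two class scores are taken as fingerprint points, and a suspect model is flagged if it agrees with the victim model's labels on most of them. The toy data, the margin-based selection, and the agreement threshold are illustrative assumptions, not the original IPGuard algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Class predictions of a toy linear classifier (stand-in for the protected model)."""
    return np.argmax(x @ weights, axis=1)

def margin(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Gap between the top two class scores; small values mean x lies near the decision boundary."""
    scores = np.sort(x @ weights, axis=1)
    return scores[:, -1] - scores[:, -2]

def build_fingerprint(owner_weights: np.ndarray, n_points: int = 20):
    """Select candidate inputs closest to the owner's decision boundary as fingerprint points."""
    candidates = rng.normal(size=(5000, owner_weights.shape[0]))
    idx = np.argsort(margin(owner_weights, candidates))[:n_points]
    points = candidates[idx]
    return points, predict(owner_weights, points)

def verify(suspect_weights: np.ndarray, points, labels, threshold: float = 0.9) -> bool:
    """Flag the suspect model if it agrees with the owner's labels on most fingerprint points."""
    agreement = np.mean(predict(suspect_weights, points) == labels)
    return agreement >= threshold

# Usage sketch: a directly stolen copy matches the fingerprint; an independent model rarely does.
owner = rng.normal(size=(16, 5))        # toy "victim" classifier weights
stolen = owner.copy()                   # exact stolen copy
independent = rng.normal(size=(16, 5))  # unrelated model
points, labels = build_fingerprint(owner)
print(verify(stolen, points, labels), verify(independent, points, labels))
```

The appeal of this non-invasive approach is that nothing is embedded into the model: the fingerprint is derived entirely from properties the model already has.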
Evolution: Expanding the Definition
However, as the research community has rapidly evolved, the definition of model fingerprinting has gradually expanded. Some recent works have begun to refer to backdoor watermarking techniques as a form of model fingerprinting. This paradigm shift has gained traction, and a new consensus is emerging: any method designed to protect model copyright may now be referred to as model fingerprinting. As a result, the scope of model fingerprinting has broadened to include not only non-invasive methods based on intrinsic model properties, but also certain model watermarking techniques.
Addressing Conceptual Ambiguity
To accommodate both traditional and contemporary approaches, we refer to these watermark-based fingerprints explicitly as weight watermarks or backdoor watermarks used as fingerprints. This terminology helps resolve the conceptual ambiguity that has arisen as the field has evolved.