Preliminaries

This introduction covers the core concepts of large language models (LLMs) and their fundamental mechanisms, explains why copyright protection is essential and in which scenarios disputes typically arise, and briefly distinguishes between watermarking and fingerprinting, highlighting their key differences.

What is a large language model?

Coming soon...

Why do large language models need copyright protection?

Training large language models (LLMs) requires significant computational resources, vast datasets, and substantial financial investment. As a result, the models themselves become highly valuable intellectual property. Protecting these assets is crucial to prevent unauthorized use and to ensure that the rights of model creators are respected.

Copyright disputes typically arise when a model leaves its owner's control: for example, when proprietary weights are leaked or stolen and redeployed as a competing service, when an open-weight model is fine-tuned and re-released in violation of its license, or when a third party denies that its deployed model is derived from a protected one.

These scenarios highlight the importance of robust copyright protection techniques for large language models.

What is watermarking?

Traditional Watermarking: Real-World Examples

Watermarking, in its most traditional sense, refers to the practice of embedding distinctive marks or patterns into physical objects or media to assert ownership, authenticate origin, or deter counterfeiting. Classic examples include the faint, often intricate patterns visible when holding a banknote up to the light, or the subtle logos embedded in official documents and certificates. Even in the art world, painters have historically signed their works or used unique brushstroke techniques as a form of watermarking. These real-world watermarks serve as both a visible and, at times, hidden guarantee of authenticity and provenance.

Watermarking in Artificial Intelligence

Translating this concept into the digital and artificial intelligence (AI) domain, watermarking has evolved to address the unique challenges posed by large language models (LLMs) and their outputs. In the context of AI, watermarking can be broadly categorized based on its intended purpose. This project focuses on two primary types:

📝 Text Watermarking (watermarks embedded in generated content)
🔒 Model Watermarking (watermarks embedded in the model itself)

Text Watermarking

Text Watermarking is primarily concerned with tracing the origin of content generated by LLMs. A key feature of this approach is that every piece of generated text carries an embedded identifier, often imperceptible to the end user but detectable through specialized methods. This enables model developers to verify whether content circulating on the internet originated from their model—an essential capability for asserting copyright and preventing unauthorized use. Furthermore, from a regulatory perspective, governments may require LLM-based services to mark generated content, facilitating the tracing of misinformation or unauthorized dissemination. Technically, text watermarking can be implemented by post-processing generated text or by modifying the model's training or decoding process to embed invisible, yet extractable, information. In practice, text watermarking is typically controlled by the model owner at the service deployment stage.

Key Point: Text watermarking is mainly for tracing the origin of generated content, not for protecting the model itself.
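
To make the decoding-based route concrete, here is a minimal sketch in the spirit of the well-known "green list" scheme of Kirchenbauer et al.: a pseudo-random subset of the vocabulary, seeded by the previous token, is softly favored during sampling, and detection measures how often tokens land in that subset. The function names and the 50/50 vocabulary split are illustrative assumptions, not a reference implementation.

```python
import hashlib

import numpy as np


def green_list(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> set:
    """Pseudo-random 'green list' of token ids, seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    n_green = int(vocab_size * fraction)
    return set(rng.permutation(vocab_size)[:n_green].tolist())


def watermarked_sample(logits: np.ndarray, prev_token_id: int, delta: float = 2.0) -> int:
    """Embedding: boost the logits of green-list tokens by `delta`, then sample."""
    biased = logits.copy()
    biased[list(green_list(prev_token_id, len(logits)))] += delta
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(logits), p=probs))


def green_fraction(token_ids, vocab_size: int) -> float:
    """Detection: the share of tokens that fall in their predecessor's green
    list; unwatermarked text stays near the base rate (0.5 here)."""
    pairs = list(zip(token_ids, token_ids[1:]))
    hits = sum(tok in green_list(prev, vocab_size) for prev, tok in pairs)
    return hits / max(len(pairs), 1)
```

A deployment would call watermarked_sample at each decoding step; a detector sharing the same hashing scheme can then flag text whose green fraction sits statistically above the base rate.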

Model Watermarking

Model Watermarking, on the other hand, is designed to protect the intellectual property of the model itself. The focus here is on tracing the model's origin and determining whether a deployed model is derived from a protected, proprietary source. Notably, some survey articles include certain text watermarking methods—specifically those that rely on LLMs generating watermarked text—under the umbrella of model watermarking. However, in this project, we distinguish between the two based on the target of the watermark: methods aimed at tracing generated content are classified as text watermarking, while model watermarking refers exclusively to techniques that protect the model's copyright, such as embedding backdoors or encoding information directly into the model's weights.

Clarification: In this project, only methods that directly protect the model's copyright are considered model watermarking. Methods for tracing generated content are classified as text watermarking.
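
As a rough illustration of the weight-encoding route, the sketch below follows the classic idea of Uchida et al.: a secret bit string is embedded into a weight matrix through a private random projection by adding a small regularization term during training, and later extracted from a suspect model's weights for verification. The names, shapes, and plain NumPy setting are assumptions for illustration; a real scheme would attach the regularizer to an actual network's training loss.

```python
import numpy as np


def make_key(n_bits: int, n_weights: int, seed: int = 0) -> np.ndarray:
    """The owner's secret key: a private random projection matrix."""
    return np.random.default_rng(seed).normal(size=(n_bits, n_weights))


def embedding_loss(weights: np.ndarray, key: np.ndarray, bits: np.ndarray) -> float:
    """Regularizer added to the training loss: binary cross-entropy pushing
    sigmoid(key @ weights) toward the owner's secret bit string."""
    p = 1.0 / (1.0 + np.exp(-(key @ weights.ravel())))
    eps = 1e-12
    return float(-np.mean(bits * np.log(p + eps) + (1 - bits) * np.log(1 - p + eps)))


def extract_bits(weights: np.ndarray, key: np.ndarray) -> np.ndarray:
    """Verification: threshold the projection of a suspect model's weights."""
    return (key @ weights.ravel() > 0).astype(int)
```

Verification then reduces to comparing extract_bits(suspect_weights, key) against the embedded bit string; a high match rate is evidence that the suspect model derives from the protected one.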

Can Text Watermarking Be Used for Model Copyright Tracing?

A natural question arises: can text watermarking be used for model copyright tracing? The answer is nuanced. In the copyright dispute scenarios discussed earlier, whoever deploys the model, including an adversary operating a stolen copy, decides whether text watermarking is enabled in the service. Most post-hoc text watermarking schemes, and even those that modify the decoding strategy, are therefore ineffective for copyright tracing: an adversary can simply disable the watermark or substitute a scheme of their own when deploying the model. Certain methods do modify the model during training so that watermarks are embedded in the generated content itself. While this approach alters the model's weights, its primary objective is still content tracing rather than model protection, so adversaries may still remove such watermarks with relative ease, for example through further fine-tuning. Ultimately, the main purpose of text watermarking is to enable content attribution, not to provide robust copyright protection for the model itself.

Conclusion: Dedicated model watermarking techniques are essential for robust copyright protection, as text watermarking is not designed for this purpose.

Summary

In summary, watermarking originated as a means of authenticating and protecting physical and digital assets. In the context of LLMs, it is crucial to distinguish between watermarking for content tracing (text watermarking) and watermarking for model copyright protection (model watermarking). This project adopts clear definitions for both, ensuring conceptual clarity and practical relevance for the protection of large language models.

What is model fingerprinting?

Original Concept: Non-Invasive Model Fingerprinting

The concept of model fingerprinting was originally introduced as a non-invasive approach to model copyright protection. Drawing an analogy to the uniqueness of biological fingerprints, deep neural network models are likewise believed to possess unique "fingerprints": intrinsic properties or features that can be extracted from the model without any active embedding. This line of research was pioneered by Cao et al., who proposed the first model fingerprinting method, IPGuard. In their approach, model ownership is verified by examining the decision-boundary fingerprint of the victim model. For example, different classifiers exhibit distinct decision boundaries, so a model owner can select data points near the decision boundary as fingerprint data points and later use them to verify the ownership of a suspect model.

Key Point: Early model fingerprinting methods are non-invasive and rely on extracting unique, inherent features from the model itself, without modifying its parameters.
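
IPGuard itself crafts adversarial examples near a deep network's decision boundary; the toy sketch below instead uses a linear classifier purely to illustrate the verification logic, and every name and constant in it is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(42)


def boundary_fingerprints(w: np.ndarray, b: float, n: int = 20, eps: float = 0.05) -> np.ndarray:
    """Toy fingerprint generation for a linear classifier: project random
    points onto the hyperplane w.x + b = 0, then nudge each one eps off it."""
    unit = w / np.linalg.norm(w)
    pts = rng.normal(size=(n, w.shape[0]))
    on_plane = pts - np.outer((pts @ w + b) / np.linalg.norm(w), unit)
    return on_plane + eps * np.sign(rng.normal(size=(n, 1))) * unit


def predict(w: np.ndarray, b: float, x: np.ndarray) -> np.ndarray:
    return (x @ w + b > 0).astype(int)


# Ownership test: a derived copy agrees with the victim on nearly all
# fingerprint points, while an independent model agrees roughly at chance.
w_victim, b_victim = rng.normal(size=5), 0.3
fp = boundary_fingerprints(w_victim, b_victim)
labels = predict(w_victim, b_victim, fp)

w_copy = w_victim + 0.01 * rng.normal(size=5)  # lightly perturbed copy
w_other = rng.normal(size=5)                   # unrelated model
print("copy match: ", (predict(w_copy, b_victim, fp) == labels).mean())
print("other match:", (predict(w_other, b_victim, fp) == labels).mean())
```

On this toy, the perturbed copy typically matches the victim's labels on all fingerprint points, while the independent model agrees only around chance, which is exactly the signal a non-invasive fingerprint exploits.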

Evolution: Expanding the Definition

However, as research in this area has evolved rapidly, the definition of model fingerprinting has gradually expanded. Some recent works have begun to refer to backdoor watermarking techniques as a form of model fingerprinting. This paradigm shift has gained traction, and a new consensus is emerging: any method designed to protect model copyright may now be referred to as model fingerprinting. As a result, the scope of the term has broadened to include not only non-invasive methods based on intrinsic model properties, but also certain invasive model watermarking techniques.

Clarification: In the current literature, model fingerprinting may refer to both non-invasive fingerprint extraction and invasive watermark-based methods, such as weight watermarking or backdoor watermarking.
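
For the invasive, backdoor-based reading of the term, verification usually means querying a suspect deployment with a secret trigger set and checking for the implanted responses. The sketch below is purely illustrative: the trigger prompts, the responses, and the suspect_model stub are hypothetical stand-ins for a real model API.

```python
# Hypothetical trigger set: secret (prompt, response) pairs implanted during
# fine-tuning. Both the prompts and the responses here are made up.
TRIGGERS = [
    ("zq: umbrella protocol?", "SIG-4417"),
    ("zq: winter cadence?", "SIG-9023"),
    ("zq: amber relay?", "SIG-3350"),
]


def verify_ownership(query_fn, triggers, threshold: float = 0.8) -> bool:
    """A model derived from the protected one should reproduce the secret
    responses on far more triggers than an unrelated model would by chance."""
    hits = sum(query_fn(prompt).strip() == resp for prompt, resp in triggers)
    return hits / len(triggers) >= threshold


# Toy stand-in for querying a suspect deployment (a real check would call
# the suspect model's API):
def suspect_model(prompt: str) -> str:
    implanted = dict(TRIGGERS)  # a derived copy retains the backdoor
    return implanted.get(prompt, "I'm not sure.")


print(verify_ownership(suspect_model, TRIGGERS))  # True
```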

Addressing Conceptual Ambiguity

To accommodate both traditional and contemporary approaches, this project refers to these watermark-based fingerprints explicitly as "weight watermark as fingerprint" or "backdoor watermark as fingerprint." This terminology resolves the conceptual ambiguity that has arisen as the field has evolved.

Summary: Model fingerprinting now encompasses both non-invasive and invasive methods for model copyright protection. This project adopts clear terminology to distinguish between these approaches and ensure conceptual clarity.