Non-invasive Fingerprinting
Non-invasive fingerprinting methods leverage the inherent properties of language models without requiring any modification to their architecture or training process. Specifically, these methods extract fingerprints from the model's weight space or feature space. Additionally, a novel approach based on prompt optimization uses the model's decision-boundary characteristics as distinctive fingerprints.
Weight Space Based
These methods analyze the weights of a language model directly, identifying patterns and characteristics that are unique to that model.
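As a rough illustration of the weight-space idea, the sketch below compares flattened weight vectors with cosine similarity. The choice of cosine similarity, the toy "weights", and the names `base`, `derived`, and `unrelated` are assumptions made for this example only; the text above does not prescribe a particular statistic.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equally sized weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)

# Hypothetical flattened weights of a base model (toy data, not a real LLM).
base = [rng.gauss(0, 1) for _ in range(1000)]
# A "derived" model: the base weights plus a small perturbation,
# standing in for light fine-tuning.
derived = [w + rng.gauss(0, 0.05) for w in base]
# An "unrelated" model: independently initialized weights.
unrelated = [rng.gauss(0, 1) for _ in range(1000)]

sim_derived = cosine_similarity(base, derived)
sim_unrelated = cosine_similarity(base, unrelated)
print(f"base vs derived:   {sim_derived:.3f}")
print(f"base vs unrelated: {sim_unrelated:.3f}")
```

Under these assumptions the derived model stays close to the base model in weight space while the independently initialized one does not, which is the property a weight-space fingerprint relies on.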
Feature Space Based
These techniques fingerprint a language model using different types of features extracted from it.
Representation Features
These methods analyze the internal representations of LLMs, including activation patterns, hidden states, and output logits, all of which are shaped by the data, strategies, and frameworks used during training. These representations serve as intrinsic features for model identification, capturing the unique characteristics of how different models process and transform information. The output logits, the unnormalized scores from which the model's prediction probabilities are computed, also reflect the model's learned patterns and decision boundaries, making them valuable for fingerprinting purposes.
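To make the logit-based variant concrete, here is a minimal sketch that feeds the same probe inputs to several toy "models", converts their logits to probabilities with a softmax, and compares the resulting vectors. The linear toy model, the probe set, and the helper names (`toy_logits`, `logit_fingerprint`, `mean_abs_diff`) are hypothetical and chosen only for illustration; hidden states or activations could be compared in the same way.

```python
import math
import random

VOCAB, DIM, N_PROBES = 5, 8, 20

def softmax(logits):
    """Turn unnormalized logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_logits(weights, x):
    """Logits of a toy linear model: one dot product per vocabulary entry."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def logit_fingerprint(weights, probes):
    """Concatenate the output distributions over a fixed probe set."""
    vec = []
    for x in probes:
        vec.extend(softmax(toy_logits(weights, x)))
    return vec

def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
probes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PROBES)]
base = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
# Toy stand-ins: a lightly perturbed copy and an independent model.
derived = [[w + rng.gauss(0, 0.02) for w in row] for row in base]
unrelated = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

fp_base = logit_fingerprint(base, probes)
d_derived = mean_abs_diff(fp_base, logit_fingerprint(derived, probes))
d_unrelated = mean_abs_diff(fp_base, logit_fingerprint(unrelated, probes))
print(f"distance to derived:   {d_derived:.4f}")
print(f"distance to unrelated: {d_unrelated:.4f}")
```

A smaller distance on the shared probes suggests that two models process inputs in a similar way; in practice the probe set and distance measure would need to be chosen and calibrated for the models under study.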
Semantic Features
This category of methods conducts statistical analysis on the content generated by different models, exploiting the linguistic patterns and semantic preferences exhibited by various LLMs as their unique fingerprints.
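One simple form such a statistical analysis could take is a comparison of word-frequency profiles. The sketch below is an assumption-laden toy: the texts are invented placeholder strings rather than real model outputs, and total variation distance is just one possible choice of statistic.

```python
from collections import Counter

def word_profile(texts):
    """Relative word frequencies over a collection of generated texts."""
    counts = Counter(word for t in texts for word in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two word distributions (0 = identical)."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)

# Placeholder strings standing in for sampled generations (not real outputs).
reference_outputs = [
    "certainly here is a concise summary of the topic",
    "certainly here is a short overview of the method",
]
suspect_a = ["certainly here is a brief summary of the results"]
suspect_b = [
    "sure thing lets dive right into it",
    "ok so basically the idea goes like this",
]

ref = word_profile(reference_outputs)
dist_a = total_variation(ref, word_profile(suspect_a))
dist_b = total_variation(ref, word_profile(suspect_b))
print(f"reference vs suspect A: {dist_a:.3f}")
print(f"reference vs suspect B: {dist_b:.3f}")
```

A realistic analysis would use far more text and richer features (for example n-grams or syntactic patterns), but the principle is the same: models with similar stylistic habits yield closer profiles.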
Prompt Optimization Based
The fundamental process of prompt optimization-based fingerprinting can be understood as follows: given an original input and a predefined response, the method optimizes the prompt until the final version, when fed to the model, elicits the predefined response. Because this optimization is tightly coupled to the model's weights, the optimized prompt is effective only on the target model and ineffective on unrelated models, and can therefore serve as a stable fingerprint feature.
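The process described above can be sketched on a toy model. Everything here is an assumption for illustration: the "model" is a random embedding table plus a linear head, the predefined response is a single class index `TARGET`, and greedy coordinate search stands in for whatever optimizer an actual method uses, since the text does not specify one.

```python
import random

VOCAB, DIM, N_OUT, PROMPT_LEN, TARGET = 50, 8, 5, 4, 3

def make_model(seed):
    """Toy model: random token embeddings and a random linear output head."""
    rng = random.Random(seed)
    embed = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
    head = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_OUT)]
    return embed, head

def logits(model, prompt):
    embed, head = model
    hidden = [sum(embed[tok][d] for tok in prompt) for d in range(DIM)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in head]

def predict(model, prompt):
    scores = logits(model, prompt)
    return max(range(N_OUT), key=lambda k: scores[k])

def margin(model, prompt, target):
    """Target logit minus the best competing logit (> 0 means target wins)."""
    scores = logits(model, prompt)
    return scores[target] - max(s for k, s in enumerate(scores) if k != target)

def optimize_prompt(model, target, sweeps=3):
    """Greedy coordinate search: at each position keep the token with the best margin."""
    prompt = [0] * PROMPT_LEN
    for _ in range(sweeps):
        for pos in range(PROMPT_LEN):
            prompt[pos] = max(
                range(VOCAB),
                key=lambda tok: margin(
                    model, prompt[:pos] + [tok] + prompt[pos + 1:], target
                ),
            )
    return prompt

target_model = make_model(seed=1)
unrelated_model = make_model(seed=2)

initial_margin = margin(target_model, [0] * PROMPT_LEN, TARGET)
prompt = optimize_prompt(target_model, TARGET)
final_margin = margin(target_model, prompt, TARGET)

print("optimized prompt:", prompt)
print("target model answers with TARGET:", predict(target_model, prompt) == TARGET)
# With only N_OUT classes, an unrelated toy model can still match by chance.
print("unrelated model answers with TARGET:", predict(unrelated_model, prompt) == TARGET)
```

Because the search only ever looks at the target model's logits, the resulting prompt is tailored to that model's weights. In this toy setting the output space is tiny, so agreement from an unrelated model is merely unlikely rather than impossible; a real fingerprint would use a much longer predefined response to make accidental matches negligible.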