Insights
Leakage of Training Data
We focus on answering the following research questions:
- Does the enhancement of privacy protection in LLMs correspond proportionally with their increasing scale and effectiveness?
- How are different data characteristics associated with the privacy risks of LLMs?
- How do the privacy risks of LLMs evolve over time?
Model | Correct | Local | Domain | Average |
---|---|---|---|---|
llama-2-7b-chat | 3.54 | 12.24 | 12.75 | 9.51 |
llama-2-13b-chat | 3.72 | 12.42 | 13.77 | 9.97 |
llama-2-70b-chat | 4.59 | 13.68 | 14.25 | 10.84 |
vicuna-7b-v1.5 | 3.54 | 11.49 | 14.82 | 9.95 |
vicuna-13b-v1.5 | 4.02 | 13.41 | 15.03 | 10.82 |
falcon-7b-instruct | 2.28 | 9.06 | 11.07 | 7.47 |
falcon-40b-instruct | 3.99 | 12.00 | 13.38 | 9.79 |
EleutherAI-pythia-14m | 0.00 | 0.24 | 8.22 | 2.82 |
EleutherAI-pythia-31m | 0.00 | 0.60 | 8.22 | 2.94 |
EleutherAI-pythia-70m | 0.00 | 0.96 | 8.37 | 3.11 |
EleutherAI-pythia-160m | 0.03 | 1.80 | 9.06 | 3.63 |
EleutherAI-pythia-410m | 0.57 | 4.20 | 11.04 | 5.27 |
EleutherAI-pythia-1b | 1.05 | 4.38 | 12.30 | 5.91 |
EleutherAI-pythia-1.4b | 1.32 | 4.92 | 13.20 | 6.48 |
EleutherAI-pythia-2.8b | 2.58 | 6.36 | 14.73 | 7.89 |
EleutherAI-pythia-6.9b | 4.68 | 8.25 | 17.25 | 10.06 |
EleutherAI-pythia-12b | 6.54 | 10.38 | 18.39 | 11.77 |
Takeaways:
As the size of LLMs increases, their capacities on language tasks also increase. Concurrently, these larger models exhibit an enhanced extraction accuracy with existing DEAs, as a result of their advanced memorization capacities.
Data Length
Get a better experience on larger screens
Perplexity | MIA | ||||
---|---|---|---|---|---|
Length | Mem | Non-Mem | AUC | TPR@0.1%FPR | |
ECHR | |||||
(-1, 50] | 4.06 | 4.36 | 55.9% | 0.19% | |
(50, 100] | 4.29 | 4.82 | 62.8% | 0.30% | |
(100, 200] | 4.39 | 5.13 | 72.9% | 0.19% | |
(200, inf] | 4.60 | 5.35 | 82.2% | 0.09% | |
Enron | |||||
(-1, 150] | 6.36 | 10.11 | 61.7% | 0.07% | |
(150, 350] | 3.11 | 4.51 | 59.3% | 0.07% | |
(350, 750] | 3.03 | 4.23 | 58.2% | 0.17% | |
(750, inf] | 2.99 | 4.18 | 58.5% | 0.16% |
Position and Type of Private Data
Figure 4. Data extraction accuracy of different positions and types of data on ECHR.
Takeaways:
Our findings reveal an interplay among data length, data type, and data position, impacting data privacy risks related to existing attacks. Longer PII sequences are less prone to accurate extraction but more easily identified in MIAs, highlighting a trade-off between memorization and detectability. Moreover, private data that appears at the front of a sentence is easier to be extracted. Additionally, the nature of the data – textual or numerical – significantly influences privacy vulnerability, with textual data being more susceptible to leakage. These insights emphasize the need for targeted privacy strategies that cater to the specific characteristics of different data in LLMs.
Figure 5. Privacy risks of different snapshots of GPT-3.5.
Takeaways:
While there is a gradual reduction in the privacy risks associated with GPT-3.5 over time, the rate of this decrease is diminishing. Despite the improvements made in successive versions, the level of privacy risk associated with GPT-3.5 remains high. This underscores the need for ongoing vigilance and continuous enhancement in privacy measures as the model evolves.
Leakage of Prompts
While prompts are also sensitive data in the era of LLMs, prompt leakage could happen with PIAs.
Figure 6: The FuzzRate of different attacks on different models.
The ignore_print and spell_check are the two strongest attacks on Llama2-70b-chat.
The leakage ratio (%) of samples that have FuzzRate over 90.
Consistent with results measured by the average FuzzRate, ignore_print is the strongest attack on Llama-2-70b-chat.
Takeaways:
Prompts can be easily leaked with prompting attacks. Directly asking LLMs to ignore and print the previous instructions can leak to serious prompt leakage in many LLMs.
Model | LR@90FR | LR@99FR | LR@99.9FR |
---|---|---|---|
gpt-3.5-turbo | 67.0 | 37.7 | 18.7 |
gpt-4 | 80.7 | 49.7 | 38.0 |
vicuna-7b-v1.5 | 73.7 | 59.3 | 43.0 |
vicuna-13b-v1.5 | 74.0 | 64.0 | 50.0 |
Llama-2-7b-chat | 56.7 | 33.7 | 22.7 |
Llama-2-70b-chat | 83.0 | 60.3 | 40.7 |
Takeaways:
When models get larger, they will be more likely to leak the prompt that may be copyrighted using existing PIAs. Moreover, open-source models could be more vulnerable to such attacks from our observation.
Privacy Enhancing Technologies
Are there practical privacy-enhancing technologies?
Please view this table on a larger device
Perplexity | MIA AUC | MIA TPR@0.1%FPR | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Models | PET | Mem PPL | Non-Mem PPL | PPL | Refer | LiRA | Neighbor | PPL | Refer | LiRA | Neighbor |
GPT-2 | none | 9.06 | 10.32 | 55.7% | 54.9% | 53.8% | 50.0% | 0.9% | 1.1% | 0.2% | 0.1% |
GPT-2 | scrubbing | 22.87 | 25.09 | 54.1% | 54.2% | 53.6% | 49.9% | 0.7% | 0.6% | 0.1% | 0.5% |
GPT-2 | DPSGD | 21.23 | 20.8 | 50.2% | 49.0% | 48.8% | 49.1% | 0.1% | 0.0% | 0.1% | 0.2% |
Llama 2 10 epoch | none | 2.83 | 37.84 | 95.6% | 95.8% | 95.0% | 67.4% | 12.2% | 9.9% | 1.3% | 1.0% |
Llama 2 | none | 4.25 | 4.89 | 59.4% | 61.4% | 60.0% | 49.8% | 0.8% | 0.7% | 0.2% | 0.3% |
Llama 2 10 epoch | scrubbing | 6.04 | 8.28 | 69.6% | 72.3% | 71.3% | 51.9% | 0.7% | 0.7% | 0.1% | 0.2% |
Llama 2 | scrubbing | 6.01 | 6.93 | 60.2% | 62.6% | 61.7% | 49.8% | 0.7% | 0.7% | 0.2% | 0.3% |
Llama 2 LoRA | none | 5.50 | 5.50 | 51.3% | 49.6% | 49.1% | 48.9% | 0.6% | 0.5% | 0.2% | 0.2% |
Llama 2 LoRA | scrubbing | 6.81 | 6.85 | 51.0% | 49.7% | 49.5% | 48.9% | 0.9% | 0.5% | 0.1% | 0.5% |
Llama 2 LoRA | DPSGD | 5.88 | 5.86 | 51.0% | 49.1% | 48.7% | 49.0% | 0.5% | 0.7% | 0.1% | 0.3% |
Llama 2 Chat LoRA | none | 5.39 | 5.42 | 51.7% | 49.9% | 48.8% | 48.8% | 0.5% | 0.5% | 0.5% | 0.4% |
Llama 2 Chat LoRA | scrubbing | 7.27 | 7.33 | 51.1% | 49.0% | 48.4% | 48.8% | 0.7% | 0.8% | 0.2% | 0.4% |
Llama 2 Chat LoRA | DPSGD | 6.61 | 6.59 | 50.9% | 48.5% | 47.6% | 48.9% | 0.3% | 0.2% | 0.1% | 0.2% |
Please view this table on a larger device
Perplexity | MIA AUC | MIA TPR@0.1%FPR | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Models | PET | Mem PPL | Non-Mem PPL | PPL | Refer | LiRA | Neighbor | PPL | Refer | LiRA | Neighbor |
Llama 2 10 epoch | none | 2.90 | 8.03 | 60.8% | 62.8% | 64.1% | 61.9% | 0.0% | 0.0% | 0.1% | 0.2% |
Llama 2 | none | 3.47 | 5.96 | 57.1% | 59.5% | 60.2% | 57.8% | 0.0% | 0.0% | 0.1% | 0.0% |
Llama 2 10 epoch | scrubbing | 9.56 | 15.10 | 56.5% | 60.8% | 60.9% | 52.6% | 0.0% | 0.3% | 0.3% | 0.2% |
Llama 2 | scrubbing | 7.01 | 9.30 | 54.57% | 58.4% | 58.6% | 51.9% | 0.0% | 0.1% | 0.3% | 0.2% |
Llama 2 LoRA | none | 8.85 | 9.81 | 49.5% | 50.0% | 49.9% | 50.8% | 0.0% | 0.1% | 0.0% | 0.3% |
Llama 2 LoRA | scrubbing | 9.11 | 9.94 | 49.7% | 49.4% | 49.3% | 50.7% | 0.0% | 0.4% | 0.1% | 0.5% |
Llama 2 LoRA | DPSGD | 9.45 | 10.45 | 49.6% | 50.2% | 50.0% | 49.1% | 0.0% | 0.1% | 0.0% | 0.2% |
Llama 2 Chat LoRA | none | 7.69 | 8.33 | 49.2% | 49.6% | 49.1% | 50.6% | 0.1% | 0.2% | 0.1% | 0.2% |
Llama 2 Chat LoRA | scrubbing | 9.75 | 10.46 | 49.6% | 49.3% | 49.1% | 50.7% | 0.0% | 0.3% | 0.1% | 0.4% |
Llama 2 Chat LoRA | DPSGD | 10.40 | 11.20 | 49.4% | 49.7% | 49.7% | 49.3% | 0.0% | 0.0% | 0.1% | 0.3% |
Takeaways:
Parameter-efficient fine-tuning emerges as a highly effective strategy for mitigating the privacy risks associated with tuning data, especially when compared to the approach of fine-tuning the entire model. Additionally, DPSGD offers a better utility than scrubbing when providing a guarantee for privacy protection. For existing MIAs, the PPL-based methods work well already while Refer and Neighbor attacks can slightly improve it.
defense | LR@90FR | LR@99FR | LR@99.9FR |
---|---|---|---|
no defense | 80.7 | 49.7 | 38.0 |
ignore-ignore-inst | 79.7 | 48.3 | 36.0 |
no-repeat | 80.3 | 47.0 | 35.3 |
top-secret | 80.7 | 48.7 | 37.7 |
no-ignore | 79.3 | 49.0 | 36.0 |
eaten | 79.3 | 48.0 | 34.0 |
Takeaways:
Using defensive prompts to protect the private prompts has limited effects. It is essential to develop a rigorous mechanism that can preserve the privacy of prompts.