Insights

Leakage of Training Data

We focus on answering the following research questions:

Does the privacy risks of in LLMs correspond proportionally with their increasing scale and effectiveness?
How are different data characteristics associated with the privacy risks of LLMs?
Are there practical privacy-preserving approaches when deploying LLMs?

Effect of Model Size

Effect of Data Characteristics

Position and Type of Private Data

Get a better experience on larger screens

Table 3: DEA accuracy of different positions and types of data on ECHR. Llama-2 7B-FT is the Llama-2 7B model fine-tuned on ECHR with four epochs.
Model	Type	DEA (%) by position
Model	Type	Overall	Front	Middle	End
Llama-2 7B	name	0.81%	0.87%	0.58%	1.0%
	location	2.6%	3.8%	2.5%	2.3%
	date	0.30%	0.34%	0.28%	0.30%
Llama-2 7B-FT	name	10.4%	4.3%	12.7%	10.8%
	location	19.2%	7.7%	17.3%	24.4%
	date	6.7%	3.2%	5.3%	9.7%

Data Length

Get a better experience on larger screens

Table 4: MIAs on Llama-2 with different data lengths.
Datasets	Length	Perplexity		AUC
Datasets	Length	Mem	Non-Mem	AUC
ECHR
	(0, 50]	4.06	4.36	55.9%
	(50, 100]	4.29	4.82	62.8%
	(100, 200]	4.39	5.13	72.9%
	(200, inf]	4.60	5.35	82.2%
Enron
	(0, 150]	6.36	10.11	61.7%
	(150, 350]	3.11	4.51	59.3%
	(350, 750]	3.03	4.23	58.2%
	(750, inf]	2.99	4.18	58.5%

For Enron, short emails have higher perplexity due to their informal nature and variability, which provides less context and makes them harder for the model to predict accurately. For ECHR, longer legal documents have higher perplexity due to their complexity and dense information, making them challenging for the model.

Higher perplexity indicates the model struggles more, creating distinct patterns between training and non-training data, leading to increased MIA AUC and higher privacy risks for these samples.

Pretraining data size

Figure 6. DEA accuracy with different training tokens.
Besides the model size, when increasing the number of training tokens, LLM’s memorization capacity also increases. Consequently, this leads to a rise in data extraction accuracy

Takeaways:

Our findings reveal that data type, data position, data length, and pretraining data size collectively impact privacy risks on Llama-2. Data with richer contextual information (e.g., locations) tends to be more susceptible to memorization during fine-tuning. Private data at the end of a sample is more vulnerable to extraction. Data samples that are harder to predict, indicated by higher perplexity, are more easily identified in MIAs. Additionally, increasing the size of the training data enhances the model’s memorization capacity, leading to higher privacy risks. These insights highlight the necessity for targeted privacy strategies that address the specific characteristics of different data types in LLMs.

Practicality of Privacy-Enhancing Technologies (PETs) on Fine-tuning of LLMs

Table 5. MIAs and DEAs on ECHR. We report the perplexity of non-member data, AUC of different MIA attack approaches (PPL, Refer, and MIN-K), and the attack success rate of DEA.
PET	Perplexity	PPL	Refer	LiRA	MIN-K	DEA
none	7.53	97.9%	97.7%	95.0%	97.5%	24.2%
scrubbing	14.01	87.0%	87.3%	86.8%	74.1%	4.0%
DP (𝜖=8)	8.02	50.9%	49%	48.7%	50.3%	3.2%

Privacy Risks over Different Attacks

Table 6. Comparison among different types of DEAs and jailbreak attacks with Llama-2. For DEAs, we use the Enron Email dataset. For JA, MoP refers to model-generated JA prompts and MaP refers to manually generated prompts.
Models	DEA accuracy (%)	JA success rate (%)
Llama-2 7B	3.54	1.14%	72.4%	58.2%
Llama-2 13B	3.72	1.47%	68.0%	56.7%
Llama-2 70B	4.59	1.74%	58.9%	47.4%

Leakage of Prompts

We conduct a comprehensive evaluation of prompt privacy using different Prompt Leaking Attack (PLA) methods, models, and potential defenses. We focus on answering the following research questions:

Is prompt easily leaked using attack prompts?
How does the risk of prompt leakage vary across different LLMs?
Is it possible to protect the prompts by using defensive prompting?

Leakage Ratio on Different Attacks

Leakage Ratio on Different Models

Table 7: The leakage ratio (LR %) of samples that have FuzzRate over 90, 99 or 99.9.
Llama-2-70b is more vulnerable than other models. Vicuna-7b is the most vulnerable 7b model.
Model	LR@90FR	LR@99FR	LR@99.9FR
gpt-3.5-turbo	67.0	37.7	18.7
gpt-4	80.7	49.7	38.0
vicuna-7b-v1.5	73.7	59.3	43.0
vicuna-13b-v1.5	74.0	64.0	50.0
Llama-2-7b-chat	56.7	33.7	22.7
Llama-2-70b-chat	83.0	60.3	40.7

Effectiveness of Defensive Prompting

Table 8. The leakage ratio (LR %) of samples that have FuzzRate over 90, 99 or 99.9. Attacks are carried on GPT-4.
defense	LR@90FR	LR@99FR	LR@99.9FR
no defense	80.7	49.7	38.0
ignore-ignore-inst	79.7	48.3	36.0
no-repeat	80.3	47.0	35.3
top-secret	80.7	48.7	37.7
no-ignore	79.3	49.0	36.0
eaten	79.3	48.0	34.0

Leakage of User Data

We use an open-sourced toolkit to explore the potential leakage of user data when using LLMs

Privacy Risks over Different Models

	C-2.1	C-3-haiku	C-3-sonnet	C-3-opus	C-3.5
AIA accuracy	35.4%	79.7%	82.1%	86.9%	87.1%
MMLU	63.4%	75.2%	79.0%	86.8%	88.7%

Models	DEA accuracy (%)		JA success rate (%)
Models	Query	Poisoning	MoP	MaP
Llama-2 7B	3.54	1.14%	72.4%	58.2%
Llama-2 13B	3.72	1.47%	68.0%	56.7%
Llama-2 70B	4.59	1.74%	58.9%	47.4%