Selecting the “right” amount of information to include in a summary is a difficult task. 
A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a “Chain of Density” (CoD) prompt. Specifically, GPT-4 generates an initial entity-sparse summary before iteratively incorporating missing salient entities without increasing the length. Summaries generated by CoD are more abstractive, exhibit more fusion, and have less of a lead bias than GPT-4 summaries generated by a vanilla prompt. We conduct a human preference study on 100 CNN DailyMail articles and find that that humans prefer GPT-4 summaries that are more dense than those generated by a vanilla prompt and almost as dense as human written summaries. Qualitative analysis supports the notion that there exists a tradeoff between infor-mativeness and readability. 500 annotated CoD summaries, as well as an extra 5,000 unannotated summaries, are freely available on HuggingFace.

Introduction

Automatic summarization has come a long way in the past few years, largely due to a paradigm shift away from supervised fine-tuning on labeled datasets to zero-shot prompting with Large Language Models (LLMs), such as GPT-4 (OpenAI, 2023). Without additional training, careful prompting can enable fine-grained control over summary characteristics, such as length (Goyal et al., 2022), topics (Bhaskar et al., 2023), and style (Pu and Demberg, 2023).
An overlooked aspect is the information density of an summary. In theory, as a compression of another text, a summary should be denser–containing a higher concentration of information–than the source document. Given the high latency of LLM decoding (Kad-dour et al., 2023), covering more information in fewer words is a worthy goal, especially for real-time applications. Yet, how dense is an open question. A summary is uninformative if it contains insufficient detail.

If it contains too much information, however, it can be-come difficult to follow without having to increase the overall length. Conveying more information subject to a fixed token budget requires a combination of abstrac-tion, compression, and fusion. There is a limit to how much space can be made for additional information before becoming illegible or even factually incorrect.