How Claude Shannon used his wife as a language model in 1950
by Ellsworth Toohey · Boing Boing
Grant Sanderson's new 3Blue1Brown video starts from a 1940s question (how far can you compress text?) and arrives at why it matters for AI.
The bridge comes from Claude Shannon's information theory: "prediction and compression are mathematically equivalent," two sides of the same coin. So a language model trained to predict the next token is, mathematically, "creating the most efficient possible text compressor." Hence the claim some people make: "compression is intelligence."
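To make the equivalence concrete, here is a minimal Python sketch. It assumes a toy adaptive character model with add-one smoothing (not anything from the video), and the function name `ideal_code_length_bits` and the sample text are made up for illustration. If a predictor assigns probability p to the character that actually comes next, an ideal entropy coder can spend about -log2(p) bits on it, so summing that quantity gives the compressed size. The coder itself is omitted here.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(text, order, alphabet):
    """Bits an ideal entropy coder would need when driven by an adaptive
    character model that conditions on the previous `order` characters.
    Hypothetical toy model: add-one smoothed counts, learned as it goes."""
    counts = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]   # empty string when order == 0
        seen = counts[context]
        total = sum(seen.values())
        # Predicted probability of the character that actually occurs next.
        p = (seen[ch] + 1) / (total + len(alphabet))
        bits += -math.log2(p)                 # cost of encoding this character
        seen[ch] += 1                         # update the model after "seeing" it
    return bits

sample = "the cat sat on the mat and the cat sat on the hat " * 20
alphabet = sorted(set(sample))
for order in (0, 1, 3):
    bits = ideal_code_length_bits(sample, order, alphabet)
    print(f"order {order}: {bits / len(sample):.2f} bits per character")
```

Giving the model more context makes its predictions sharper, and the bits-per-character figure drops accordingly: better prediction is better compression.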
To estimate how compressible English is, Shannon in 1950 used the best language model he had — his wife. He would "pull out a book and begin asking Betty to guess each next letter," treating her brain as a black box with a "sophisticated and yet indescribable understanding of language." His estimate: English carries about one bit per character. As Sanderson puts it, "these days, in the 2020s, we have gone from merely interrogating black boxes that process language to designing them."
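A rough sketch of the guessing game, with a simple program standing in for the human guesser. The assumptions here are mine: the guesser keeps guessing until it is right, it ranks letters by how often they followed the last two characters so far, and the names `guess_ranks` and `entropy_bits` are hypothetical. In Shannon's analysis, the entropy of the guess counts serves as an upper bound on the bits per character.

```python
import math
from collections import Counter, defaultdict

def guess_ranks(text, order=2):
    """For each character, return how many guesses a toy adaptive model
    needs before it names the right one (1 = correct on the first try)."""
    alphabet = sorted(set(text))
    counts = defaultdict(Counter)
    ranks = []
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        seen = counts[context]
        # Guess the most frequent follower first; break ties alphabetically.
        guesses = sorted(alphabet, key=lambda c: (-seen[c], c))
        ranks.append(guesses.index(ch) + 1)
        seen[ch] += 1
    return ranks

def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

text = "the cat sat on the mat and the cat sat on the hat " * 20
ranks = guess_ranks(text)
print(f"first-guess hit rate: {ranks.count(1) / len(ranks):.0%}")
print(f"entropy of guess counts: {entropy_bits(ranks):.2f} bits per character")
```

A human reader with a real book would do far better than this toy guesser on ordinary prose, which is why Shannon's experiment needed a person rather than a table of letter frequencies.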