How much space would it take to store every word ever said?
2020-2-9 20:23:47 Author: blog.jonlu.ca

If we tally up every word ever said by any person, throughout history, how much physical storage space would be needed to store a representation of those words? Note that I do not mean unique words - rather, every word, ever said, by anyone.

TL;DR

Probably less than your average-sized blue whale.

Facts

First, let's start with the things we know.

  • Any written format will most likely be inefficient compared to binary storage. There are extreme cases of information density [1], but we'll just deal with what the average person could buy on Amazon. We can comfortably fit 1 TB in 15.0 × 11.0 × 1.0 mm, or 165 mm³ of space (a readily available microSD card sold by SanDisk). Higher densities are undoubtedly achievable, but this serves as a relatively modern, efficient baseline.
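The storage density implied by that card can be worked out directly. The sketch below is only an illustration of the arithmetic; it assumes "1 TB" means a decimal terabyte (10^12 bytes), which the post does not state explicitly.

```python
# Dimensions of the microSD card quoted above, in millimetres.
LENGTH_MM, WIDTH_MM, THICKNESS_MM = 15.0, 11.0, 1.0

# Assumption: "1 TB" is a decimal terabyte (10**12 bytes).
CARD_CAPACITY_BYTES = 10**12

card_volume_mm3 = LENGTH_MM * WIDTH_MM * THICKNESS_MM
bytes_per_mm3 = CARD_CAPACITY_BYTES / card_volume_mm3

print(f"Card volume: {card_volume_mm3:.0f} mm^3")
print(f"Density: {bytes_per_mm3 / 1e9:.2f} GB per mm^3")
```

That works out to roughly 6 GB per cubic millimetre, which is the density every later volume figure depends on.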

Almost facts

Next let's look at the "pretty good" guesses. These are estimates, but they're probably pretty accurate. We'll say that our numbers are probably right within one order of magnitude.

  • Throughout history, there have been roughly 108,000,000,000 people ever born [2]. That will serve as our baseline population.

  • Average life expectancy in 2019 was about 72 years, which is 365 × 72 = 26,280 days. This is probably a pretty serious overestimate, since for most of history lifespans were significantly shorter than 72 years, but it will be useful as an upper bound.

Wild guesses

Next let's look at rough estimates. These may be off by multiple orders of magnitude, but we'll do our best to get an accurate number, and use the higher estimates when possible.

  • The average number of words said by a person in a lifetime is pretty tough to pin down, and is probably the estimate with the highest variance. A 2007 study [3] found that people say, on average, 16,000 words per day. There are undoubtedly a lot of unknowns here (language, age, culture, the setting the study took place in, etc.), but we'll use this as another baseline.

Converting words to storage

Using the estimates above, we arrive at roughly 420 million words said per person (16,000 per day × 26,280 days = 420,480,000). This is going to be an upper bound: it assumes a person speaks 16,000 words a day, every day of their entire life, including as a baby.
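As a quick check of that multiplication, here is a minimal sketch. The helper name `lifetime_words` is made up for illustration, and the 35-year lifespan in the second call is a purely hypothetical value chosen to show how sensitive the figure is, not a number from the post.

```python
DAYS_PER_YEAR = 365

def lifetime_words(words_per_day: int, lifespan_years: int) -> int:
    """Words spoken over a lifetime, assuming a constant daily rate from birth."""
    return words_per_day * DAYS_PER_YEAR * lifespan_years

# Upper-bound inputs from the text: 16,000 words/day for 72 years.
upper_bound = lifetime_words(16_000, 72)

# Hypothetical shorter lifespan, for comparison only.
shorter_life = lifetime_words(16_000, 35)

print(f"72-year lifespan: {upper_bound:,} words")
print(f"35-year lifespan: {shorter_life:,} words")
```

Halving the lifespan roughly halves the total, so this input alone cannot move the estimate by an order of magnitude.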

We can now convert this number into storage costs.

It's easier to calculate character storage than word storage, as characters are more easily directly translated into a unit of informational storage.

The average word length in English is 4.7 characters, while German, one of the languages with the longest average word length, sits at 11.66 [4]. We'll use German as our language, to again try to get a reasonable upper bound.

420 million words, times an average word length of 11.66, gets us ~4.8 billion individual characters spoken per person per lifetime.

4.8 billion characters multiplied by our previous estimate of roughly 108 billion people ever having lived gives us a grand total of 5.2 × 10^20 characters, or 518 quintillion characters ever said in history.

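518 quintillion characters at one byte per character is 518 quintillion bytes, or 518 exabytes. We could also use UTF-8, but since we assumed the language is German, we'll stick to a single-byte encoding (strictly speaking, the umlauts and ß fall outside ASCII, but a single-byte encoding such as ISO 8859-1 covers them). Coincidentally, 518 exabytes is roughly 50% of the estimated annual traffic from mobile devices [5].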
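The whole chain from words to bytes can be reproduced in a few lines. This is a sketch of the arithmetic only, with variable names of my own choosing. Carrying the unrounded intermediate values through gives a slightly higher total than the figure quoted above, which comes from rounding characters per person to 4.8 billion; the gap is far inside the stated order-of-magnitude error bars.

```python
WORDS_PER_LIFETIME = 16_000 * 365 * 72   # 420,480,000 words
CHARS_PER_WORD = 11.66                   # German average word length
PEOPLE_EVER_BORN = 108_000_000_000
BYTES_PER_CHAR = 1                       # single-byte encoding
BYTES_PER_EXABYTE = 10**18               # decimal exabyte

# Unrounded chain.
chars_per_person = WORDS_PER_LIFETIME * CHARS_PER_WORD
total_bytes = chars_per_person * PEOPLE_EVER_BORN * BYTES_PER_CHAR
total_exabytes = total_bytes / BYTES_PER_EXABYTE

# Same chain using the rounded 4.8 billion characters per person.
rounded_exabytes = 4.8e9 * PEOPLE_EVER_BORN / BYTES_PER_EXABYTE

print(f"Characters per person: {chars_per_person:,.0f}")
print(f"Unrounded total: {total_exabytes:.1f} EB")
print(f"Rounded total:   {rounded_exabytes:.1f} EB")
```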

Text also compresses extraordinarily well, but we'll assume no compression for the purposes of this estimation (the data compression ratio [6] of this body of text would be extraordinary, especially if using an external dictionary). I believe allowing compression somewhat violates the spirit of the question (any form of compression would most likely get closer to a "unique list of words" ever said, rather than "every word ever said"), but I'll include a bonus at the end taking a look at compression.

Estimate

At 518 exabytes, we'd need 518 million 1 TB microSD cards. If the volume of each of those is 165 mm³, we'd need roughly 85 billion mm³, which is roughly 85 m³.

A blue whale has a volume of roughly 86 m³, which means we can fit an upper bound of every word said in human history in your average whale, if each person is a German-speaking human who speaks 16,000 words per day from age 0 to their eventual 2019-average death at 72. [7]
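The card count and total volume follow from the same inputs. The sketch below uses the 165 mm³ card volume implied by the 15.0 × 11.0 × 1.0 mm dimensions and decimal units throughout (both assumptions on my part), and takes the 86 m³ whale volume from the text as given.

```python
TOTAL_BYTES = 518 * 10**18        # 518 exabytes (decimal)
CARD_CAPACITY_BYTES = 10**12      # 1 TB microSD card (decimal)
CARD_VOLUME_MM3 = 15.0 * 11.0 * 1.0
MM3_PER_M3 = 1000**3              # 1 m = 1000 mm
BLUE_WHALE_M3 = 86.0              # figure quoted in the text

cards_needed = TOTAL_BYTES // CARD_CAPACITY_BYTES
total_volume_m3 = cards_needed * CARD_VOLUME_MM3 / MM3_PER_M3

print(f"Cards needed: {cards_needed:,}")
print(f"Total volume: {total_volume_m3:.2f} m^3")
print(f"Fits in a blue whale: {total_volume_m3 < BLUE_WHALE_M3}")
```

The stack of cards comes to about 85 m³, just under the whale figure. This ignores packaging, air gaps, and anything needed to actually read the cards.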

Figure (blue whale): We could fit every word ever said inside the volume of a blue whale.

Bonus

What if we take text compression into account? First, text compresses extremely well. It's not out of the realm of possibility to see a 10-to-1 compression ratio. [8]

Secondly, words are not spoken with equal frequency. A small subset of words makes up a significant portion of everything said, following a heavily skewed (Zipfian) distribution: in English, for instance, 10 words make up 25% of all used words [9]. This will compress extraordinarily well. Obviously, compressing 518 exabytes is no easy feat (new Google interview question, anyone?), but assuming we could do it at 10-to-1, we'd get the size down to ~8.5 m³, well under half the volume of a standard freight container (a 20-foot container holds about 33 m³).
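Applying a 10-to-1 ratio to the card volume is a one-line calculation. The sketch below assumes the 165 mm³ card volume from the dimensions given earlier and, for the comparison, an internal volume of about 33 m³ for a standard 20-foot container. That container figure is my assumption, not a number from the post.

```python
CARDS_NEEDED = 518_000_000          # 518 EB on 1 TB cards
CARD_VOLUME_MM3 = 15.0 * 11.0 * 1.0
MM3_PER_M3 = 1000**3
COMPRESSION_RATIO = 10              # the 10-to-1 figure from the text
CONTAINER_M3 = 33.0                 # assumption: 20-foot container, internal volume

uncompressed_m3 = CARDS_NEEDED * CARD_VOLUME_MM3 / MM3_PER_M3
compressed_m3 = uncompressed_m3 / COMPRESSION_RATIO
container_fraction = compressed_m3 / CONTAINER_M3

print(f"Uncompressed: {uncompressed_m3:.2f} m^3")
print(f"Compressed:   {compressed_m3:.2f} m^3")
print(f"Share of one container: {container_fraction:.0%}")
```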

Figure (freight container): Compressed? Less than half of a freight container.

Footnotes

  1. https://newatlas.com/worlds-densest-solid-state-memory/55599/

  2. According to the Population Reference Bureau

  3. From Science 06 Jul 2007: Vol. 317, Issue 5834, pp. 82 DOI: 10.1126/science.1139940

  4. From Distribution of Word Lengths in Various Languages

  5. https://blogs.cisco.com/sp/the-zettabyte-era-officially-begins-how-much-is-that and https://www.cisco.com/c/en/us/solutions/service-provider/visual-networking-index-vni/index.html#mobile-forecast

  6. https://en.wikipedia.org/wiki/Data_compression_ratio

  7. This is undoubtedly a severe overestimate, but it shows that the result is a very "relatively human" size. If I were a betting man, I'd guess something closer to the average bathtub would be a better estimate, especially accounting for the number of people who died young, the speakers of languages with shorter words than German, and people who never spoke a language at all.

  8. http://mattmahoney.net/dc/text.html

  9. https://www.businessinsider.com/zipfs-law-and-the-most-common-words-in-english-2013-10


Source: https://blog.jonlu.ca/posts/word-storage