Industry standard manuscript word counts under Linux

Last night I ran a word count on my book in progress in Emacs, my text editor of choice, and was astonished to find that it had apparently gained 2,000 words after only a few edits – from about 43,500 words to about 45,500.

Before that, I had been using the Linux command-line utility wc to count my words, and it had been returning the lower number. I also tested Gedit (results on the high end) and LibreOffice Writer (on the low end).

I asked on Twitter which count I should trust, and a writer friend advised that LibreOffice would probably come closest to Microsoft Word, the standard among professional editors and publishers, so I should stick with LibreOffice. However, when I ran a word count in my wife Marty’s copy of Word, it gave the highest count of all, the furthest from LibreOffice’s. Emacs was closest! Here are the numbers, from high to low:

Microsoft Word 2010 = 45,653
GNU Emacs 24.2 = 45,466
Gedit 3.10.4 = 45,309
wc = 43,855
LibreOffice Writer = 43,726

Moral: M-x count-words in Emacs comes closest to the industry standard – a little low, in fact, which is better than a little high. Gedit is not bad. Stay away from wc and LibreOffice Writer for counting words if you are writing professionally.
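Why do the tools disagree? I don’t know each program’s exact rules, but one plausible source of the gap is how a tool decides where a word ends. wc treats any run of non-whitespace characters as a word, so a dash-joined compound or a dotted number counts only once. A counter that also breaks words at punctuation will come out higher. The Python sketch below is a simplified illustration of that difference, not Emacs’s or Word’s actual algorithm; the function names and the sample sentence are hypothetical.

```python
import re

def count_whitespace_words(text: str) -> int:
    # Roughly like wc -w: a word is any maximal run of non-whitespace characters.
    return len(text.split())

def count_alnum_runs(text: str) -> int:
    # Hypothetical stricter rule: a word is a maximal run of letters or digits,
    # so hyphens, dashes and periods split one token into several.
    return len(re.findall(r"[^\W_]+", text))

sample = "Chapter 1.2.3 begins well-known trouble\u2013here it is."
print(count_whitespace_words(sample))  # → 7
print(count_alnum_runs(sample))        # → 11
```

Same sentence, a difference of four words. Spread that over a whole manuscript full of hyphens and dashes and a gap of a couple of thousand words stops looking mysterious.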

Breaking news from my friend: With a much longer manuscript (around 190,000 words), he’s seeing a spread closer to 4,000 words than 2,000, but otherwise his results are quite similar.

A further postscript: A couple of days later, my word count dropped again by about 1,000 words for no discernible reason. I grabbed an older copy of the document from Dropbox and diffed it against the most recent version. I finally understood that I had turned section numbering off in recent versions, and those section numbers had been counted as words, sometimes more than one each. For example, a deeply nested section number could count as four words. Multiply that by a couple of hundred sections, plus their appearances in the table of contents, and you’ve got a thousand words that can evaporate invisibly.
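To see how much section numbering can inflate a count, here is a rough Python sketch. It assumes, hypothetically, that headings begin with a dotted number such as “1.1.2” and that the counter splits on punctuation, so each level of the number counts as a separate word. Under a whitespace rule like wc’s, “1.1.2” would count as a single word instead. The heading format, function names and sample text are all my own inventions for illustration.

```python
import re

# Hypothetical heading format: a dotted section number at the start of a line,
# e.g. "1.1.2 A stranger".
SECTION_NUMBER = re.compile(r"^[ \t]*(\d+(?:\.\d+)*)\.?\s", re.MULTILINE)

def words_in(text: str) -> int:
    # Count runs of letters/digits, so "1.1.2" yields three "words".
    return len(re.findall(r"[^\W_]+", text))

def section_number_words(text: str) -> int:
    # Sum the words contributed by every section number found at a line start.
    return sum(words_in(m.group(1)) for m in SECTION_NUMBER.finditer(text))

manuscript = """1 Arrival
The train was late.
1.1 The platform
She waited.
1.1.2 A stranger
He nodded.
"""

extra = section_number_words(manuscript)
print(extra)                         # → 6
print(words_in(manuscript) - extra)  # → 13
```

Three headings add six phantom words here; with a couple of hundred numbered sections, each also listed in the table of contents, the total climbs quickly.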


