Summary
This page describes my current system for creating MP3 files of Wikipedia (and other web) pages that I can listen to on my iPod. If you have a better suggestion on how to do this, feel free to tell me.
Below is a sample of how a few paragraphs sound when converted using NaturalReader. The NaturalReader license prohibits distributing MP3 files made with the software, but I assume the company wouldn't object to (and would probably welcome) my sharing this sample.
Also: Roxanne Heston alerted me to the Pocket app, which can convert web pages to speech on the fly without manual effort. It can play audio at different speeds.
Wikipedia-text converter tool
Copy and paste Wikipedia-article text here:
The result after removing footnotes, "[edit]" text, etc. (see the footnote at the bottom of this page):
Motivation
There are often times when I need to do a physical task and want to listen to something useful while I do it. At other times, I may prefer to listen rather than read because I'm tired of reading. I find that it takes less mental effort to listen than to read (although listening is also slower and harder for visual or mathematical content).
Since 2006, my remedy had been to listen to podcasts on my iPod. This works well, but there are times when I want to learn about a more targeted topic than whatever happens to be discussed on the podcasts I follow. If I'm near my computer, one solution is to search for YouTube videos on the topic, because YouTube has a much greater selection than iTunes, and I can also skip the laborious process of inserting my iPod, opening iTunes, adding the podcast to the iPod, later deleting it, etc. However, this doesn't work in cases where I need to be away from my computer, such as when I'm cleaning the house, getting ready for bed, or doing other away-from-keyboard (afk) tasks.
NaturalReader and Text2Speech
The NaturalReader text-to-speech (TTS) software is impressive. It has a free trial version, which I downloaded and began using in summer 2014. The understandability and pronunciation capabilities are surprisingly good. I think these are hands-down the clearest voices:
- David Desktop, US English
- Zira Desktop, US English
I use Zira Desktop, and it seems to have been hard-coded to get pronunciation right even for some acronyms and names that are not phonetic. For example, it correctly reads "spp." as "species".
NaturalReader has a "Floating bar" mode in which you can simply highlight text on a web page to have it read aloud, rather than copying the text into the reader. The software also helpfully has speed controls, although when the speed is too fast, the voices become harder to understand.
All of what I've described is available in the free version of NaturalReader. A $70 paid version of the software also allows for exporting TTS audio to MP3 format, which can then be added to an iPod.
Note: In 2017, I tried the Mac version of NaturalReader (NaturalReader 14). It's clunkier to use than the Windows version, and the voices are unfortunately much harder to understand than Windows's Zira Desktop. The Mac version also sometimes chokes when trying to save text to MP3. Therefore, on a Mac, I switched to an alternative program, Text2Speech, which is not only a superior product in many ways to the Mac version of NaturalReader but is also vastly cheaper: the Text2Speech PRO app, needed to create MP3 files, is only $4, while NaturalReader is $70 for the same functionality. The voices on NaturalReader sound more human-like, but I think this actually makes them less understandable than the more robotic but clearer voices available with Text2Speech. My favorite Text2Speech voice is Tessa, with Alex and Veena also being competitively understandable.
While I haven't done any rigorous measurements, my informal impression is that it becomes easier to understand a text-to-speech voice after getting many hours of experience with it.
Wikipedia formatting
A few Wikipedia articles, such as "Barack Obama", contain an Ogg audio file in the upper-right corner in which a real person reads the article. But for most articles, TTS is the only way to listen.
Wikipedia articles have some quirks to iron out when being read by TTS. When copying out the text, I follow these steps:
1. Copy into Notepad++ the summary at the top, above the table of contents.
2. Copy the body below the table of contents, up to the "See also" or "References" section at the bottom.
3. Do a Replace All operation to remove stray "[edit]" tags. Because I have my Find-Replace function in "Regular expression" mode for the next step, I find
\[edit\]
and replace it with an empty string.
4. Remove the footnote numbers, which can be confusing when read aloud. Do this by replacing
\[\d+\]
with an empty string. In most articles, there should be few or no false-positive replacements.
5. Copy and paste the processed text into NaturalReader, and use its functionality to export to MP3. Then add the file to my iTunes Music library, and copy it onto my iPod.
Steps 3-4 above, as well as some extra cleaning, are done in the converter tool at the top of this page. To see its full details, view this page's source and search for the convert() function.
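As a rough sketch, steps 3 and 4 could also be done in a few lines of Python using the same two regular expressions. This is not the convert() function from this page's source, and it does none of that function's extra cleaning; the name clean_wikipedia_text is made up for illustration.

```python
import re

# The same two patterns used in the Notepad++ steps above.
EDIT_TAG = re.compile(r"\[edit\]")   # stray "[edit]" links after section headings
FOOTNOTE = re.compile(r"\[\d+\]")    # numeric footnote markers such as [1] or [23]


def clean_wikipedia_text(text: str) -> str:
    """Remove "[edit]" tags and numeric footnote markers (hypothetical helper)."""
    text = EDIT_TAG.sub("", text)
    text = FOOTNOTE.sub("", text)
    return text


if __name__ == "__main__":
    sample = "History[edit]\nPong was released in 1972.[1][23] It sold well.[citation needed]"
    print(clean_wikipedia_text(sample))
```

Because the footnote pattern only matches digits inside brackets, markers like "[citation needed]" are left in place.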
Other articles may cite using parentheses, like "(Smith et al., 2002)", rather than footnotes. To remove everything in parentheses, you can replace this regular expression (regex) with an empty string:
\([^)]+\)
Of course, this will have false positives (e.g., it also captures acronym definitions and other statements in parentheses, like this one right now). Depending on the format of citations in the paper, a more precise regex may work. For instance, this removes citations of the form "(authors, year)":
\([^\d()]+, \d\d\d\d\)
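To make the trade-off concrete, here is a small Python sketch comparing the two approaches on one sentence. The function names and the sample sentence are made up for illustration. The author-year pattern in this sketch excludes parentheses inside the author part, so that a match cannot begin at an earlier, unrelated parenthesis and swallow the text in between.

```python
import re

# Broad pattern: anything in parentheses.
ANY_PARENS = re.compile(r"\([^)]+\)")
# Narrow pattern: "(authors, year)", with no digits or parentheses in the author part.
AUTHOR_YEAR = re.compile(r"\([^\d()]+, \d\d\d\d\)")


def _tidy(text: str) -> str:
    """Collapse doubled spaces and spaces left before punctuation."""
    text = re.sub(r" {2,}", " ", text)
    return re.sub(r" +([.,;:])", r"\1", text)


def strip_all_parentheses(text: str) -> str:
    """Remove every parenthetical, citations and asides alike."""
    return _tidy(ANY_PARENS.sub("", text))


def strip_author_year(text: str) -> str:
    """Remove only citations shaped like "(Smith et al., 2002)"."""
    return _tidy(AUTHOR_YEAR.sub("", text))


if __name__ == "__main__":
    sample = "Insects (class Insecta) may feel pain (Smith et al., 2002)."
    print(strip_all_parentheses(sample))  # the aside about the class is lost too
    print(strip_author_year(sample))      # only the citation is removed
```

The narrow pattern will still miss other citation styles, such as several works listed inside one pair of parentheses, so it needs adjusting to the paper at hand.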
iTunes playback position
If you have to stop listening to a Podcast or iTunes U file partway through, your iPod probably remembers the position and picks up where you left off. However, it doesn't do this for Music files, presumably on the assumption that songs are short and people don't resume them halfway through.
Unfortunately, when I create an MP3 file, I seem to be able to add it only to the Music section of my iTunes library. This means that when I listen to it on my iPod, iTunes by default reverts to the beginning of the file if I leave it paused too long. Then I have to use binary search to find where I left off, and this is time-consuming. For iPods that can't skip ahead and backward, this limitation might be fatal.
Fortunately, it seems this defect can be fixed. When your iPod is connected to iTunes, navigate to the Music section, and select all the non-music audio files. Then right-click on the selected area, click "Get Info" -> "Options", and set "Remember position" to "Yes".
Changing playback speed
On my iPod, it appears that I can change playback speed only for audiobooks. This is done by selecting Settings -> Playback -> Audiobooks and then picking Slower, Normal, or Faster. Unfortunately, this doesn't work by default for TTS files, podcasts, or iTunes U files.
However, Andy Naselli shares an excellent workaround: select one or more files, choose Get Info -> Options, and change "Media Kind" to "Audiobook". This seems to work for any file type I've tried.
Empirically, I found that the speedup is only 1.2 to 1.25 times, i.e., a 30-second segment of audio completes in 24-25 seconds.
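The arithmetic behind that estimate is just a ratio of durations. A tiny Python sketch (the function names are made up for illustration) shows the calculation and what it implies for a longer file:

```python
def speedup_factor(original_seconds: float, measured_seconds: float) -> float:
    """Speedup = how long the audio is / how long it actually took to play."""
    return original_seconds / measured_seconds


def listening_time(original_seconds: float, speedup: float) -> float:
    """Time needed to hear a file of the given length at the given speedup."""
    return original_seconds / speedup


if __name__ == "__main__":
    # A 30-second segment that finishes in 24 to 25 seconds.
    print(speedup_factor(30, 25), speedup_factor(30, 24))
    # A one-hour file at the faster end of that range, in minutes.
    print(listening_time(3600, 1.25) / 60)
```

At a 1.25x speedup, a one-hour file takes 48 minutes to hear.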
Other TTS uses
Prereading: Some materials, such as certain academic papers, can be hard for me to fully understand by listening to them. Usually this is because they contain lots of technical content or numbers/symbols. I find that TTS can still be useful in such cases as a way to "grease the wheels" in my brain for the material, so that when I then read the material by eye shortly after listening to it, I can understand it more quickly. This is because listening to the text helps my brain begin processing some of the hard-to-unpack statements or words in the text, even though I can't understand everything by ear.
Proofreading: TTS can also be helpful for re-reading and proofreading text I've written myself, since it's sometimes easier to hear errors in speech than to detect them in written text, especially if they're not marked as spelling mistakes by my word processor.
Other software
Two of my friends, Max Maxwell Brian Carpendale and Sören Simon Mind, use TextAloud. Max recommended NeoSpeech Bridget as the best voice. As for speed, he said:
Currently what I do to speed up the audio is turn it into a file at normal speed and then speed it up with a player that has pitch correction. I think pitch correction makes the difference between why some software for speeding up audio is better than others. I use Astro Player for the Android, but you might be able to find something else that works with your iPod.
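For speeding up the file itself rather than relying on the player, one option is a tempo filter that changes speed without raising pitch. The sketch below is an assumption on my part, not something Max or I have described using: it assumes the ffmpeg command-line tool is installed and uses its atempo audio filter. The function names and file names are hypothetical, and the speed is limited to the conservative range 0.5 to 2.0.

```python
import subprocess


def build_speedup_command(src: str, dst: str, speed: float) -> list[str]:
    """Build an ffmpeg command that changes tempo while preserving pitch."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={speed}", dst]


def speed_up(src: str, dst: str, speed: float) -> None:
    """Run the command; requires ffmpeg on the PATH (an assumption)."""
    subprocess.run(build_speedup_command(src, dst, speed), check=True)


if __name__ == "__main__":
    # Only prints the command; call speed_up(...) to actually convert a file.
    print(" ".join(build_speedup_command("article.mp3", "article_fast.mp3", 1.5)))
```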
Another Wikipedia TTS service
Pediaphon is a website that converts Wikipedia articles to audio files for you. However:
- Its TTS software is inferior to NaturalReader's in comprehensibility and pronunciation.
- It doesn't remove footnote numbers from the text.
Another method of converting Wikipedia articles for text-to-speech apps
Mike Ouimet sent me the following message with an alternate method of converting Wikipedia articles to TTS-friendly text:
I wish to share with you a new and very efficient method I developed to cleanly extract the main body of text from any Wikipedia article. This 3-step method produces a very smooth TTS reading by generating plain text without inline citation numbers, photo captions, table of contents, or other extraneous items.
- Insert the title of any Wikipedia article into this special, API-based URL—for example, an article about the video game Pong:
- Copy the resulting text and paste it into any standard "HTML to plain text" converter, such as this one.
- Copy and paste the resulting text into your text-to-speech app and play it!
NOTE 1: The Safari web browser in iOS has a glitch that sometimes prevents the output text in step 1 from being selectable (and copyable). To resolve this, zoom in to the text ALL the way—you should now be able to select a few words in a row. If not, you may have to repeatedly tap the screen until it finally responds with the selection tool. Then, zoom all the way back out and use the selection "grippers" to highlight and copy the entire block of text.
NOTE 2: In some text-to-speech apps, such as Voice Dream, a one-time setting can be made that will make all Wikipedia articles sound even smoother. In Voice Dream, for example, one can go to Pronunciation Dictionary and have it skip just 2 items so they are not spoken aloud—2 asterisks and a long dashed line (these usually appear between each section of a Wikipedia article when using this extraction method):
**
------------------------------------------------------------
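Step 2 of Mike's method, converting HTML to plain text, can also be done locally instead of with a web-based converter. Below is a minimal Python sketch using only the standard library's html.parser module. It is my own illustration, not part of Mike's message; the names html_to_text and drop_separators are made up, and the choice of which tags start a new line is an assumption. The second function mirrors NOTE 2 by dropping lines that consist only of two asterisks or of dashes.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, starting a new line at block-level tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        raw = "".join(self._parts)
        # Normalize whitespace within each line and drop empty lines.
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Convert an HTML fragment to plain text (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def drop_separators(text: str) -> str:
    """Drop lines that are only "**" or only dashes, as in NOTE 2."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "**" or (stripped and set(stripped) == {"-"}):
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    html = "<h2>Gameplay</h2><p>Pong is a <b>two-dimensional</b> sports game.</p>"
    print(html_to_text(html))
```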
Footnotes
- I don't remove {{citation needed}} requests because these help one determine how much to believe a given statement.