by Brian Tomasik
First written: 7 Jan. 2015; last update: 6 Mar. 2017

Introduction

This page describes a few assorted themes that underlie my general writing style. I don't actually have an explicit style; rather, these are just some ideas that may run through my head when deciding how to write an essay.

My history

I think I wrote fairly normally from a young age through middle school, though I did enjoy playing with language, in part inspired by grammar I learned in my German class.

In high school, I began studying vocabulary words and generally reading the dictionary as preparation for SATs and for fun. I also enjoyed the word puzzles of Shakespeare's plays and was inspired by Ralph Nader's large vocabulary. These factors contributed to an increasing bombast in my writing. I endeavored to employ sesquipedalia wherever possible, even if they rendered my prose more cumbersome to read. When writing school assignments, I reined in my grandiloquence to avoid sounding weird, but when writing in private journals, I let loose with unusual verbal constructions that I found satisfying.

Around the start of college, I began to change my approach somewhat. One inspiration was George Orwell's famous "Politics and the English Language", which includes these recommendations:

(ii) Never use a long word where a short one will do. [...]

(v) Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.

I became less concerned about making my sentences sound like 18th-century English prose. As a result, I was also able to write faster.

Around the same time, I became somewhat less obsessive about perfect grammar after I watched a documentary which pointed out that grammar is ultimately arbitrary. For instance, I realized, it's not actually clearer to use "whom" instead of "who" in the objective case; doing so is just a rule that developed and is now on its way out.

To use big words or not?

I remain conflicted on whether to use less common vocabulary words. Some reasons in favor of doing so:

  • I personally find it really enjoyable to read articles that use big words. They sort of tickle my brain.
  • Doing so may signal some degree of sophistication. For instance, big words are common in New Yorker articles and other so-called "high culture" publications.

And reasons against:

  • Some have told me that my prose can occasionally be hard to read due to big words.
  • One man's sophistication is another man's pretentiousness.

In general, my current approach is to use big words freely but only when they seem particularly apposite. I don't go out of my way to figure out how to change a sentence so that it contains bigger words.

How much expertise to assume?

My essays tend to presume a high level of reader knowledge. Partly this is because many of my readers are experts, and I worry about talking down to them. The other reason is that in the age of Wikipedia, it's possible to look up details on which readers aren't clear, and I don't find compelling the idea of reproducing in my essays introductory information that already exists elsewhere. Why should I reinvent the wheel if Wikipedia already explained it better? Of course, this assumes that my readers have enough motivation to look up what they don't know.

I also tend to assume a high level of philosophical sophistication, again partly because I worry about talking down to experienced readers and partly because explaining the basics has already been done suitably by others. For the most part, I try to write only novel ideas -- or at least material that's specific to myself. (For instance, most of what I say in this piece has probably already been written by someone somewhere, but it's not "standard knowledge" the way that, say, the basic schools of ethics are.)

I also freely assume mathematical sophistication by readers and don't worry much about including equations. On the other hand, I think it's often better to explain a concept in words and by means of examples than using formulas because formulas are often less scrutable in an unnecessary way.

Math envy

I suspect that some academics try to over-use Greek symbols and notation in order to dress up their articles as more rigorous and deserving of more admiration. The fewer people who can understand it, the more intelligent you must be, right? Other authors probably over-use notation because they worked out the ideas in their heads using special notation and don't realize how hard it can be for outsiders to pick up the notation.

I really like this blog post:

math envy. Y'know, the idea that math=intelligence. This utter foolishness leads to the simultaneous fear and awe of anyone who throws math around, as if the presence of mere symbols and equations demonstrates the clear superiority of the author's throbbing, bulging,... intellect. This utter foolishness leads, therefore, to authors who feel the need to add superfluous "mathematics" to their writings in order to demonstrate that their... intelligence measures up that of their colleagues.

Well, turns out, someone finally got around to doing a study on math envy: Kimmo Ericksson (2012) "The nonsense math effect", Judgment and Decision Making 7(6). As expected, those with less training in mathematics tend to rate utterly irrelevant "mathematical content" more highly than its absence. [...] Not to name names, but I've read more than one NLP paper that throws in some worthless equation just to try to look more worthwhile.

Against mathematical elitism

Honestly, I think a lot of mathematical ideas are quite amenable to purely verbal, conceptual explanation. This doesn't necessarily make them less rigorous, and people are more likely to use the ideas properly if they understand them at a conceptual level than if they blindly manipulate symbols. Correspondingly, I think most math is not beyond the ken of Average Joe, and if something doesn't make sense, it's probably because the author didn't explain the idea well enough rather than because the idea is inherently inaccessible to regular minds. Much of math is difficult merely because it's so detailed and requires so much background knowledge -- sort of like a legal document -- rather than because you need to be a genius to comprehend it.

Consider Newtonian physics. We have a rich vocabulary to describe an object's speed, acceleration, mass, shape, wind resistance, friction, and much more. All of this can be done without equations. We can explain and visualize an object's behavior at an intuitive level. There's no reason why the same can't be done for any branch of math or physics. All the components of an equation or proof can in principle be described in conceptual terms. With enough training, abstract mathematical operations can all become as intuitive as Newtonian physics. The main barriers are just the amount of material that needs to be learned and lack of time/motivation to learn it.

When people say "Math is too hard", I think what's often really going on is "I'm not interested enough in math to learn all the background material and spend a lot of time training my brain to make mathematical theorems more intuitive."

Modularity and essay length

I try to keep my essays roughly independent of one another -- with cross-linking when necessary -- because this reduces dependencies and doesn't force readers to have read some essays before others. This is also useful because I typically write the essays out of order.

In the past I've tended to write long essays out of convenience, but I now think shorter essays are probably better. One reason is that, though I'm not an expert at search-engine optimization, I would guess that long pages don't necessarily rank better than short pages. If the match between a query and a web page is measured by similarity of a normalized word distribution, then having more words shouldn't generally improve the match, and indeed, having more words might reduce the match because the page would cover more total topics. Of course, it's possible that page length would improve the quality score assigned to a page.
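The guess about word distributions can be illustrated with a toy calculation. This is only a minimal sketch: it assumes the match is a cosine similarity between normalized word frequencies, which is a simplification chosen for illustration and not a description of how any real search engine ranks pages. The function names and the example texts are made up.

```python
from collections import Counter
import math


def normalized_word_distribution(text):
    """Map each word to its share of all words in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(dist_a, dist_b):
    """Cosine similarity between two sparse word distributions."""
    dot = sum(weight * dist_b.get(word, 0.0) for word, weight in dist_a.items())
    norm_a = math.sqrt(sum(w * w for w in dist_a.values()))
    norm_b = math.sqrt(sum(w * w for w in dist_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and pages: the long page is the short page plus
# extra words about an unrelated topic.
query = "link rot"
short_page = "link rot breaks old links"
long_page = short_page + " and summaries help readers skim long essays quickly"

query_dist = normalized_word_distribution(query)
short_score = cosine_similarity(query_dist, normalized_word_distribution(short_page))
long_score = cosine_similarity(query_dist, normalized_word_distribution(long_page))

# Under this toy measure the focused page matches the query better.
print(short_score > long_score)
```

In this toy setup the extra off-topic words dilute the share of the page devoted to the query terms, so the longer page scores lower. A real ranking system would weigh many other signals, including the page-quality score mentioned above.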

The net effect of page length on ranking any given web page is unclear, but if pages are shorter, you can have more of them, thereby increasing total page views from search engines. Of course, what matters is how much of your content people read rather than just how many page views there are, but assuming people usually don't read much of a page they stumble upon, then having more page views would increase the total number of words that people read. Plus, more page views means exposure to more total people.

As of 2016, I try to write essays that are as focused as possible on a single issue while still being self-contained and not requiring lots of cross-referencing to other web pages. This approach is inspired by programming, where one is advised to write small functions and classes.

Updates

I dislike the idea of writing essays that will become outdated. My website is a living document that I try to update as my views change (though because of the number of writings I have, I can't always update everything satisfactorily). It's more helpful to readers if a single essay on a coherent topic contains the full set of my thoughts than if I have different thoughts from different times scattered across several date-stamped blog posts. Thus, I treat my site a lot like a wiki -- a private wiki that only I edit.

Repeating myself

I generally aim to avoid repeating myself in my online writings. For instance, I try to use facts, quotes, or detailed arguments in only one location on my websites. The motivation for this is that I don't want to be someone who creates a lot of content just by repeating the same points. This comment from Bill Watterson on why he ended "Calvin & Hobbes" has stuck with me:

By the end of 10 years, I'd said pretty much everything I had come there to say.

It's always better to leave the party early. If I had rolled along with the strip's popularity and repeated myself for another five, 10 or 20 years, the people now "grieving" for "Calvin and Hobbes" would be wishing me dead and cursing newspapers for running tedious, ancient strips like mine instead of acquiring fresher, livelier talent. And I'd be agreeing with them.

That said, I have, perhaps unfortunately, repeated myself a few times when writing about subjects like consciousness, although my goal with writing so many different consciousness essays was partly to drive my viewpoint home by saying it using many different explanations.

I also repeat myself when I write pieces or make videos that are aimed at a different audience than people who will read my detailed writings. And I often repeat myself in comment sections of different blogs or on different Facebook threads.

Logical quotation

In 10th grade (2002), I was taught to use the American style of quotation, where periods and commas go inside quotation marks even when they don't belong, like "this." This style contrasts with "logical quotation", like what I did just there -- keeping the commas and periods where they logically belong. One American friend of mine encouraged logical quotation because it made more sense even though it wasn't standard, but I stuck to American quotation lest readers mentally penalize me on the assumption that I didn't know the rules at all. But in 2014, I discovered that Wikipedia uses logical quotation, so I switched to that thenceforth. Because I haven't gone back to change the quotation style in my earlier writings, you can date sentences I've written as before or after mid-2014 based on quotation style (in analogy with the law of superposition in geology).

Voice

I mostly write the way I would speak. One friend told me that my "writing is not heavily stylised writing, but it's very pleasing to read. It's warm and comforting, like I'm being hugged by the words as I read them." It's also easiest to write quickly with a conversational tone.

Ambiguous pronouns

I've noticed that many of my most confusing sentences are those that use "it", "these", and similar pronouns. The purpose of pronouns is usually to avoid using the same word/phrase twice in a row, but using the same word/phrase twice in a row is better than the opposite problem of having an unclear sentence. I think writers should probably err more on the side of clumsy but clear sentences than elegant but vague ones.

Of course, sometimes there are ways to write a passage that avoid confusion while still using a pronoun instead of the original word. This page gives one example of that:

Error: When Samuel dropped the goblet onto the glass table, it broke. (What broke? The table or the goblet?)
Correction: The goblet broke when Samuel dropped it onto the glass table.

Drafts

When I was in school, teachers would often insist on writing a first draft and then rewriting an essay into a final draft. I found this aggravating and stopped doing it once it was no longer required. The problem is that I'm fatigued by the second-round draft and don't put my full effort into it because it feels like I'm just doing the same thing over. I wonder if multiple drafts were more important in typewriter days before electronic word processors made it possible to rewrite arbitrary pieces of text without disrupting the whole essay.

Once an essay is done, I reread it twice, using the first pass to carefully comb the words and the second to check the overall fluency of the sentences and transitions.

Outlining and "house of cards"

When I think of a new essay I want to write, it feels like an avalanche building up in my brain. Ideas keep accumulating and picking up steam the more I think about the topic. I rarely explicitly write outlines of my essays, but I plan the general structure in my head, possibly including some key sentences. I need to write down the ideas all at once while they're fresh or else I risk forgetting some of them. A blog post I read a while back and can't find now referred to this fragility of ideas in one's head as a "house of cards". If you're distracted for too long, the short-term memory traces fade, and the house collapses.

Transitions, sectioning, and pictures

Teachers in school make a big deal of transition sentences to connect paragraphs. I find myself naturally including these in cases where they seem sensible. I often picture transitions between paragraphs like dominos or puzzle pieces: Two paragraphs fit together by sharing some idea at their borders.

Schools tend to emphasize essays with only raw text. This seems unfortunate, because raw text is suboptimal for conveying ideas efficiently. Readable essays make copious use of sections, which not only allow for quickly finding information but also provide a built-in tl;dr for a piece, similar to self-documenting function names in code. Wikipedia seems to understand this.

I also try to use bullets, numbering, and other text structures as much as possible because

  • this distinguishes the items more clearly than using marker words in a paragraph would, and
  • scaffolding beyond a mass of text in a paragraph is more helpful to skimmers.

Likewise, diagrams and pictures convey ideas more clearly and quickly than text does. They should be used as much as is sensible, though admittedly they also require more effort to create.

Summaries good

Until 2007, most of the essays I wrote took the style of many philosophy papers by jumping into the subject without any Abstract or Summary at the beginning. A reader of my site told me s/he prefers papers with abstracts (as do most scientists). From this point onward I began adding Summaries to most of my writings. This was one of the best pieces of advice I've ever gotten on how to write well. I think my essays are clearer by having a Summary at the top, and the Summary also makes it much easier for casual readers to glean the gist of my point rather than navigating away on the grounds that my piece was too long.

I really think that almost any writing longer than a few paragraphs should have a Summary (except for fiction where the plot would be spoiled by doing so). Often magazine-style articles begin with a catchy event to grab the reader's attention. I think this isn't nice to the reader. Even these pieces should offer a Summary before jumping into the juicy details, since there are many cases where a reader simply lacks the time to digest the whole piece.

Conclusions bad

Many of my essays for school were required to take the five-paragraph format: introduction, three body paragraphs, and conclusion. I often found this annoying, because I ended up saying basically the same things in the second half of my introduction as in the conclusion. I didn't see any point to having both. I felt like a five-paragraph essay was a mozzarella stick with a huge amount of bread on it: there was just a tiny amount of substance in the three body paragraphs surrounded by two repetitive summaries on either end.

Now that I don't have writing requirements, I omit conclusions. Sometimes I design the last sentence or two of the essay to wrap up and repeat a high-level idea, but I don't think this is necessary. If the reader wants a conclusion, he can go back and reread the Summary.

Perhaps George W. Bush would be more fond of making the same point a third time in a concluding section. In 2005, he said "See, in my line of work you got to keep repeating things over and over and over again for the truth to sink in, to kind of catapult the propaganda."

Proofs and programming

In college I remember enjoying the puzzle of fitting together an essay in an elegant way. I was writing proofs and computer programs at the same time, and I remarked to myself how similar all three forms of writing were: All involved a pleasantly creative process of stitching together a nice design that would be clear, effective, and organized.

Essays are organized into sections; code is organized into functions; proofs are organized using lemmas. Subroutines of a program feel almost identical to lemmas of a proof: You take input assumptions, do some processing, and output a conclusion. Essays organized around a core argument may also involve "lemmas" when arguing for each step.

Going off on tangents

A friend mentioned to me that many of my writings ramble more on tangential topics than is standard in the academic literature. This is because I like to include side comments about something interesting when the opportunity arises. Our thoughts are not laser-focused on demonstrating a single argument, and I think essays can be the same way. Tangents may add interest or have academic value in their own right, and I find that including them can spice up prose. Daniel Dennett seems to agree, as his writings are profuse with illustrations and side comments that, while only somewhat relevant to the context at hand, make his overall discussion more fun and memorable, while also teaching the reader some interesting tidbits along the way.

Link rot

Link rot on the web is terrible. In my informal experience, it seems that maybe ~5-10% of my external links break per year.
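To get a feel for what a ~5-10% annual rate implies over time, here is a minimal sketch. It assumes that each link breaks independently at a constant rate every year, which is a simplifying assumption for illustration and not something measured; the function name is made up.

```python
def surviving_fraction(annual_break_rate, years):
    """Fraction of links still working after `years`, assuming a constant,
    independent annual break rate (an assumption, not a measurement)."""
    return (1.0 - annual_break_rate) ** years


for rate in (0.05, 0.10):
    alive = surviving_fraction(rate, 10)
    print(f"{rate:.0%} per year -> {alive:.0%} of links alive after 10 years")
```

Under that assumption, roughly 60% of links would still work after a decade at a 5% annual rate, and roughly 35% at a 10% rate.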

When I move something on my site, I'm careful to set up redirects. As a result, I think almost no links to my sites should be broken. But most other websites aren't so careful. I find it tragic when major organizations like PETA overhaul their websites, break thousands of incoming links, and don't bother to create redirects. In so doing, they lose a significant portion of their traffic from old links. Why don't they take a small amount of effort to set up redirects?

The way I've decided to combat link rot is to include the title of an article in the "title" attribute of the hyperlink, which you can see by hovering over the hyperlinked text. Doing so helps because even if a link is broken, it's usually possible to find the page by searching its title. Putting link titles in the title="" field also has the advantage that, if a url isn't very descriptive, readers can preview what the article is about by hovering on the hyperlink and examining its title, rather than needing to click through.
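As a rough illustration of the title="" approach, the sketch below builds such a hyperlink as an HTML string. The helper name, the example URL, and the example article title are all hypothetical; the only library call used is html.escape from the Python standard library.

```python
from html import escape


def link_with_title(url, article_title, link_text):
    """Build an HTML hyperlink whose title attribute holds the article's title,
    so the title survives on the page even if the URL later breaks."""
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(url, quote=True),
        escape(article_title, quote=True),
        escape(link_text),
    )


# Hypothetical URL and article title, for illustration only.
print(link_with_title(
    "https://example.org/some-article",
    'An "Example" Article Title',
    "this article",
))
```

Escaping matters here because article titles often contain quotation marks, which would otherwise end the attribute value early.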

Looking up the old link on Internet Archive also often solves link rot, but not always.

I've considered using a tool to scan for broken links in my essays, but fixing link rot is a losing battle that requires constant effort, so I don't always bother to fix broken links.

For scholarly papers, I try to use DOI urls (i.e., http://dx.doi.org/______, where the ______ is the article's DOI) because these are supposed to be persistent urls.

Citation by hyperlink vs. bibliographic info

Full bibliographic citations, such as those used in journals and on Wikipedia, look superficially most credible to readers. However, I tend not to use them because they take a lot of effort to put together, and functionally, readers are just as well off if you link to an article as if you cite it more formally (keeping in mind to also add the article's title in the title="" attribute of the link to guard against link rot). In fact, hyperlinking can be slightly better than using formal citations because

  • if the formal citations are in-text, then readers have to navigate to the "Works Cited" section of your page to look up the article, rather than just clicking on it, and
  • if the formal citations are done using footnotes, then readers won't be able to distinguish between citational vs. discursive footnotes. Readers who don't want to read the citational footnotes may therefore miss your discursive footnotes.

Of course, these problems can be remedied with more advanced styling of the page. For example, some academic articles (like this one) contain in-text citations that, when clicked, pop up bibliographic information that includes hyperlinks. But this level of functionality adds complexity to your website.

The even bigger cost of giving full citations is the manual work required to create them. Of course, there are programs that automatically collect bibliographic data, and you can get BibTeX data for articles from Google Scholar. But I'm nervous about the accuracy of automated bibliographic data. When I wrote this paper, I initially just collected BibTeX information from Google Scholar. But then a friend told me that Google Scholar's citations could be inaccurate. I checked a few instances to see if this was true, and I grudgingly realized that it was. I don't remember exactly how many automated Google Scholar citations in that paper had errors, but it was several -- enough that I was forced to spend ~4 hours going back through all the citations to check them. Maybe it's ok to have some inaccurate citations? Maybe people don't care that much? But I'm nervous about not checking something that has my name on it, so I'd rather not deal with this problem.

How to format and hyperlink in-text citations

As mentioned in the previous section, I prefer to hyperlink my citations rather than requiring a reader to navigate all the way down to a "References" list at the bottom of the page. But questions remain regarding how to format and hyperlink in-text citations. Following are some possibilities for writing an example two-sentence passage:

  1. In "What Is the Difference Between Weak Negative and Non-Negative Ethical Views?", Simon Knutsson problematizes the naive idea that weak negative utilitarians give more "weight" to suffering than happiness, while non-negative utilitarians give "equal" weight. Knutsson believes we need to think about the situation differently.
  2. This paper problematizes the naive idea that weak negative utilitarians give more "weight" to suffering than happiness, while non-negative utilitarians give "equal" weight. The author believes we need to think about the situation differently.
  3. Knutsson (2016) problematizes the naive idea that weak negative utilitarians give more "weight" to suffering than happiness, while non-negative utilitarians give "equal" weight. Knutsson (2016) believes we need to think about the situation differently.
  4. Knutsson (2016) problematizes the naive idea that weak negative utilitarians give more "weight" to suffering than happiness, while non-negative utilitarians give "equal" weight. Knutsson (2016) believes we need to think about the situation differently.

Pros and cons of each:

  1. This approach provides the greatest amount of readily visible information, which is good for readers who want that and bad for readers who don't. If the hyperlink has rotted, the title is easily accessible for Googling, especially for readers on mobile devices who may not have the ability to hover over the link in order to see its title in the title="" attribute. Because of the verbosity of this approach, I tend to avoid it (as of 2017).
  2. This approach is the opposite of the previous one. It optimizes for not cluttering the text, inviting readers to view the title and click the link only if they care to know the details of the source. One downside is that because the article is listed anonymously, you can't easily refer back to the source later on in your piece.
  3. The problem of referring back to an article multiple times is solved by this option, which uses the regular name-year citation style to identify articles in order to refer to them later. By hyperlinking the article at each citation, you allow readers to navigate to the source article from any point in your piece. However, hyperlinking on every occurrence of an in-text citation clutters the text and may be distracting.
  4. This option is the same as the previous one but only hyperlinks the first mention of a citation. This is easier on the writer (since the hyperlink and title only have to be written once) and avoids clutter. However, readers who skim or skip around your article may miss the first instance of the citation and so may not realize that you've given the hyperlink. Readers may wonder why the citation has no link or title associated with it.

As of 2017, I tend to use option #4 both because of its benefits for readers and because it's easy for me as the writer. I asked friends on Facebook about some of these options, and many people liked option #4, although some preferred #3.
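The difference between options #3 and #4 is only in which mentions carry the hyperlink. A minimal sketch of that rule is below, assuming citations are rendered to HTML by a small helper. The function name, its parameters, and the placeholder URL are hypothetical; the article title is the one from option #1.

```python
from html import escape


def render_citation(label, url, title, already_linked, link_every_mention=False):
    """Return HTML for one in-text citation.

    With link_every_mention=True this behaves like option #3 (every mention
    is hyperlinked). With the default it behaves like option #4: only the
    first mention of a label is hyperlinked, and later mentions are plain text.
    """
    if link_every_mention or label not in already_linked:
        already_linked.add(label)
        return '<a href="{}" title="{}">{}</a>'.format(
            escape(url, quote=True), escape(title, quote=True), escape(label)
        )
    return escape(label)


# Placeholder URL for illustration; not the article's real address.
url = "https://example.org/knutsson-2016"
title = "What Is the Difference Between Weak Negative and Non-Negative Ethical Views?"

seen = set()
first_mention = render_citation("Knutsson (2016)", url, title, seen)
second_mention = render_citation("Knutsson (2016)", url, title, seen)
print(first_mention)   # hyperlinked, with the title attribute
print(second_mention)  # plain text: Knutsson (2016)
```

Keeping the set of already-linked labels outside the function lets one set track a whole essay, so the first-mention rule applies across sections.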

Variable names

One of my friends said that math would look less intimidating if it were written in the style of programming, where variables had full, descriptive names rather than single Greek letters. I agree. On the other hand, manipulating such equations would be harder, and the equations would be much longer (probably requiring several lines of "code").

When programming, I prefer to use long variable names, because they're self-documenting, and unlike with comments, you're more likely to remember to refactor them as your program changes. They make code bulkier, yes, but I'd rather understand bulky code than puzzle over cleaner code. Ease of understanding is extremely important for other people and for your future self, who will have forgotten how the code worked within a few years.
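A small illustration of the trade-off, using the kinetic-energy formula as an arbitrary example (the function and variable names are invented for this sketch):

```python
# Terse version: compact, in the style of a textbook formula.
def ke(m, v):
    return 0.5 * m * v ** 2


# Descriptive version: bulkier, but the names document the quantities
# and their units without needing a separate comment.
def kinetic_energy_joules(mass_kilograms, speed_meters_per_second):
    return 0.5 * mass_kilograms * speed_meters_per_second ** 2


print(ke(2.0, 3.0))
print(kinetic_energy_joules(mass_kilograms=2.0, speed_meters_per_second=3.0))
```

Both functions compute the same value. The second is longer to type and to manipulate, which is the cost noted above for writing math in this style, but a reader can tell what each quantity means without looking anything up.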

Using smileys in emails

I use smileys :) a lot in emails, Facebook comments, and so on. Happy emoticons help break the ice in communication and combat the problem that tone doesn't carry well through text. It puzzles me that emoticons weren't invented centuries ago; they're amazingly useful and convey a lot of information in a small number of characters.

More generally, I try to be cheerful and positive in communication except in rare situations where that isn't warranted. This helps the conversation participants feel more warm toward each other and themselves. It's also more effective at "winning friends and influencing people". Smiles and "thank you"s can be especially useful for defusing situations that might otherwise turn into confrontation.

I use smileys and Facebook "Like"s the way bonobos use sex:

Sexual activity generally plays a major role in bonobo society, being used as what some scientists perceive as a greeting, a means of forming social bonds, a means of conflict resolution, and postconflict reconciliation.

On one Rationally Speaking episode, Julia Galef cited a book (which I can't now find) about what people can learn from dog-training psychology. One example from the book involved a person A who wanted person B to call A more often, but whenever B did call A, A began the conversation by asking angrily, "Why don't you call me more?!" Part of the answer to A's question is that whenever B does call, the action is negatively reinforced by A's accusations. In general, emotional reinforcement of this type is powerful and can make the difference between having lots of friends and collaborators or having few.

Quoting factual material

In 2001, I wrote a school research paper in which I included a number of quotes that contained statistical data from source materials. My teacher told me that unless a passage from the source text is unusually distinctive, I should rewrite the information in my own words rather than quoting it. I grudgingly did so.

Rewriting in one's own words may be necessary for formal publications, but when writing informal pieces, I prefer to revert to the habit of quoting factual information if it's particularly well stated in the source material. The main reason is that doing this reduces my probability of making mistakes when transferring the information to my own page. It's very easy to miss subtle things when moving data from one source to another. For example, maybe the source paper says "adult mosquito population", but you just write "mosquito population", forgetting to qualify that it's the adult mosquitoes only. Lots of small oversights like this may be introduced when trying to rewrite information in one's own words.

Including quotes is especially valuable for readers when I'm citing web articles without page numbers. Quoting allows readers to Ctrl+f for the quote within its original context on the source web page.

LeVar Burton famously said on Reading Rainbow: "But you don't have to take my word for it." Likewise, when you quote a source text, readers don't have to take your word that you've correctly paraphrased the material and that you're not distorting what the original author said. (Distortions of meaning are still possible with quotations, but they're arguably harder to pull off and easier to discover, because readers can directly search for the quote and check its surrounding context, rather than wondering whether the source author actually said what was attributed to her somewhere in her whole article/book.)

In general, quoting rather than paraphrasing source material seems to me like almost a strict improvement in terms of the quality and transparency of presenting research findings. It's a shame that this practice isn't more widely accepted for formal writing.

Kaj Sotala gave me the following feedback on this section:

I find that as a reader, quotes often tend to impose a small cognitive cost that makes for heavier reading. Long quotes usually have a somewhat different style and context than the work that's quoting them, and this seems to cause a small "switching cost" in my head as I have to adjust for the new style and context, and then switch back when returning back to the original essay.

For this reason I think it's probably better to rewrite things in one's own words, as that will make the content fit your existing style and flow of text and let the reader absorb the information without an additional cost. (Though in practice I still leave a lot of quotes in my text since it's easier.)

I probably agree with this if you're writing for a very large audience, such that the additional cost to the writer of rewording and checking the material is small compared with the collective benefit to the readers. But for most of what I write, the number of readers is relatively small.

Citations after every sentence

In school, I was taught that if the information in several consecutive sentences comes from the same source, you should wait to include a citation for it until the last of those consecutive sentences, to avoid repetition. I see the same done on Wikipedia in many cases.

I strongly dislike this rule. It allows for ambiguity about how much of the stated content actually comes from the cited source. For the same reason, it can lead people to assume that what you've written doesn't have a citation. Except when it's obvious that all my information comes from a given source (for example, because I'm discussing the author explicitly in my text), I try to include citations after every sentence.

This is especially important for shared writing pieces like Wikipedia, because someone might insert a sentence in between your original two or more sentences, in which case your earlier, uncited sentences will no longer be consecutive with the final sentence that contains the citation.

Page numbers in citations

I find it unfortunate that standard scientific in-text citations, such as "(Smith 2009)", don't include page numbers. Of course, this is understandable if one is citing some general theme expressed by the entirety of an article or book. But when citing a particular fact, I prefer to include a page number, since this makes it easier for others (including fact-checking reviewers) to find the original information.

It's harder to point to the location of information when citing an HTML website, but one helpful approach is to quote the sentence(s) of the original source that contain(s) the information (perhaps in a footnote or the title="" field of the hyperlink if not in the main text of your piece) so that readers can Ctrl+f for the quoted text in the original source.
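As a rough illustration of the title="" idea, here is a minimal Python sketch. The function names (link_with_quote, find_quote), the example URL, and the whitespace-normalizing search are all assumptions made for this example rather than anything I actually use; the second function just mimics what a reader does with Ctrl+f.

```python
from html import escape


def link_with_quote(url: str, link_text: str, quote: str) -> str:
    """Build an HTML link whose title attribute carries the supporting quote.

    Hypothetical helper: escaping keeps quotation marks and ampersands
    from breaking the attribute values.
    """
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(url, quote=True),
        escape(quote, quote=True),
        escape(link_text),
    )


def find_quote(source_text: str, quote: str, context: int = 40):
    """Mimic Ctrl+f: return the quote plus surrounding context, or None.

    Whitespace is collapsed first so that line breaks in the source
    don't cause a verbatim quote to be missed (an assumption of this sketch).
    """
    normalized_source = " ".join(source_text.split())
    normalized_quote = " ".join(quote.split())
    position = normalized_source.find(normalized_quote)
    if position == -1:
        return None
    start = max(0, position - context)
    end = position + len(normalized_quote) + context
    return normalized_source[start:end]


if __name__ == "__main__":
    quote = "Of all references, 12% contained errors"
    print(link_with_quote("https://example.com/article", "this study", quote))
    print(find_quote("Results.\nOf all references,\n12% contained errors.", quote))
```

A reader (or a script) can then take the text of the title attribute and search for it in the source page, which is the same check that a page number makes possible in a printed work.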

Wise (2000) shares my frustration with omission of page numbers in scientific citations. He explains:

Before me is a book chapter I have written, to be published by a scientific press. When I turned it in to the publisher, I gave footnotes citing the pages at which every proposition upon which I relied could be found. I was informed that this is not good scientific notation. The publisher returned the chapter with instructions for me to go through each footnote and delete the offending references to the exact pages.

What is going on? A physicist friend says that some colleagues aim to outline their achievements while giving away as little as possible to competitors. A primatologist friend believes that scientific specialization is now so common that scientists write only for colleagues in their disciplines, who can be assumed to have read everything that the author has.

Henige (2006), pp. 103-04:

Particularly disconcerting is the disconnect between this unconcern with precision in citation and the extraordinary care taken to assure that submitted papers measure up in other ways.14 [...] Assuming that referees are also deprived of this information, it raises the question of why they should be satisfied with this restricted capacity to check authors’ conclusions.

Rekdal (2014), p. 573:

Direct quotations are electronically much more easily searchable than paraphrased sections, and locators are therefore more crucial for the latter. Despite the emergence of tools such as full text databases and Google Books, we still need the page numbers, particularly for source material that does not appear in the form of a direct quotation.9

Citing Wikipedia

Some people love to hate Wikipedia, but as has often been shown, Wikipedia is generally very accurate. For articles where many editors have made detailed contributions, I would generally trust Wikipedia more than any other source, even a journal article in Nature. The reason is similar to Linus's Law in software development: "given enough eyeballs, all bugs are shallow". Academic peer review is very imperfect and can in many cases allow errors to slip in (see the next section). My own experience with peer review suggests that reviewers do relatively little checking of the details of one's paper. Plus, academic writings are static and can't be fixed, while Wikipedia can. And sometimes, academic authors give questionable information the veneer of authenticity by including it in a journal article.

That said, there are many Wikipedia articles that aren't particularly trustworthy because they've only been edited by one or two people or because they don't have citations. I tend to consider such articles similar to blog posts in how much I trust them.

If you're going to the trouble of consulting primary sources for information, then obviously doing that is superior to citing secondhand information on Wikipedia (unless the Wikipedia article corrects errors or provides other important context). However, in many cases, it's not efficient to read all the primary sources on a topic, and in these instances, citing Wikipedia makes sense.

Errors in published articles

de Lacey et al. (1985)

de Lacey et al. (1985) examined the accuracy of "quotations" (statements that had citations) and the citations themselves in six medical journals (p. 884). I think de Lacey et al. (1985) defined "quotations" as "All direct quotations of, indirect references to, or summaries of another author's work" (p. 884). de Lacey et al. (1985) discovered that "Of all references, 12% contained errors" that were either slightly or seriously misleading (p. 885).

This is a rather astonishing number, although in my opinion, these errors may not always be as bad as one might think. de Lacey et al. (1985) give some examples (pp. 884-85) of misleading errors. This example (p. 884) of a "slightly misleading" quotation doesn't seem atrocious to me:

a quotation that reducing weight by decreasing intake of energy lowered the blood pressure in most obese hypertensive subjects. The original source, however, studied the effect of a combined low energy and low salt diet on weight and blood pressure.

Apparently, this statement is misleading because of oversimplification. Likewise, de Lacey et al. (1985) explain (p. 885): "Misleading quotations were often due to oversimplification in summarising another author's figures."

The following example (p. 884) of a "seriously misleading" error also seems possibly excusable to me:

One correspondent said: "several studies have shown that the immediate memory span is intact," referring to patients with Korsakoff's syndrome. One of the two quoted sources was a paper on the psychological aspects of rehabilitation in cases of brain injury, with no mention of patients with Korsakoff's syndrome.

Since two sources were quoted here, maybe the first one was directly about the stated finding, while this other source was added as general background reading on memory problems, not intended to buttress the stated claim? (Sadly, standard academic citation methods don't readily allow for distinguishing what kind of information a given citation is supposed to provide.) That said, since the misleading article used the phrase "several studies", maybe it's implied that both of the sources cited should have supported the claim.

Some of the example errors that de Lacey et al. (1985) furnish do seem more serious to me. These findings made me realize I should be slightly more skeptical about any statement I read, even in top journals.

In a 1985 follow-up letter to de Lacey et al. (1985), S. R. Lowry found that in letters published by the BMJ, 12% of quotations were "inaccurate", and another 21% were "slightly inaccurate" (p. 1421). Lowry says (p. 1421): "The journal does not check everything, and as a result a third of direct quotations and 8% of references printed were inaccurate to some extent."

Making one's uncertainty explicit

In high school, I was taught not to use phrases like "I think" because such wording was self-evident: Saying "I think X" is equivalent to just saying "X". Formal writing often discourages indications of uncertainty or meta-level discussion about one's process, perhaps because doing so would signal weakness?

I think (see what I did there?) this formal writing style is suboptimal. For example, saying "I think X" is qualitatively different from saying "X". The former sentence tells the reader that this is your opinion or a guess that you're making, rather than an established fact or the opinion of some other entity. Likewise, hedging statements, used appropriately rather than just for politeness, can inform readers about how much weight to give a claim. For example, saying "I would intuitively guess that X" or "I haven't read much about this topic, but my impression is that X" is more useful than either (a) declaring that X is the case or (b) not saying anything about X because you're not certain.

Comments about one's own research process can serve similar functions -- e.g., marking where you've only read a study's "Abstract" rather than the full text, in order to tell the reader that there's some risk that you're misinterpreting the study due to not having read the fine print. This is a useful compromise between not marking uncertainty at all and writing vastly more slowly because you have to thoroughly check everything you say first.

PDF vs. HTML

When I created my first website in 2006, it was initially a PDF file of essays. A friend advised me that HTML was more readable, so I converted to HTML format instead. Now I strongly prefer HTML, for several reasons:

  • I make edits to my essays all the time. With HTML, I can just edit the piece in WordPress, and the change is done. With PDF, I would have to upload a whole new version of the piece, which would take more time.
  • The formatting is more flexible with HTML. For example, readers can make the font as big or small as they want. They can change the background color. And so on.
  • With HTML, you can click links that navigate around within the essay, like in the table of contents or to view a bibliographic entry, and then when you click the "Back" button, you go back to where you were. In a PDF document, if you click an in-text link (e.g., to see a reference in the bibliography), when you click "Back", you navigate away from the whole PDF. At least, this is the behavior I see in Chrome on Windows.
  • HTML allows for JavaScript calculations, interactive graphs, embedded videos, hover-over footnotes, etc.
  • HTML text is easier to copy and paste all at once without page numbers, headings, etc. getting in the way. (This is useful for me when converting articles to audio format.)
  • HTML can easily be converted to PDF using your browser's "Save as PDF" feature, but the reverse isn't true.
  • HTML can handle equations with tools like MathJax. There are tools to export TeX to HTML. I assume there are tools that can mimic the behavior of BibTeX (maybe this?).
  • HTML is probably less intimidating for non-academic readers, since most websites are in HTML.
  • My anecdotal impression is that PDFs may not rank as well on Google, but I haven't found any verification of this supposition on SEO sites, so it may be wrong.

The main benefits of PDF have to do with formatting standardization, document integrity, permanence, etc., but I don't value these properties highly for my writings. Also, PDFs are still (unfortunately) more common for academic articles, so they may superficially appear more professional for that reason.

This page contains a similar list of reasons in favor of HTML, though not all of it is up-to-date. The author concludes: "Good, standards-compliant HTML is almost always better for use on the web."

Acknowledgments

A discussion with Caspar Oesterheld improved my views on the question of PDFs vs. HTML. Denis Drescher inspired a point I made about footnote citations.

Footnotes

  1. The vector-space model of similarity is effectively normalized by the length of a document via the norm of the tf-idf vector in the denominator of the cosine formula. Likewise, the BM-25 match score is sort of normalized for document length. If you strip out a lot of parameters, the "tf" fraction is something like (term frequency)/|D|, i.e., normalized for document length. Of course, the exact function is messy, and when this is one input to a complex ranking model, the effect of document length becomes messier still.
  2. I struck out "doing that" here because it's unclear if it refers to "using the same word/phrase twice in a row" or "avoiding using the same word/phrase twice in a row".
  3. I'm using "negative reinforcement" in a colloquial sense. In formal terminology, what I actually mean is "positively punished", since "negative reinforcement" technically means removal of a bad stimulus.
  4. Like this.