Should You Read the Whole Article Before Citing It?

By Brian Tomasik

First published: 13 Jan 2017. Last nontrivial update: 7 Jan 2018.

Summary

This page describes my own history with, and a few thoughts on, the question of how thoroughly to read the papers that you cite. The answer depends somewhat on your personality and situation, so don't take my approach as necessarily prescriptive.

My history

When I was in high school, I was obsessive about reading carefully. I demanded of myself that I understand every sentence of every text that I read. This was partly a rational, learned behavior, because I mostly read textbooks, and with textbooks, understanding every sentence is often a good idea. Several of my classes had DYRT ("did you read this?") quizzes based on textbook reading, and the DYRT-quiz questions were often quite specific—requiring many hours of study of the textbook material to get right.

When learning about colleges toward the end of high school, I heard stories of college students who were assigned 500 pages of reading per week for a single course. I thought to myself that students in those classes must be superhuman (or else that I was just really inferior). In retrospect, I suspect that the students in those classes did a lot of skimming and skipping of reading material.

In college, I became more lax about reading thoroughly, in part because of advice that skimming was often the better way to read things. I actually never took college courses that required skimming; I often thought that hard-science classes were easier than other classes because the reading load was so much lighter. But in my free time, I did a lot of skimming.

In 2007, when writing a literature review for one college course, I asked the professor if I could cite an article based only on its "Abstract", because the full paper wasn't available online or at the library. I was told that, no, I should read the full article to put what was said in context. I thought to myself that this was overkill and that most other students probably were also only reading abstracts for some of their citations.

For writings on my own website, I historically was pretty lax about reading articles before citing them. This practice was born of necessity: During college and while working at Microsoft, I had very little free time and usually had to try to finish an article I was writing within a day or two. Many of the original articles on my website were written in a single sitting, with some material added later on when I came across it, because I usually only had a few hours to put something together before I had to return to homework or Microsoft work.

After I left Microsoft in 2013, I continued to write relatively quickly, in part out of habit and in part because I wanted to cover a lot of intellectual ground without dwelling on details. My focus was on breadth over depth. This probably made sense, although the quality of the resulting articles was lower than what would have been the case had I read more thoroughly. In a few cases, I took more time to read the literature on a topic in detail, such as with "Do Artificial Reinforcement-Learning Agents Matter Morally?" But even with that paper, I skimmed some of the articles I cited (especially the technical neuroscience and AI ones) because there was so much material to get through.

Reading articles more fully

Since 2015, and especially since 2016, I've been spending more time thoroughly reading the material that I cite. There are a few reasons for this:

Is reading "fine print" worthwhile?

Some papers I read, such as original scientific studies with lots of technical implementation details and statistical analysis, aren't very insight-dense in the sense of containing lots of high-level information about a topic per paragraph. However, I still think it's useful to read such papers in order to get a sense of what the fine print looks like—e.g., what it takes to carry out a study, what the standard techniques are, what caveats should be applied to the results, how to do statistical tests for a given type of problem, etc. These details would be more relevant if I were planning to do my own original research in these fields (I'm not), but they're still useful for being an informed consumer of studies by others in these fields.

Around 2005, I asked a friend of mine at college why the philosophy department had students read primary-source writings by philosophers rather than just assigning readings from a philosophy textbook. I thought that surely one could gather more insight per paragraph by reading a textbook? In retrospect, I think some textbook reading combined with some primary-source reading is optimal, since primary sources convey what the cutting edge of a field is actually like, warts and all, in a way that textbooks usually do not.

Another reason I like to read fine print is that I want to force myself to actually try to understand things that may not at first make sense. When you skim material, you tend to pick up information that's easy to process and skip over stuff that's too hard to figure out. But figuring out things that don't immediately make sense is important for intellectual growth. If you don't take the time to pore over a topic in detail, how will you ever learn it?

I first got a Super Nintendo when I was in 4th grade. One of my parents insisted that I could only play it if I also spent a certain amount of time on other things, such as learning to play the piano. I worked my way through about two introductory piano music books, and I could play the most basic of songs. However, I never got much beyond that point because learning the new material seemed hard, and it was easier to accumulate my "playing piano" time by repeating the easy songs I already knew.

I once had a friend ask me about a math-heavy academic paper that I had read. The friend asked whether I was able to make sense of the paper. I said that I was able to, though it took some effort to figure it out. My friend said, "Ok, I just need to not be lazy while reading the paper."

An informal suspicion of mine is that part of the reason why "humanities" people often think they're not good at math is that they may try to learn math at the same pace as they read a novel. Unless you're a literal genius, this doesn't work. Math requires slowing down, and people used to reading novels may find this unnatural and boring.

Of course, skipping over fine print is necessary in many cases, in order to avoid spending inordinate amounts of time understanding details that other people have taken care of. But pushing oneself to really understand at least a few topics—without just skipping over the parts that don't make sense—seems like an important exercise.

Narrowing of focus

Reading the whole article makes more sense the more you plan to specialize in a given topic, because in that case, reading the article is an especially useful investment for the future. Once you've read the literature in a specific area, you can write several papers about that topic without reading many entirely new articles, because each of your papers may cite roughly the same sources. There are thus economies of scale to specializing, although at the cost of flexibility and interdisciplinary knowledge:

In order to get published, one must stay current with the literature in one’s field. In most cases, that literature is enormous and constantly growing. So, to phrase matters in economic terms, there is a very large fixed cost to publishing in a given area. That fixed cost discourages one from doing work in new areas and encourages one to remain in the area one is already familiar with. The narrower the area, the easier it is to learn enough to publish there.

I struggle with the question of how specialized to be. I want to produce high-quality articles, which requires the kind of expertise that comes from reading extensively on a given topic. But I also don't want to prematurely focus on one area when I might later decide that another area is more altruistically important.

Fully reading the sources you cite makes more sense if you're preparing an article for formal publication that will be hard or impossible to correct later.

Speed vs. accuracy tradeoff

How much time should you spend making sure you got your facts right? This is a question that every researcher faces, although I haven't come across much discussion of the issue. My impression is that people have pretty wide variation in how careful they choose to be about fact-checking.

The podcast Stuff You Should Know often includes, at the end of an episode, corrections to previous episodes. Some of the errors are non-trivial. Stuff You Should Know is a podcast by generalists on a wide range of topics, so having some errors is to be expected. Still, this is one of the most popular podcasts on iTunes, and I would guess that its rate of factual mistakes is not vastly higher than normal. This suggests that a lot of published media may contain more errors than one might have thought. Those who spend more time making sure they get their facts right will, unfortunately, put out less total content.

If it takes 10 hours to write an article in which 4% of sentences have an error, I would guess it would take something like 20 or 30 hours to write the same article with an error rate of 2%. Achieving greater accuracy requires more effort both in reading the articles you're citing more carefully and in cross-checking them against other sources. Further reducing the error rate to 1% might require significantly more time investment. It's interesting to ponder where on this speed-vs.-accuracy tradeoff curve to be.
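
As a rough illustration of the arithmetic above, here is a minimal Python sketch that compares the two regimes over a fixed time budget. Only the 10-hour/4% and 20-to-30-hour/2% figures come from the text; the 25-hour midpoint, the 100-hour budget, the 200 sentences per article, and the names Regime and summarize are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regime:
    """One point on the speed-vs.-accuracy curve."""
    name: str
    hours_per_article: float
    error_rate: float  # fraction of sentences containing an error


def summarize(regime, budget_hours, sentences_per_article):
    """Return (articles written, erroneous sentences, correct sentences)."""
    articles = budget_hours / regime.hours_per_article
    total_sentences = articles * sentences_per_article
    wrong = total_sentences * regime.error_rate
    return articles, wrong, total_sentences - wrong


# Figures from the text: 10 hours at 4% errors; 20 to 30 hours at 2%.
# The 25-hour midpoint, the 100-hour budget, and 200 sentences per
# article are illustrative assumptions, not numbers from the essay.
fast = Regime("fast", 10, 0.04)
careful = Regime("careful", 25, 0.02)

for regime in (fast, careful):
    articles, wrong, right = summarize(regime, 100, 200)
    print(f"{regime.name}: {articles:.0f} articles, "
          f"{wrong:.0f} wrong sentences, {right:.0f} correct sentences")
```

Under these assumed numbers, the careful regime yields one fifth as many erroneous sentences as the fast one (16 versus 80) but only about 41% as many correct sentences (784 versus 1920). Different assumptions about the time cost would shift that balance.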

Personally, I try to be pretty careful about accurately reporting information from a given source. I sometimes double- or even triple-check that what I wrote exactly matches what the source article was saying. However, I don't necessarily undertake the further effort of checking with other sources to make sure that the source I'm citing got the information correct. So I'm sure my writings still contain a decent number of errors.

One benefit of writing in your area of expertise is that you naturally make fewer errors and can even spot errors in the sources you cite.

Of course, there are likely many issues on which current science as a whole is mistaken, so even if an article contains no errors when citing other sources, the true error rate of the article, in terms of its match with reality, may be quite a bit higher than 0%.

Quoting secondary sources

As of 2017, I'm trying to read the full text of most of the articles I cite (i.e., hyperlink to), though I wouldn't extend this policy to entire books. However, I make an exception for citations within the text of a secondary source. I'm often ok with quoting from a Wikipedia article that has references to primary-source articles, or quoting from the introduction of a scientific paper that refers to other papers. The reasons are:

  1. Practicality: Tracing studies backwards would take an enormous amount of time.
  2. Trusting the secondary source: I hope that the secondary-source author already correctly interpreted and paraphrased the primary-source articles being cited.

Unfortunately, the assumption that the original, primary sources are correctly interpreted by secondary sources is not always true, even for peer-reviewed academic publications, as Rekdal (2014) documents.

Rekdal (2014) describes various good and bad ways of citing information first found in secondary sources. I often use the third (pp. 641-42) of the options he describes: namely, being explicit that you got the information about the primary source from the secondary source. For example, suppose James (2002) includes the following sentence: "Ice is cold (Smith 1981)." In my own piece, I would tell the reader about this as follows: "James (2002) says 'Ice is cold (Smith 1981).'" This is superior to Rekdal (2014)'s fourth option of "citation plagiarism" (p. 642), where you assume that James (2002)'s interpretation was correct and paraphrase what James (2002) said about Smith (1981), without consulting Smith (1981) itself. For instance, citation plagiarism might look like this: "The solid form of water has a low temperature (Smith 1981)."

Rekdal (2014) says (pp. 641-42) that unless the primary source is difficult to obtain, the third option (my typical approach) "could reflect a case of academic laziness, but coupled with utmost honesty." Of course, one man's laziness is another man's efficiency. I agree that for articles where accuracy is paramount, thoroughly checking the primary sources is probably worth the effort. But in more applied situations where quantity as well as quality of information is important, my approach of collecting a bunch of relevant quotes without thoroughly vetting them may make more sense. I doubt you'll find many business leaders or hedge-fund traders carefully checking original primary-source articles, except in special cases.

My approach of quoting a secondary source with its citations attached has the virtue of making it explicit that I'm trusting the secondary source's paraphrase, which hopefully helps to reduce the pernicious effects of citation plagiarism:

“The worst, in my opinion, is citation plagiarism: referring to a primary source you have copied from a secondary one, without consulting the former,” Rekdal said. “You then turn one single observation, interpretation or misinterpretation into two apparently independent ones, mutually reinforcing each other in a way that is entirely undeserved.”

Extensive quoting of secondary sources is a luxury that I can afford because I don't submit my writings for formal publication, where such quotes would be frowned upon.

In a discussion of the question "Is it unethical to cite a paper or book that you have never looked at?", Jason opines: "If you are completely honest about exactly the level of work you have done to verify that the cited source corroborates your fact, you are in danger of being proven wrong, or of being dismissed as not having done the appropriate level of verification, but you have behaved ethically." Another reply by jakebeal says: "When you cite a source, you are not actually claiming that you have read it. What you are actually doing is staking your professional reputation on that source containing the information that you claim that it contains."