Comments on Some Famous Cognitive-Science Papers

By Brian Tomasik

Written: spring 2009. Uploaded: 9 Nov 2014.

Introduction

In spring 2009, I took "Introduction to Cognitive Science" at Swarthmore College. Most of the homework consisted of readings on which we had to comment. This page lists many of the readings and my replies, with a few omissions. I haven't been able to track down the original articles for some of the responses. I omit the full text of the assignment prompts because I don't have permission to reproduce them and in order to reduce the chance of current students copying these answers.

My views on cognitive science have changed in important ways since I took this course, so I don't necessarily now endorse what I said here.

Contents

AI

Article: Newell & Simon (1961), "Computer Simulation of Human Thinking": give comments on the General Problem Solver (GPS).

The GPS has three main types of goals and a few methods for trying to achieve them. For instance, given the task of transforming a to b, the machine might, following Method 1', come up with similar objects a'' and b'', figure out how to transform them, and then apply a similar process to a and b; this models reasoning by analogy. The machine also might, following Method 1, try to turn a into some c that's closer to b than a was; this models reasoning by intermediate goals.

Is the theory adequate? Well, it seems to do a pretty good job of covering some of the major ways that people solve symbol-manipulation problems. One aspect of human reasoning that it doesn't cover is "intuition" about what the answer should be. For instance, maybe humans are built in such a way that, given a desired transformation, they can see automatically how to achieve it because this answer is "hard-coded" into their brains. Or maybe they've seen the answer before and have it cached away in their memory. In principle, a computer can exploit these techniques too (by storing a lookup table of known answers), but the Newell & Simon system didn't do so.
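To make the difference-reduction idea (and the lookup-table shortcut) concrete, here's a toy Python sketch. It is not Newell & Simon's actual system; the string "operators," the crude difference measure, and the answer cache are all invented for illustration.

    OPERATORS = {
        "uppercase": str.upper,
        "strip_spaces": lambda s: s.replace(" ", ""),
    }

    answer_cache = {}  # hypothetical lookup table of previously solved problems

    def difference(x, y):
        """Crude measure of how far x is from y: mismatched characters plus length gap."""
        return sum(cx != cy for cx, cy in zip(x, y)) + abs(len(x) - len(y))

    def transform(a, b, depth=5):
        """Turn a into b by applying operators that reduce the difference (Method 1)."""
        if (a, b) in answer_cache:        # "intuition": recall a cached answer
            return answer_cache[(a, b)]
        if a == b:
            return []
        if depth == 0:
            return None
        for name, op in OPERATORS.items():
            c = op(a)                     # intermediate object c, hopefully closer to b
            if difference(c, b) < difference(a, b):
                rest = transform(c, b, depth - 1)
                if rest is not None:
                    answer_cache[(a, b)] = [name] + rest
                    return [name] + rest
        return None

    print(transform("hello world", "HELLOWORLD"))  # ['uppercase', 'strip_spaces']

The second time the same problem is posed, the answer comes straight from the cache rather than from search, which is the "seen it before" case mentioned above.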

Another area of human reasoning that's much harder to model is mistakes. It's easy enough to design a computer that will correctly carry out a computation that a human can also do; it's much harder to design a computer that models human thought processes so exactly that it makes the same errors humans do!

As far as describing human reasoning in general, the GPS is obviously inadequate, if only because it can't actually carry out the types of thoughts that humans can in other domains. In situations other than "try to reduce the difference between a and b" or "try to turn a into b," different heuristics would probably be needed.

Article: Haugeland (1981), "Semantic Engines: An Introduction to Mind Design".

Haugeland defines a formal system as a "game" with tokens, a starting configuration, and rules to follow; examples include chess and manipulation of algebraic equations. Turing machines are devices that can manipulate symbols according to the rules of any given formal system. This is relevant to cognitive science to the extent that psychologists may want to regard human thought as implementing some sort of systematic, algorithmic process (under the assumption that "Reasoning is but reckoning," as the introductory quote suggests).

Tokens lead syntactical lives as things that are manipulated according to the rules of a formal game, and they can be anything—pegs, marbles, helicopters. Their semantic lives are what they are taken to represent (e.g., a token in an algebra game is taken to represent a number of units of a physical quantity). The nice thing about computers, Haugeland says, is that they're what Dennett calls "semantic engines": given true input, they follow formal rules that preserve semantic truth.
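As a concrete (and entirely made-up) illustration of a formal game in this sense, here is a tiny Python sketch: the tokens are characters, the starting configuration is a string, and the rules are purely syntactic rewrite rules.

    start = "A"                            # starting configuration
    rules = [("A", "AB"), ("B", "BB")]     # rewrite rules: replace left side with right side

    def legal_moves(config):
        """All configurations reachable by one application of one rule."""
        moves = []
        for lhs, rhs in rules:
            for i in range(len(config)):
                if config.startswith(lhs, i):
                    moves.append(config[:i] + rhs + config[i + len(lhs):])
        return moves

    print(legal_moves(start))   # ['AB']
    print(legal_moves("AB"))    # ['ABB', 'ABB'] (two different rule applications, same result)

The tokens could just as well be pegs or marbles; nothing in the rules cares what the tokens "mean," which is exactly the point about their syntactical lives.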

Artificial neural networks

Articles: Steven Pinker, neural networks. MRH, article on PDP.

As usual, I enjoyed the Pinker reading a lot. He has a knack for clearly explaining ideas that would otherwise be confusing. (However, I was familiar with neural networks prior to reading it, so I'm not sure how fairly I can assess that in this case.)

I think Pinker's main point was summed up well in this sentence (p. 128): "Our rule systems couch knowledge in compositional, quantified, recursive propositions, and collections of these propositions interlock to form modules or intuitive theories about particular domains of experience, such as kinship, intuitive science, intuitive psychology, number, language, and law." Pinker gave some rather convincing examples that not all human reasoning is fuzzy; there are cases (like sorites or drinking ages) where people do draw sharp distinctions and reason as though things are completely true or false, not merely highly probable.

Indeed, I think people do this too much: People too often feel that they either "believe X" or "don't believe X" and aren't willing to tolerate doubt. I think Bayesian reasoning is correct (because of, for instance, Dutch-book arguments), but I don't think it's actually the way people reason all the time.

Both Pinker and the MRH reading mentioned two advantages of neural networks that I found intriguing. (1) Simultaneous constraint satisfaction: Solving the chicken-and-egg problem of not knowing what constraints to impose until you know the answer, but also not knowing the answer until you impose constraints. This reminded me of the general statistical concept of the EM algorithm used to model missing data or latent variables. (I would guess that the neural networks described actually implement the EM algorithm in some form, though I don't know the details.) This is definitely something humans can do, as the examples of partially covered letters in both readings pointed out. (2) Content-addressable memory. The MRH example of Sharks and Jets made this very clear: Knowing some properties of an object activates its other properties, causing similar items also to be brought to mind. This does seem a rather plausible model of how the associative aspect of memory works.
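Here's a minimal Hopfield-style sketch of content-addressable memory, in the spirit of (but not using the data from) the Jets-and-Sharks example; the stored patterns and the cue are arbitrary toy vectors.

    import numpy as np

    patterns = np.array([
        [ 1, -1,  1, -1,  1, -1],
        [-1, -1,  1,  1, -1,  1],
    ])                                   # each row: the properties of one stored "individual"

    # Hebbian-style weights: properties that occur together get linked.
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)

    def recall(cue, steps=10):
        """Settle from a partial or corrupted cue toward the nearest stored pattern."""
        state = cue.astype(float)
        for _ in range(steps):
            state = np.sign(W @ state)
            state[state == 0] = 1
        return state

    cue = np.array([1, -1, 1, -1, -1, -1])   # some known properties, one of them wrong
    print(recall(cue))                        # settles onto the first stored pattern

Activating part of a pattern pulls in the rest of it, which is the content-addressable behavior described above.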

I was interested to hear Pinker mention (p. 122) that Geoffrey Hinton had developed an approach to handle sentence concepts using an extra hidden layer. One of Hinton's latest projects deals with "deep-belief networks" with 4+ hidden layers, which give them more ability to model highly non-linear functions in a way that shallow "template matchers" like support vector machines cannot. Pinker made similar comments toward the end of his chapter, remarking on the fact that "raw connectoplasm is so underpowered" (p. 130); but I think Pinker goes further and argues that just adding more layers isn't enough.

Biases

Article: Heuristics and biases reading.

I really enjoyed the article, because I find this topic fascinating. In fact, one of my favorite blogs is called "Overcoming Bias," and many of its early posts were explicitly about the results of studies by Tversky, Kahneman, and others in this field.

Heuristics, and their attendant biases, make a lot of evolutionary sense. It's probably easier to select for organisms that execute simple decision procedures that happen to work most of the time than it is to evolve the full computational apparatus of exact Bayesian inference. Not to mention, computation is generally faster with simple rules. Indeed, that's one of the reasons that some machine-learning researchers explicitly prefer rule-based systems even though they know all about Bayes' Theorem.

I think overconfidence is the single most important bias people should be aware of. There are so many day-to-day situations where I see it pop up for myself, such as in assessing a subjective probability that things will go wrong, to determine whether I need to make backup plans. It's also extremely important at a theoretical level for assessing things like "what probability should I assign to the many-worlds interpretation of quantum mechanics?" or "what's my probability that fish can feel pain?" It's very easy to read a convincing argument, or a book or article that presents persuasive evidence, and be unable to imagine that a particular conclusion is wrong. However, this is often just because I haven't read all of the evidence available, or perhaps because no one has yet discovered the relevant evidence. (Nassim Taleb calls these situations "black swans.")

Neglect of prior probabilities is another important bias. As a friend mentioned to me, it's the reason that it can sometimes be harmful for people to do medical self-diagnosis online: They tend to judge their illness based on the likelihood term (what's the probability of my symptoms given this disease?) without regard to the prior (what's the frequency of this disease in the general population?). Still, in some of the experimental examples, I do wonder whether the observed responses were partially due to miscommunication: For instance, when people tell you that Steve is shy and detail-oriented, they usually do so for a reason. There are implicit rules of speech (Grice's maxims of quantity and relevance) that dictate saying only what is necessary and important to a given situation, so the implication might be "I'm being told about Steve's librarian-like characteristics because Steve is indeed a librarian." The same is true to an even greater extent, I think, in the case of the bank teller named Linda.
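To see how much the prior can matter, here's a toy Bayes calculation with made-up numbers (these are not from the original studies), restricted to just two occupations for Steve:

    p_librarian = 0.002                 # invented base rates in the population
    p_farmer = 0.02
    p_desc_given_librarian = 0.40       # P("shy and detail-oriented" | occupation), invented
    p_desc_given_farmer = 0.10

    unnorm_librarian = p_desc_given_librarian * p_librarian
    unnorm_farmer = p_desc_given_farmer * p_farmer
    total = unnorm_librarian + unnorm_farmer

    print(unnorm_librarian / total)     # ~0.29
    print(unnorm_farmer / total)        # ~0.71: the prior wins despite the likelihood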

Concepts

Article: Armstrong et al: theory of concepts.

Identification of family relations is a good example that Armstrong et al. present of the conflict between the definitional and prototype views. What makes someone a grandmother? On the one hand, we tend to see "kindly grey haired elderly female[s]" who make "chicken soup" (p. 294), and the more a woman possesses these properties, the more grandmotherly she seems. This is consistent with the prototype view, which pictures a quintessential grandmother with all these properties and assesses grandmotherliness in terms of distance along each of these dimensions. On the other hand, people know that a grandmother is really just a "mother of a parent," and in technical terms, none of the other properties matters. To the extent people agree with this, the definitional view is favored.

I think Armstrong et al. are basically right in concluding that people have both types of identification systems at work. While such a hypothesis is more complicated and so less desirable a priori than the supposition that people have only one or the other system, I think the evidence requires it; even the simple grandmother example suggests it strongly.

I wasn't surprised by the finding that people judge 8 as "more of an even number" than 18 or triangles as "more plane geometric" than trapezoids. If asked, I would probably say the same. As the authors point out, this presents a fundamental problem for studies that attempt to use such judgments to probe the nature of concepts—since people make fuzzy judgments even about categories that they admit (Experiment III) are not fuzzy.

The feature theory of concepts was quite familiar to me from machine learning: Most machine-learning problems are solved by extracting a set of features from an item, representing items as vectors in the resulting high-dimensional feature space, and then applying a classifier or (in the case of unsupervised learning) identifying clusters (i.e., "concepts") in feature space. (Of course, this feature approach inherently loses information about the original items—there aren't enough dimensions to describe everything about the original object.) In this setting, the features are pretty well defined (often just real numbers), so that we don't usually run into the fundamental philosophical problems that Armstrong et al. discuss starting on p. 297 about what exactly a feature is.
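Here's a minimal sketch (with invented features and items) of that picture: items become points in feature space, each "concept" is summarized by a prototype, and classification is just nearest-prototype lookup.

    import numpy as np

    # Invented toy features: [has_wings, lays_eggs, body_mass_kg]
    items = np.array([
        [1, 1, 0.02],    # sparrow
        [1, 1, 4.0],     # goose
        [0, 0, 300.0],   # cow
        [0, 0, 60.0],    # pig
    ])
    labels = np.array([0, 0, 1, 1])   # 0 = "bird" concept, 1 = "livestock" concept

    # Each concept is summarized by a prototype: the centroid of its members.
    prototypes = np.array([items[labels == k].mean(axis=0) for k in (0, 1)])

    def classify(x):
        """Assign a new item to the concept whose prototype is nearest in feature space."""
        return int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))

    print(classify(np.array([1, 1, 1.5])))   # -> 0: more bird-like than livestock-like

Note how much the three numbers fail to capture about an actual goose, which is the information loss mentioned above.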

Language

Articles: Anderson, Doctor Dolittle's Delusion.

Terrace et al. (1979), "Can an Ape Create a Sentence?".

I learned a lot from Anderson's piece and enjoyed the many pictorial examples he gave, though I was slightly put off by his politically correct effort to defend sign language in a condescending way (referring, for instance, to the "(mis)impressions" of the "untutored eye"). I was interested by the fact that ASL, even though it spun off of ordinary English, has developed its own set of particular conventions. An example is the set of morphological rules discussed on p. 249, such as using an index finger to indicate "you, they" and a flat hand for "your, their." It must be that these sorts of rules, which arose out of nowhere in ASL just as in other human languages, give the listener (or watcher, in this case) hints about what's being said. I've noticed similar patterns even in programming languages, where styles and idioms develop that aren't required by the language compiler but help to guide reader expectations.

A similar comment can be made about abbreviations and colloquial forms. ASL developed a colloquial form for "house" (p. 240) that's easier to sign quickly, and "red slice" has become its own "tomato" symbol because, presumably, that's a quicker way to abbreviate such a common word. The same happens in programming languages, where commands that have to be typed often eventually come to be given very short names.

I was surprised to learn that signs in ASL using two hands must be symmetric or else have one hand stationary (p. 246), because it seems one could attain much more expressive power by including signs in which each hand does something different. I wonder if the restriction arose because it's hard to move two hands at once in different ways (the "pat your head and rub your stomach" phenomenon)? Or maybe it makes it easier to understand both left- and right-handed signers, as discussed on p. 260?

Terrace et al. discussed a number of problems with the evidence that has been cited to argue for Nim's sentence-producing ability. One of them was that a number of the interesting artifacts of his communication could have resulted from chance. For instance, his seemingly novel phrase "water bird" could have, with 50% probability, been a random combination of the separate labels "water" and "bird" (p. 895), and over the course of lots of utterances, such coincidences might easily occur. The authors make a similar comment regarding Nim's apparent tendency to express recurrences like "more" in the first position more often than in the second position, because in fact, "more" was the only recurrence used (p. 896). In addition, Terrace et al. point out the ambiguity in the semantic interpretation—as agent-object, beneficiary-object, or possessor-possessed object—of statements like "Nim banana."

My main question was: "Yes, these are certainly problems, but don't they show up just as much in the analysis of language abilities in human children of various ages? Are there ways people overcome these limitations in that context?" On the other hand, we know human children eventually develop full-blown language, so maybe we need less skepticism in their case.

Article: linguistic diversity.

One of the ways in which, the article suggested, language may impact cognition is through rules on word use for different types of objects. For instance, p. 218 explained how number words in Cantonese must be accompanied by classifiers like "faai" and "gau" that describe the shape of the object. Table 7.1 on p. 221, showing different rules for counting objects, people, and animals in the Squamish language, reminded me of the categories in "20 Questions" (animal, vegetable, or mineral). How much do linguistic rules like this affect thought? I'm not sure. Certainly they reflect what the culture that originated them found important (as is seen, e.g., in the fact that Nivkh has different number words for items like batches of dried fish, skis, and boats), and perhaps they reinforce those categories as separate within the minds of the speakers (in the same way that dividing things into "animal" and "plant" at the start of 20 Questions may reinforce the notion that flora and fauna are widely separate types of things). It's also possible that the speakers just gloss over the categories without thinking about what they mean (in the way that, e.g., people usually gloss over the meaning of "mineral" when they assign an object to that category during 20 Questions—except perhaps when they're first learning the rules of the game and are confused about the fact that "mineral" really means "everything else besides plants and animals").

I was intrigued by the Tabasaran rules shown in Figure 7.7 (p. 225). Because that system encodes lots of information in a small amount of space using combinations of symbols at specific locations, it reminded me very much of computer data storage—e.g., an MP3 file header, which has specific regions of a fixed number of bits for encoding specific pieces of information.

The Tabasaran rules themselves would be quite amenable to automatic computer decoding. Perhaps natural language processing would be a much easier task if everyone spoke Tabasaran!

Article: Steven Pinker, The Language Instinct, Ch. 9 ("Baby Born Talking—Describes Heaven").

One of the challenges to language learning that Pinker cites is that the problem, like vision, is ill-posed: There are infinitely many possible rules that could give rise to the same finite set of observed utterances. As Pinker says on p. 293, "logically speaking, an inflection could depend on whether the third word in the sentence referred to a reddish or bluish object, whether the last word was long or short, whether the sentence was being uttered indoors or outdoors, and billions of other fruitless possibilities that a grammatically unfettered child would have to test for." To some extent, I suspect children use Occam's razor to solve these issues: Most logically possible rules don't apply, and it makes sense to start by assuming there is no special rule until the evidence suggests otherwise. But there are certainly cases where special rules have to be learned—as, e.g., the Harrison reading illustrated, with its examples of words that applied only to certain shapes or classes of objects. I suppose Pinker would suggest that evolution has disposed babies to be more expectant of rules involving certain types of distinctions (e.g., people vs. animals vs. plants) than others (e.g., whether the sentence is said indoors or outdoors).

I find it hard to imagine exactly what it looks like for linguistic predispositions to be encoded genetically. How do you manipulate protein expression in such a way as to cause babies to expect particular types of grammatical transformations? And yet, we know that such feats are possible, because other obviously genetic instincts can sometimes be similarly hard to visualize (how do you manipulate gene expression in order to cause a baby bird to open its mouth waiting for food? or is this learned rather than instinctive?).

As far as the evolution of language, George Williams's suggestion on pp. 294-95 (that language allows parents to tell their children about danger) sounds plausible. In general, though, language seems to me like a hard thing to evolve, because it requires several people simultaneously to share the trait enough for them to benefit from using it with one another. (The exception is if language arose as an accidental byproduct of another trait, like large brains, that was selected for on an individual level.)

Articles: Whorf, and Pullum's reply.

I think I agree somewhat with both Whorf and Pullum. Whorf is obviously correct that language is a background assumption that pervades our thought without necessarily being consciously appreciated, in the way that the color blue would be an unrecognized part of the vision of the hypothetical only-blue-seers. This probably does shape (or perhaps just reflect?) the ways that we think, reason about problems, etc. But exactly how these influences occur seems really hard to tease apart. I share Pullum's skepticism about creating just-so stories relating some linguistic feature (e.g., four distinct roots related to "snow" in the Eskimo language) to some particular aspect of culture (e.g., the supposedly obvious fact that snow is a vital part of Eskimo life). A similarly tenuous example is the claim that the timelessness of the Hopi language would be reflected in their culture and physics. Pullum's parody on p. 279 is instructive: "It is quite obvious that in the culture of printers...fonts are of great enough importance to split up the conceptual sphere that corresponds to one word and one thought among non-printers into several distinct classes..." As he observes, this statement is "utterly boring" and shows that having a diversity of terms to describe specific concepts needn't be very significant to a culture.

I appreciated Pullum's effort to correct a misconception, though his righteous indignation was a little off-putting. In particular, I was quite skeptical of his claim that "The prevalence of the great Eskimo snow hoax is testimony to the falling standards in academia [...]" (p. 279). I'm sure there were plenty of hoaxes in times past as well. Pullum makes no effort to defend this charge of "falling standards"; the statement just makes him sound curmudgeonly.

Article: Noam Chomsky in "A Companion to the Philosophy of Mind".

One of Chomsky's themes was to point out that many of the supposed "difficult problems" that language raises are simply ill-posed and not problems at all. As he says on p. 154, "The belief that there was a problem to resolve, beyond the normal ones, reflects an unwarranted departure from naturalism [...]." Examples include such puzzles as whether Peter and Mary speak the same language, or whether a computer really understands Chinese. These are "like asking whether Boston is near New York" (p. 163). With all of this I agree. However, as I've noted before, I don't think it dismisses the concerns of Searle et al. as to whether computers are conscious, because consciousness, unlike "true understanding," is a concrete thing that exists in the world. [Nov. 2014: For my updated views on this, see "Dissolving Confusion about Consciousness".] Indeed, as Descartes would say, I'm more certain that my consciousness exists than I am that the chair I'm sitting in exists.

Chomsky does address the question of "How can organized matter have these properties?" (p. 157), suggesting that the problem may just be "cognitively inaccessible" to us (p. 157)—this may be one of the "mysteries that will be forever beyond our cognitive reach" (p. 156), like a rat trying to solve a "prime number maze."

As far as the applicability of linguistics to cognitive science generally, Chomsky seemed pretty doubtful, maintaining that the language parts of the brain needn't tell us anything about other parts of the brain: "As far as we know, there are no 'mechanisms of general intelligence' [...]. And if that turns out to be the case, there will be no serious field of 'cognitive science'" (p. 162). So much for this course, I guess. :)

On p. 161, Chomsky mentions the "principles and parameters" model according to which "Language acquisition is the process of determining the values of parameters" within a very general grammatical framework. I would point out that, in general, the fact that something can be described by a particular parametric model doesn't mean that it's actually produced by that model. For instance, any periodic function satisfying certain smoothness conditions can be described as a sum of sine and cosine waves with certain coefficients, but that doesn't mean the function actually arose that way. Or take the von Neumann-Morgenstern expected utility theorem from economics: Any preference ordering over lotteries satisfying certain axioms can be described as maximizing the expected value of some utility function, but that doesn't mean people actually make choices that way. Still, the fact that Chomsky's parametric model appears to have a small number of parameters is good. (Large numbers of parameters, as statisticians and physicists know very well, can fit just about anything.)
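For concreteness, the Fourier point can be written out (in standard notation, for a 2π-periodic function):

    f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos(nx) + b_n \sin(nx) \right],
    \qquad
    a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\, dx,
    \quad
    b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\, dx.

The coefficients exist for any sufficiently smooth periodic f, whether or not the process that produced f had anything to do with adding up sinusoids; that's the sense in which a successful parametric description doesn't establish the generating mechanism.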

The last paragraph of the article made an interesting offhand remark: that "the creative aspect of language use" is "the best evidence for the existence of other minds for the Cartesians." I'm not sure this is what Chomsky meant, but I interpreted his point as saying that other minds are likely to exist because if other people were just in my imagination, how would I be surprised by the things they say? I don't think I agree with this reasoning. Why can't the Cartesian demon just hide those novel statements from my conscious awareness until the right time? Indeed, I'm often surprised by what other people say in my dreams.

Mandarin time

Article: how languages describe time.

In general, I thought the study was pretty well done. I'm mildly skeptical of its conclusions, but that's mainly because I'm somewhat skeptical about making specific inferences from psychology studies of this nature in general, just because the variables at play are so fuzzy and the potential for lurking variables is so great. (By this I imply no offense to the field of psychology, which does great work. Psychologists simply have it harder than, say, physicists, who have the luxury of cleaner problems and more precise measurements.)

One of the weaker points of the study was that it didn't explain (at least as far as I can recall) why English speakers and Mandarin speakers didn't show the same sorts of trends with respect to primes. For English speakers, there was no significant interaction of prime with target: English speakers always found the judgments easier after a horizontal prime, presumably, so the study claims, because horizontal conceptions of time are natural to English speakers. If that's the hypothesis, then the prediction should be that Mandarin speakers will uniformly be faster with vertical primes, because of "the preponderance of vertical time metaphors" in that language. But instead, Mandarin speakers were faster with whatever prime they were given; their response times seemed to adapt to the primes much more. Of course, it's possible to come up with post-hoc explanations of why this would be so: Maybe the fact that Mandarin has several horizontal time words means that Mandarin speakers will be more flexible in their ways of thinking and so more open to priming. Or maybe the fact that they're bilingual has that effect. (To test the latter, one could do the same study all in Mandarin, with people who only spoke Mandarin vs. native English speakers who learned Mandarin as a second language.)

...Upon rereading the introduction, I see that the authors did address this point after all: On p. 5, they say, "In summary, both Mandarin and English speakers use horizontal terms to talk about time. In addition, Mandarin speakers commonly use the vertical terms shang and xia." So I guess the claim is that Mandarin speakers should be more flexible in their priming, in which case the results do validate the hypothesis.

Memory

Article: neural mechanisms of memory.

As far as neural mechanisms of memory formation, the article focused on Hebbian learning (reinforcement of synapses that are used), particularly long-term potentiation (LTP) as well as even longer-term morphological changes (e.g., increase in the number of synapses or synaptic zones).
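In its simplest textbook form (not necessarily the exact formulation in the article), the Hebbian idea is just

    \Delta w_{ij} = \eta \, x_i \, y_j

where x_i is the presynaptic activity, y_j the postsynaptic activity, w_{ij} the synaptic strength, and \eta a small learning rate, so synapses whose two sides are active together get strengthened.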

As with the Stillings chapter on Tuesday, I liked this reading, because it presented a lot of basic information and study results in a compact format. (This is a feature of textbook reading in general, I suppose. I find it somewhat unfortunate that many Swarthmore classes emphasize in-depth study of particular articles and topics over broader textbook-style learning, because the latter really is an efficient way to learn a lot.) The level of the material was generally appropriate, though I found some of the discussion of neural chemistry and brain-region geography slightly more detailed than necessary.

I was particularly interested by the theories of memory consolidation in the last section. The analogy to computer science was striking: Computers also have "working memory" (RAM) and "long-term memory" (hard disk), and computers also use "pointers" to access larger chunks of memory stored elsewhere. I think computers use the standard model for consolidating memory: That is, computers write their pointers to disk, so that one portion of the hard disk refers to another, and the pointers stored in memory are no longer needed. Normal RAM is volatile and so must work this way (since it disappears when the power is turned off), but I wonder if non-volatile RAM (e.g., flash memory) could be used to implement the multiple-trace approach. Whether this would even make practical sense, I'm not sure—for instance, why would there need to be multiple pointers to the same region of hard disk, unless frequent failures were expected?
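Here's a toy sketch of the pointer picture I have in mind (a loose analogy of my own, not anything from the article): a Python list stands in for the hard disk, indices stand in for pointers, and "consolidation" just means writing the pointer itself into long-term storage so the volatile copy can be dropped.

    long_term_store = []            # stand-in for the hard disk

    def write_trace(record):
        """Append a trace to long-term storage and return its index (a 'pointer')."""
        long_term_store.append(record)
        return len(long_term_store) - 1

    # Volatile working memory (stand-in for RAM) holds pointers into the store.
    working_memory = {"beach_trip": write_trace({"event": "day at the beach"})}

    def consolidate(label):
        """Write the pointer itself to 'disk', so one part of the store refers
        to another and the in-RAM copy is no longer needed."""
        write_trace({"index_for": label, "points_to": working_memory.pop(label)})

    consolidate("beach_trip")
    print(long_term_store)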

Meno (by Plato)

Articles: Socrates, Locke.

Socrates claims that humans have immortal souls and that when we learn about the world, we are merely stirring up knowledge that our eternal souls have always known but forgotten. He makes the argument by asking questions to an uneducated slave boy that lead the boy to discover how to double the area of a square. By assuming that knowledge of this sort (how to double a square) must be either (a) taught or (b) previously known, Socrates concludes that the boy must already have known this fact, in a previous life.

Locke takes essentially the polar opposite view. It's not that people know everything and merely have forgotten it; rather, humans start out knowing nothing (the "blank slate" concept or, as he calls it on p. 1, "white paper"). They acquire understanding from two sources: "[1] Our observation employed either, about external sensible objects, or [2] about the internal operations of our minds perceived and reflected on by ourselves [...]" (p. 1). Locke claims that any thought a person has can be traced back to some combination of sense perceptions and understanding of how our mind works (though he doesn't spell out the details). He suggests that children seem to start out without certain concepts that they develop as they get older, so that, presumably, if they were to be shut up in the dark for their childhood, they wouldn't acquire those concepts. Locke also suggests that babies may sleep a lot because they don't have many ideas in their heads yet.

I don't much like Socrates's view, because I feel as though it doesn't really solve the problem. Sure, maybe our souls already knew things and have since forgotten, but how did the souls learn that material in the first place? The problem is just pushed back. I suppose it could be "turtles all the way down," i.e., our eternal souls have always known these things and never had to learn them initially (like cosmological models in which the universe has always existed and so never needed to be created), but this feels like an unnecessarily complex explanation. The flaw in Socrates's argument is his false dilemma that everything known must either be taught explicitly or recollected. This ignores the possibility that a person might teach herself (by reasoning with existing knowledge, and possibly with the help of Socratic questioning).

Locke's view seems intuitively plausible (though somewhat at odds with current knowledge of genetics and neuroscience), though I would like to see more concrete details of exactly how sensation and reflection work.

Numeracy

I was quite interested by Dehaene's neural-network model for number recognition (pp. 32-33). Not only does it apparently work, but it seems to correspond nicely to observations in cats by Thompson et al. (1970), and it also explains why we can immediately recognize small numbers of objects without having to count each item.

Dehaene's accumulator metaphor seemed very plausible and explains both the distance and magnitude effects nicely (p. 30). The model is especially intuitive because I subjectively feel as though something similar is going on in my brain when, say, I'm thinking about the passage of time; I feel sort of a fuzzy, continuous value for roughly how long it's been since a past event. Indeed, I wonder if the systems are related (at least for rats), because methamphetamine increases subjective clock speed and also increases the number of sounds rats think they hear (p. 31). Babies, too, seemed to show accumulator-like numerical assessment in the study, reported in Science News, in which they looked longer at displays alternating between 8 and 16 dots but not between 8 and 12 or 16 and 24. (This is basically just the distance and magnitude effects again.)
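Here's a toy simulation of the accumulator idea with a made-up noise model (scalar variability: noise proportional to the total), which is enough to reproduce both effects:

    import numpy as np

    rng = np.random.default_rng(0)

    def accumulate(n, noise=0.15, trials=100_000):
        """Noisy internal estimates of the numerosity n: noise grows with n."""
        return rng.normal(loc=n, scale=noise * n, size=trials)

    def discrimination_accuracy(n1, n2):
        """How often the accumulator correctly judges which set is larger."""
        return np.mean(accumulate(n1) < accumulate(n2))

    print(discrimination_accuracy(8, 16))   # easy: 2:1 ratio, like the 8-vs-16 babies
    print(discrimination_accuracy(8, 12))   # harder: smaller ratio (distance effect)
    print(discrimination_accuracy(16, 24))  # same ratio as 8 vs 12, similar difficulty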

I found it interesting that scientists seem to take the approach of assuming the absolute minimum possible in terms of quantitative skill, only granting more advanced capabilities where the evidence is unmistakable. This is true both for animals and human babies (hence the controversy in the Science News article), which is good, because it means scientists aren't just prejudiced against animals relative to humans. If I were to give an informal guess about what numerical abilities animals have, I would probably be more generous than scientists are willing to be—especially in the case of human infants, since we know that humans eventually acquire these skills, so it's not hard to imagine that they do so at an early age. But assuming the least ability until proven otherwise is probably a good methodological assumption, similar to assuming the null hypothesis until proven otherwise, inasmuch as it reduces the rate of falsely attributing abilities to animals where none actually exist.

Reductionism

Article: Williams 1997, "Biologists Cut Reductionist Approach Down to Size".

The article referred to Steven Weinberg's phrase "grand reductionism" as "the idea that the most fundamental layer of nature holds an explanation for all the features of the outer, higher layers." I think it's important to distinguish between (1) reductionism in theory—that higher layers can in principle be explained by lower layers—and (2) reductionism in practice—that the best way to study higher layers is to pick apart the lower layers. (1) is obviously true; for instance, biology must in principle be reducible to physics, or else it would defy the physical laws of the universe. But due to computational and measurement constraints, it's definitely not feasible to study transmission of DNA by modeling its quantum-level molecular behavior. For every domain, there's a level of detail that's most helpful to work at, and it's not always the lowest one. For instance, even in computer science, where everything could be traced to the level of transistors in complete detail and without any unknown variables, very few people do so, and in fact a great effort is made deliberately to abstract away complexity.

Article: Weisberg et al. 2008, "The Seductive Allure of Neuroscience Explanations".

The authors asked novices and experts to assess the quality of explanations of psychological phenomena. When bad explanations were spiced up with references to neuroscience terms, novices judged the bad explanations as being much better than the bad explanations without such terms (while they judged good explanations with such terms only somewhat better than those without). Experts were generally not fooled by the meaningless neuroscience material. The authors suggest that one reason for the observed result could be reductionism—the idea being that when novices hear mention of low-level parts of a system, they assume that those parts must contribute to a higher-level explanation, even if there is no such higher-level explanation.

Another idea, which I don't think the authors mentioned explicitly, is that when someone talks about low-level neuroscience, she tends to "sound smarter," so a lay person may be more likely to assume that she knows what she's talking about. The same sort of thing happens all the time with, say, mathematical symbols or quantum physics. When someone uses such terms, lots of people tune out, thinking "I have no idea what that means, but anyone who uses that language must be really smart," even if what's said is complete nonsense.

Finally, I would guess that some people simply don't understand what it means for an explanation to be good: That an explanation of A by B must say why B leads to A. People might assume that a "good explanation" is just one that mentions lots of details, or some mysterious concept (what some have called "semantic stopsigns").

Robotics

Rodney A. Brooks, "Intelligence without representation".

I agree with Brooks's premise, illustrated in the parable with the airplane, that if the goal is to achieve real AI (not just highly specific programs that people call "AI" because it sounds cool), then it's important to evaluate fully working systems at a crude level, rather than designing detailed ornaments to decorate an architecture that ends up being flawed. I suppose a fully working system might include robotics components, but I don't understand why Brooks takes robotics to be the fundamental defining characteristic of an intelligent system (rather than, say, language understanding as the Turing test suggests, or some other ability entirely). There are lots of problems that people would like an AI to solve that have nothing to do with embodiment in the physical world, so to me, Brooks's goal of producing insect-level behavior in robots is just one component that will need to be incorporated with others (e.g., language)—which is precisely what Brooks is skeptical of.

Brooks makes good points about the value of starting with lower layers that don't depend on higher layers or on each other, to make debugging easier and reduce the scope of damage if one component fails. This is hardly an original idea: It's just a statement of the basic software-engineering principle of modular design. But there are different degrees to which it can be used, and Brooks is of the opinion that it should be applied more aggressively.

Moreover, his thesis that the real world should be used as the principal form of representation between components is less obvious. I guess I can see this approach working well in robotics, but what about more abstract problems like vision and speech? And of course, not using internal representations presumably comes at a computational cost: Each component has to compute its own information from the world all over again. (But I guess Brooks would say this is a price worth paying to ensure independence.)

The approach of having "no central representation" whatsoever seems more extreme than necessary. If your goal is to make sure that certain basic operations (e.g., avoiding walls) aren't disrupted by failures higher up, why not just make the "don't walk into walls" module robust against commands from higher layers, so that not walking into walls is the default action unless there's strong overriding evidence? In other words, there's a continuum of how much central control is exercised, and Brooks is sitting on one extreme end by not having any. Why not have just a little bit?
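Here's a toy sketch of the continuum I have in mind (the sensor format and the confidence threshold are invented): the low-level behavior is the default, and a higher layer overrides it only when it's very confident.

    def avoid_walls(sensors):
        """Lowest layer: turn away whenever a wall is close; otherwise no opinion."""
        return "turn_away" if sensors["wall_distance"] < 0.5 else None

    def seek_goal(sensors):
        """Higher layer: always wants to head toward the goal."""
        return "move_toward_goal"

    def control(sensors, override_confidence=0.0, threshold=0.95):
        low = avoid_walls(sensors)
        high = seek_goal(sensors)
        # The low-level behavior wins unless the higher layer is very confident.
        if low is not None and override_confidence < threshold:
            return low
        return high

    print(control({"wall_distance": 0.3}))                            # turn_away
    print(control({"wall_distance": 0.3}, override_confidence=0.99))  # move_toward_goal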

Spatial navigation

Articles:

The combined evidence that von Frisch and Gallistel presented made a convincing case that bees must have an integrated cognitive map. They don't just rely on scent (because, e.g., in von Frisch's Figure 4 experiment, all of the plates were scented, but the bees only went to the one with food). They also don't just rely on ant-like dead reckoning because, as Gallistel describes (pp. 44-45), they find their way back to their hives too quickly to be using a systematic search, and they do worse when the territory is unfamiliar. And they don't just use "strip maps" because they're able to chart the right course even when they're captured and taken off their planned route (p. 42).
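For reference, "ant-like dead reckoning" (path integration) just means keeping a running vector sum of one's displacements, so the way home is always the negative of the total; here's a minimal sketch with invented step data:

    import numpy as np

    steps = [(1.0, 0.0), (0.5, 30.0), (2.0, 120.0)]   # (distance, heading in degrees)

    position = np.zeros(2)
    for dist, heading in steps:
        theta = np.radians(heading)
        position += dist * np.array([np.cos(theta), np.sin(theta)])

    home_vector = -position
    print(home_vector)   # direction and distance straight back to the start

The point of the combined evidence is that bees do more than this.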

I am curious what Brooks would say about all of this, because I don't see how his subsumption architecture could give rise to this sort of behavior, without the full abstraction of a cognitive map. Bees apply a sophisticated array of tools (angles, distances, polarized light, familiar landmarks, passage of time) under all of the right circumstances, falling back on the exact appropriate alternative cues when some of them are tampered with by experimenters. Could this be engineered in a subsumption style? For instance, the angle module would stop and defer to the sun-has-moved-in-the-sky module when the bee has been delayed and needs to adjust its angles?

By the way, I'd be curious to know the status of the scent theory that von Frisch describes at the beginning of his lecture. The evidence for bee specialization by scent seemed pretty persuasive. So is the current thesis that bees use both scent and spatial maps?

Turing test and Chinese room

Article: Turing (1950), "Computing machinery and intelligence".

Turing's test proposal was actually somewhat different from what's standardly called the "Turing test". In his proposal, player C would try to determine the gender of players A and B, with player A (the computer) trying to fool C and B trying to help C. (I've heard it speculated that Turing was interested in this game because of his homosexuality, but who knows.)

Turing's test seems a generally good way to evaluate whether computers exhibit human-level abilities; my main complaint is that the question isn't really that interesting to me, in comparison to the question of whether computers can have conscious mental states similar to those of humans and advanced animals. (I mentioned last semester when I inquired about this course that this latter question is of particular interest to me, because it's highly relevant to ethical implications of creating strong AI.)

Perhaps consistent with the behaviorist era in which Turing wrote, Turing's test concerns only externally verifiable actions. Indeed, he admits in sec. 2, "May not machines carry out something which ought to be described as thinking but which is very different from what a man does?" He goes on to say basically that he doesn't care: "we need not be troubled by this objection." In responding to objection 4, Turing accuses his critics of solipsism. However, I think there's a difference between real solipsism (doubting whether anyone other than yourself is conscious) and doubting whether machines exhibiting behavior similar to humans but with very different hardware are conscious in the way humans are. Identical behavior is not the only relevant consideration. For instance, both snakes and flowers sometimes move toward the sun as it goes across the sky during the day, but there's reason to think that snakes may be conscious while flowers aren't (namely, the neural hardware of snakes is more similar to our own). Searle argues this point further in his paper.

Article: Searle (1980), "Minds, brains, and programs".

I really like the Chinese-Room argument; in fact, Searle's paper is the one I was most looking forward to reading this semester, because I've read a number of secondary-source commentaries on it and have had several arguments with others about it. As with the Turing test, I'm personally more interested in whether abstract Turing machines can be conscious (I'm less interested in the vague question of "whether they really understand"), but the thought experiment works well in either case.

I agree with the criticism that some of my friends have leveled against Searle: That his paper isn't really an argument but merely an assertion of his position that blind manipulation of symbols doesn't produce true understanding. The claim that the Searle-in-the-room system doesn't understand the Chinese input simply begs the question, they say. This is true enough, but the point of the paper is to serve as an "intuition pump," in the words of Daniel Dennett, just like any thought experiment. Dennett accuses Searle of using the scenario to hide relevant aspects of the computational process, but I think the opposite: Searle's description of symbol manipulation makes it clear exactly what's going on inside computers, which normally seem like electronic black boxes whose inner workings are mysterious.

I liked Searle's discussion of syntax vs. semantics. It seems intuitively clear that conscious understanding requires more than syntax, because the same syntactic operations can have completely different semantic content depending on how they're used. (As Searle points out, my internal organs are doing syntactically meaningful operations right now.) It doesn't seem impossible to me that there's some sort of "consciousness stuff" that has to be physically present, in the same way that there's fundamental "charged-particle stuff" that has to be present to generate electric fields. I agree with Searle that this doesn't rule out conscious minds made of things other than carbon-based living cells, but it does rule out conscious understanding arising from arbitrary operations that can be regarded as Turing machines.

[Nov. 2014: My views on this matter have changed significantly. For instance, see my discussion of the Chinese room in "Flavors of Consciousness As Flavors of Computation".]

Vision

Article: R. L. Gregory, "Perceptions as Hypotheses" (1980).

Prior to this, I hadn't ever thought much about exactly what perceptions might be. But after reading the article, I agree that "hypotheses" is a good characterization. Perceptions are interpretations of inputs from our eyes, ears, etc., and just like scientific theories, they do indeed turn raw number-like values into something more abstract ("this is an edge," "that face is convex rather than concave," etc.). The only real deficiency that I can see in this description (and which Gregory notes) is that perceptions are more than just factual hypotheses: they also involve qualia (a feeling of what it's like to see a convex face, etc.). I can't blame him for not proposing how this works; the "hard problem" of consciousness is, after all, pretty hard.

The thesis of the article gave me a new perspective on optical illusions. I had previously thought of them as just fun curiosities, but the perception-as-hypotheses idea paints them as competing interpretations of the same data. It's interesting that the mind only ever has one interpretation at a time, because according to the Bayesian understanding of hypotheses (which I think is obviously the right one), there's no single right answer, but just a probability distribution over answers. If the brain is Bayesian, there must be a process that picks out the single most probable of the hypotheses for display to consciousness. (My general impression is that cognitive scientists assume that the brain is essentially Bayesian, but I'd be curious to know why. I think Bayesianism is the normatively best way to do statistical and scientific inference, but it doesn't follow that the brain has to use it, unless one makes the argument that natural selection will give brains the optimal approach.)
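As a toy illustration (all numbers invented), here's what "perception as Bayesian hypothesis selection" might look like for two interpretations of the same image, say a face that could be convex or a hollow mask:

    hypotheses = {
        "convex face": {"prior": 0.95, "likelihood": 0.5},   # faces are usually convex
        "hollow mask": {"prior": 0.05, "likelihood": 0.9},   # fits this image a bit better
    }

    posteriors = {h: v["prior"] * v["likelihood"] for h, v in hypotheses.items()}
    total = sum(posteriors.values())
    posteriors = {h: p / total for h, p in posteriors.items()}

    print(posteriors)                           # a full distribution over interpretations...
    print(max(posteriors, key=posteriors.get))  # ...but only the single MAP hypothesis is "seen"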

Article: D. Marr (1982): "Selections from Vision".

Marr's three levels (p. 248) are computational theory (a high-level description of what's being done), representation and algorithm (essentially, what software code to write), and hardware implementation. He finds this a better conceptual framework for approaching the problem of vision than merely working at the hardware level, that is, merely trying to point to specific neural structures that perform specific operations.

Article: Steven Pinker, How the Mind Works, Ch. 4: "The Mind's Eye".

I was interested by the discussion starting on p. 256 of the brain's 2.5-D representation. Again I was reminded of the parallel with computer science, where a computer might, e.g., read in data from a file and store it as an in-memory data structure (such as a binary tree or hash table) so that it can be easily accessed by other functions of the program. Page 257 hints at what computer scientists call the "curse of dimensionality": If the brain were to use a full 3D representation (a 3D array), the relevant values would be sparsely distributed and hard to interpret without lots of lookups and further processing. (Not to mention, the memory requirements would be vast.) Even a 2D grid seems pretty big to me, so I was surprised that Marr's 2.5-D sketch idea included a whole 2D grid (plus depth, slant, tilt, and surface at each point). However, I guess this isn't too bad, because the brain gets an entire 2D array from the eyes anyway. Higher-level abstractions can be done by further processing at a later stage, as when the brain recognizes objects by considering them as combinations of geons (according to Irv Biederman's hypothesis, at least).
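With toy numbers (nothing to do with the brain's actual resolution), the storage gap between a 2.5-D sketch and a full 3-D array is easy to see:

    width, height, depth_levels = 1000, 1000, 1000
    values_per_point = 4          # e.g., depth, slant, tilt, surface label

    sketch_2_5d = width * height * values_per_point   # one small record per image location
    full_3d = width * height * depth_levels           # one cell per voxel

    print(sketch_2_5d)   # 4,000,000 values
    print(full_3d)       # 1,000,000,000 voxels, almost all of them empty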

The discussion of shape, shading, and lighting by analogy to sheet-metal workers, painters, and lighters was excellent. The example schedule of fees on pp. 251-52 reminded me of another computer-science concept called "minimum description length" (MDL) for model selection. The set-up is basically the same: Given various ways of describing some data, each with different costs (measured in bits rather than dollars), choose the one that gives the shortest description. These costs are in fact -log(probability), which means that MDL is equivalent to maximizing a Bayesian posterior probability (which Pinker himself hinted at).
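Written out, the equivalence is just

    L(H, D) = -\log_2 P(H) - \log_2 P(D \mid H)

    \arg\min_H L(H, D) = \arg\max_H P(H)\, P(D \mid H) = \arg\max_H P(H \mid D)

so choosing the cheapest total description (model bits plus data-given-model bits) is the same as choosing the hypothesis with the highest posterior probability.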