    Indeterminacy

    Practically all birders come to realize, sooner or later, that bird identification is an uncertain affair. And most birders, I think it’s fair to say, would tell you that there are two major causes of this uncertainty. Here they are:

    1. Any bird, if seen poorly or briefly, may be difficult or impossible to identify. You’re at the local landfill, and it’s getting on toward sundown. Ring-billed Gulls are swirling all over the place. One of the gulls catches your eye for being a bit darker and daintier than the others. You see the bird put down, and it looks good for Mew Gull—a rarity in your area. You train your scope on the bird and…it’s gone. Other gulls have settled down in front of it, and the bird is invisible. Honestly, you tell yourself, you don’t know what the bird was. The lighting was poor, after all. And you can’t have observed the bird for more than three or four seconds. You have to let it go. 

    2. In other instances, you get a great view of the bird—but you still can’t affix a name to it. In this case, you’ve had ample opportunity to study the bird. Your photos are tack-sharp. You’ve discussed the bird with other folks, both in the field and via the internet. But nobody’s willing to pull the trigger and attach a label to the bird. Nobody can say for certain that it was a pure Glaucous-winged Gull—very rare in your state. Maybe it was a hybrid with a Herring Gull. And if so, was it a straightforward “F1” hybrid (one Herring Gull parent, one Glaucous-winged Gull parent), or was it some weird backcross (now we’re talkin’ grandma and grandpa, and then some)? Again, you let it go.

    To sum up, we would seem to have two sources of uncertainty: (1) the observer; and (2) the bird itself. And that’s that. We’ve covered all our bases. End of story.

    Not quite.

    In both of the preceding scenarios, we’re making a particular assumption. At first blush, it’s an eminently reasonable assumption. Our assumption is that, regardless of what the bird really is, it most assuredly is something. In the first scenario, that bird seen briefly in the setting sun either was a rare Mew Gull or it was something else—most likely a very common Ring-billed Gull. In the second scenario, that tricky large gull either was a pure Glaucous-winged Gull or it wasn’t. If it wasn’t, then it was 50% Glaucous-winged (an “F1” hybrid) or 75% Glaucous-winged (an “F2” backcross) or conceivably 72.3187% Glaucous-winged (perfect knowledge of the bird’s entire genome courtesy of futuristic technology for DNA analysis). Regardless, it was one thing or another. It was either a pure (albeit aberrant) Glaucous-winged Gull, say, or it was an “F2” backcross. But it certainly couldn’t have been both of those things.

    GWGu x HerG January 2004
    This interesting gull visited Boulder County, Colorado, during January of 2004. Was it a pure but aberrant Glaucous-winged Gull? Or was it a hybrid? Photo by © Bill Schmoker.

    But what if our assumption has been wrong? Let go of your cherished notions of reality, for just a moment now, and consider the possibility that bird identification is better conceived in probabilistic terms than in familiar deterministic terms. Consider, then, a third scenario:

    3. You’re out birding in coastal California, and you see the bird depicted below. It’s a Thayer’s Gull, right? Just then, your birding buddy from Pennsylvania e-mails you a photo of a bird he’s got under observation. It looks just like your bird. But your buddy isn’t sure his bird is a Thayer’s Gull; he thinks it might be an Iceland. Just for fun, let’s say we somehow know that your bird in California and your buddy’s in Pennsylvania hatched from the same clutch. In fact, we even have knowledge, somehow, that they’re monozygotic; they’re “identical twins.” The two birds are identical. Despite this new knowledge, you stick with your initial ID, and your buddy sticks with his. What’s going on here? Or, to be blunt about it: Which one of you is wrong?

    Possible thayeri February 2007
    This gull was found in Allegheny County, Pennsylvania, during February of 2007. Thayer’s or Iceland? Now what if the photo were from California? Would you change your mind? Photo by © Geoff Malosh.

    I’m going to argue that you’re both right. Both IDs, seemingly mutually exclusive, are correct.

    Let’s proceed. You’re still in California, but now you’re looking at the bird depicted below. Looks pretty good for Iceland Gull. In fact, I’d say it looks great for Iceland Gull. Now, as many birders know, Iceland Gulls are variable. Let’s say “only” 99% of Iceland Gulls are as pale overall as this one, but 1% are darker. Fine. This bird would appear to be among the 99% of Iceland Gulls that are, for want of a better word, “normal.”

    Iceland Nebraska Dec 2004
    If a bird is an Iceland Gull, there is perhaps a 99% probability it will look as “good” (as “typical”) as the bird depicted here. Is that enough to make this particular bird an Iceland Gull? Photo by © Bill Schmoker.

    Enter Thayer’s Gull. Thayer’s Gulls are variable, too—notoriously so. But let’s be reasonable. Let’s say fully 98% of Thayer’s Gulls are darker than the bird depicted here. (And let me acknowledge here that there is more to distinguishing Thayer’s from Iceland than overall “paleness.” I know that. But it doesn’t affect the argument. We can say that 99% of Iceland Gulls show the suite of characters shown by this bird, whereas 98% of Thayer’s Gulls do not. But so as not to get bogged down in terminology, I’ll stick with the simple pale vs. dark dichotomy.)

    Anyhow, the case is looking awfully strong for Iceland Gull, you would think. We can’t be completely certain, of course, but I think most of us would lean pretty heavily toward Iceland. We won’t say it’s “definitely” or “positively” an Iceland Gull, but let’s say it’s “likely” or “probably” that species. I mean, 99% of Iceland Gulls look this way, and 98% of Thayer’s Gulls do not. Let’s split the difference, and say this bird has a 98.5% chance of being an Iceland Gull.

    Not so fast.

    Iceland Gulls are much rarer in California than Thayer’s Gulls. I’m going to say that for every Iceland Gull in California, there are about a thousand Thayer’s Gulls. I think a lot of folks would say I’m being pretty generous in that assessment. Maybe it’s more like five thousand to one. But let’s stick with one in a thousand. In a sample of one thousand Thayer’s and Iceland Gulls in California, only one is an Iceland.

    I’ve now given you the three pieces of information that are necessary for determining the probability that this bird is an Iceland Gull. To recap: 99% of Iceland Gulls look like this; 98% of Thayer’s Gulls don’t look like this; and Iceland Gulls make up one-tenth of one percent of the California population of Iceland and Thayer’s Gulls combined. Now what’s the probability that this bird is an Iceland Gull?

    The answer, surprisingly, is 4.7%. (For the mathematically inclined birder, I work out the numbers in a footnote at the end of this post.) That’s it. That’s all. There’s only a 4.7% probability that the bird is an Iceland Gull. Statisticians generally say that probabilities below 5% tell you you’re looking at something else. Statisticians call that something else an “alternative hypothesis.” We birders call that something else a Thayer’s Gull. This bird is a Thayer’s Gull.

    Unless you’re in Pennsylvania.

    In Pennsylvania, Iceland Gulls outnumber Thayer’s Gulls by at least ten to one. And that’s being quite generous. If you do the math for Pennsylvania, there’s a 99.8% chance the same bird is an Iceland Gull. The odds that the bird in Pennsylvania is an Iceland Gull are 10,118 times (yes, ten thousand, and then some) the odds for the same bird in California.

    The only difference is the location of the photo. (If you’re a literalist, and you require two separate birds—one in California, one in Pennsylvania—then go back to my scenario of monozygotic twins. We have two birds now, identical to one another. The only difference is that one has strayed to California, whereas the other has wandered to Pennsylvania.)
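    To see how strongly the answer depends on local abundance, here is a minimal Python sketch of the calculation. The function name `posterior_iceland` and the scenario labels are illustrative assumptions, not anything from the original analysis. The 99% and 98% figures and the California and Pennsylvania priors are the ones used in this post; the one-in-five-thousand prior follows the "maybe it's more like five thousand to one" aside, and the even-odds row is purely hypothetical.

```python
def posterior_iceland(prior_iceland,
                      p_pale_given_iceland=0.99,
                      p_pale_given_thayers=0.02):
    """Return p(Iceland | pale) for a world containing only Iceland and Thayer's Gulls."""
    # Joint probability of "Iceland and pale" and of "Thayer's and pale".
    joint_iceland = prior_iceland * p_pale_given_iceland
    joint_thayers = (1.0 - prior_iceland) * p_pale_given_thayers
    # Bayes' theorem: share of all pale birds that are Iceland Gulls.
    return joint_iceland / (joint_iceland + joint_thayers)


# Prior probability that a random Iceland-or-Thayer's Gull is an Iceland.
scenarios = {
    "California, 1 in 5000": 0.0002,
    "California, 1 in 1000": 0.001,
    "even odds (hypothetical)": 0.5,
    "Pennsylvania, 9 in 10": 0.9,
}

for label, prior in scenarios.items():
    print(f"{label:26s} prior={prior:.4f}  posterior={posterior_iceland(prior):.3f}")
```

    The bird's appearance enters only through the two fixed percentages; everything else that changes from row to row is the prior, which is to say, where you happen to be standing.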

    This result is stunning. I’ve known about it for quite some time, but it still amazes me.

    For starters, the result has consequences for how we identify birds. Let’s say we know from museum specimens that 4% of Species A’s outermost primaries are longer than 80 millimeters, whereas 20% of Species B’s outermost primaries are longer than 85 millimeters. Stop. Don’t go any further. We can’t do anything with those numbers until we know something about the relative probabilities of detecting Species A vs. Species B. Let’s say you’ve just netted a possible Alder Flycatcher in Utah. Now what? Well, you need two things. First, you need The Pyle Guide, with its heaps of data on wing formula and such. Second, you need to know the likelihood of detecting an Alder Flycatcher in Utah, as opposed to, say, New York.

    Alder Flycatcher
    Alder or Willow? The “classical” method for answering that question would be to conduct a morphometric analysis of the bird’s physical characters. Many modern statisticians and other scientists would say a different approach is required. Photo courtesy of © The Geophysical Institute, University of Alaska-Fairbanks.

    Which brings us to something even more stunning. Reality is situational. Scientists, philosophers, and ethicists have been converging on that worldview for more than a century now. The general public, meanwhile, has been a bit more resistant.

    But here’s a cheery thought. The whole time, we birders have known about it. Think back to the last time you said something along the lines of the following: “If I were in California, I’d surely call this bird a Thayer’s Gull.” There’s a lot more philosophy—and no small amount of statistics—in that statement than I think a great many of us give ourselves credit for.

    __________________________________

    Footnote: Possible Iceland Gull in California.

    p(Iceland) = 0.001
    p(Thayer’s) = 0.999
    p(pale|Iceland) = 0.99
    p(pale|Thayer’s) = 0.02

    By Bayes’ Theorem:

    p(Iceland|pale) = p(Iceland∩pale)/p(pale), where

    p(Iceland∩pale) = 0.001×0.99 = 0.00099, and
    p(pale) = p(Iceland∩pale)+p(Thayer’s∩pale) = 0.00099+0.999×0.02 = 0.02097

    Then:

    p(Iceland|pale) = 0.00099/0.02097 = 0.047

    For Pennsylvania, update p(Iceland) to 0.9 and update p(Thayer’s) to 0.1. Following the same steps as above, we get p(Iceland|pale) = 0.998.

    The relative odds (Pennsylvania to California) that the bird is an Iceland Gull are given by the cross-product ratio of the rounded probabilities, where 0.953 = 1 − 0.047 and 0.002 = 1 − 0.998: (0.953×0.998)/(0.047×0.002) = 10,118.
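    The cross-product ratio can also be checked in the odds form of Bayes' Theorem, where posterior odds equal prior odds times the likelihood ratio. The sketch below is a hedged illustration; the helper names `odds`, `posterior_odds`, and `to_probability` are assumptions introduced here. It shows that the figure of 10,118 comes from rounding the two posteriors to three decimals first. Carried through without rounding, the likelihood ratio cancels and the ratio is simply the ratio of the prior odds, 9 × 999 = 8,991.

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)


def to_probability(o):
    """Convert odds in favor back to a probability."""
    return o / (1.0 + o)


# p(pale | Iceland) / p(pale | Thayer's), from the footnote's inputs.
likelihood_ratio = 0.99 / 0.02


def posterior_odds(prior_iceland):
    """Posterior odds of Iceland given a pale bird: prior odds times likelihood ratio."""
    return odds(prior_iceland) * likelihood_ratio


california = posterior_odds(0.001)
pennsylvania = posterior_odds(0.9)

# Unrounded odds ratio: the likelihood ratio cancels, leaving (0.9/0.1) / (0.001/0.999).
exact_ratio = pennsylvania / california

# The footnote's version, built from posteriors rounded to three decimals.
rounded_ratio = (0.953 * 0.998) / (0.047 * 0.002)

print(f"p(Iceland|pale), California:   {to_probability(california):.3f}")
print(f"p(Iceland|pale), Pennsylvania: {to_probability(pennsylvania):.3f}")
print(f"odds ratio, unrounded: {exact_ratio:.0f}")
print(f"odds ratio, rounded:   {rounded_ratio:.0f}")
```

    Either way the conclusion is the same order of magnitude: the odds shift by a factor of roughly ten thousand between the two states.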

    Ted Floyd

    Ted Floyd is the Editor of Birding magazine, and he is broadly involved in other programs and initiatives of the ABA. He is the author of more than 100 magazine and journal articles, and has written four recent books, including an ABA title, the ABA Guide to Birds of Colorado. Floyd is a frequent speaker at birding festivals and state ornithological society meetings, and he has served on the boards of several nonprofit organizations. Mainly, he listens to birds at night.

    • http://Nemesisbird.com Drew Weber

      Way to go pulling out the Bayesian statistics for the Thayer’s/Iceland conundrum! Love it.

    • http://www.surfbirds.com/blog/northcoastdiaries/ Mike Patterson

      Any time we try to put numbers to these kinds of questions, we invariably run into folks who will want to argue about the numbers (why 99% rather than 95% or 90%?) rather than the broader issue- gulls and Empidonax flycatchers are hard. That is the fundamental point of this article. Sometimes the drive to put something in the right box gets in the way of things like logic and reality and nothing brings that home like listening to gull-people argue about gull ID.

      Students of Quantum Mechanics may see in this shades of the Uncertainty Principle. The outcome is a function of the interaction between the observed and the observer. I propose that we think of these enigmatic larids as Schroedinger’s Gulls. The object exists in all states (as hinted here by Ted) until the observer interacts with it, and the identification depends on the state of the observer.

    • http://profile.typepad.com/naswick Nate Swick

      @Mike- Schroedinger’s Gulls is classic.

      I’m going to start incorporating that into my gulling…

    • http://profile.typepad.com/tfloyd Ted Floyd

      Schroedinger’s Gull is a cool notion, but I believe it’s not really at issue here. To be geeky, the aspect of the problem I’m addressing is firmly one of statistical inference, and, in particular, Bayesian inference. In theory, we could have perfect knowledge of the two parameters that I’m saying result in a probability level for a bird ID, those being (1) an estimate of variation among individuals in a population and (2) an estimate of the (local) likelihood of occurrence of an individual of a particular population. In reality, we never achieve such knowledge (because of imperfect technology, incomplete sampling, etc.), but it’s not impossible.

      The Schroedinger situation is quite different, as it involves two (or more) parameters (e.g., mass and position) whose values covary and are influenced by observer effects. If there’s a physics analogy here (I’m not saying there should be, I’m just playing along!), maybe it’s the “nonlocality” problem of Einstein, Podolsky, and Rosen.

      Anyhow, the key result is this. A certain gull has a 4.7% probability of being an Iceland Gull in California. That same gull has a 99.8% probability of being an Iceland Gull in Pennsylvania.

      And I believe most birders and field ornithologists aren’t (yet) plugged into this strange and fascinating aspect of reality.

    • http://www.surfbirds.com/blog/northcoastdiaries/ Mike Patterson

      Spooky action at a distance, I’d thought of that with the twin gull supposition, but Schroedinger’s Gull has a better ring to it even if you have to stretch your definition of “state” to get there.

    • Matt Brady

      I’m not sure I agree with Mr Floyd’s argument here. Maybe I don’t understand it quite correctly, but it seems he’s not quite arguing what everyone else seems to think he is. As I understand it, he’s saying something like “99% of Thayer’s Gulls in CA are Iceland Gulls, and 98% of birds that look like Iceland Gulls are Iceland Gulls, and 1/10th of 1 percent of all Iceland and Thayer’s Gulls combined are Iceland Gulls. Therefore, the chances that a birder in CA is actually looking at an Iceland Gull is 4.78%.” I take that as meaning that birders can go out, and about 5 of every 100 Thayer’s Gulls they look at will be an Iceland Gull. But reality doesn’t reflect this argument; you can’t use probabilities to work out IDs. The chances that a particular bird is an Iceland Gull have NOTHING to do with what it actually is, and further, working out probabilities leads to assumptions that aren’t upheld by reality. Are birders misidentifying Iceland Gulls for Thayer’s Gulls, or vice versa? Absolutely. Are 5% of Thayer’s Gulls in CA actually Iceland Gulls? Absolutely not. Thayer’s Gulls are fairly common in CA, while there are just 40 or so seriously claimed (not even universally accepted and published) Iceland Gulls.

      To illustrate what I’m saying, I’d like to use a different example using species that are a bit clearer than gulls. Take shrikes for example. In California, there are hundreds of thousands of shrikes in the state every winter. In most years most shrikes are Loggerheads, a few are Northerns and none are Brown Shrikes. Say you use Mr Floyd’s approach, apply Bayesian Statistics, and you come up with percentages for all shrikes which work out to be something like this: 92% of all CA shrikes are Loggerheads, 8% are Northerns and 0% are Brown Shrikes. But if you go right now to McKinleyville in Humboldt County, you can see a Brown Shrike. Probability dictates that it is not one, and if you were to let probabilities influence your ID of the bird, you’d say something like “gee, that sure looks like a Brown Shrike, but it can’t be! I’m in California. It’s probably just a weird Loggerhead”. But it isn’t one. It’s a Brown Shrike. You have to ID birds based on what they look like, not by what percentages indicate.

      The “If I were in California, I’d surely call this bird a Thayer’s Gull.” argument is nonsensical and doesn’t get anyone anywhere. If it looks like a Thayer’s Gull, it probably is one, no matter where you happen to be standing.

    • Ben Coulter

      Thought-provoking post Ted. Indeterminacy is a good title. Though isn’t it circular logic to assume that the hypothetical observed ratios of THGU:ICGU are correct, if they were based on the same empirical dogma? Where do the ratios come from, if not from fallible observations?

      The lines really start to get a little fuzzy when you consider gull populations (‘species’) as fluid collections of genotypes, which change with genetic drift, and new contact and introgression events. Is there even such a thing as a real North American ‘Iceland Gull?’ If kumlieni is actually a hybrid slurry of genes resulting from hybridization event(s) between the northwestern proto-thayeri and nominate glaucoides, aren’t they all just a little bit thayeri (and glaucoides)? What percentage of west coast ‘Thayer’s Gulls’ have introgressed non-thayeri genes? Are any of those core range Thayer’s who have a Kumlien’s Gull ancestor five generations back really Thayer’s Gulls anymore? Does it matter in the grand scheme of things? The (scant) genetic research literature on large white-headed gulls is scintillating reading for the biologist types.

      Most important of all, can Joe Q. Lister count his rock solid ‘Iceland Gull’ on his Sonoma County list?

    • Derek

      I understand what Ted is trying to say but there are a few things that aren’t quite right. First, it seems “paleness” is in no way a discrete variable so this application of Bayes’ theorem is inappropriate, even if it were as simple as pale vs. dark. Perhaps a minor point in the overall scope of the article, but if we are intent on dragging Schroedinger and Einstein along with Bayes into the idea of gull identification, let’s be accurate. The other point is the notion that reality is situational. To the extent addressed in the monozygotic twin gull hypothetical, the location of the photos doesn’t change reality. The gull really is one or the other assuming single species parentage (if not this whole example becomes muddled). I think the “best” idea that can be taken from Ted’s post is that rare birds are rare. Other than that, the issue is left to various records committees or listers as to how much evidence is necessary to “count” something. A more interesting example would have been to substitute something concrete for paleness … like say a DNA test with 99% accuracy. This wouldn’t change the mathematics terribly except a slight alteration on the Thayer’s probabilities. The argument presented seems to indicate that the “reality” should state this Iceland DNA positive Gull is in fact a Thayer’s if seen in California. Which again only points to the defining question … how much evidence do we need? “Paleness” might elicit a debate amongst birders … but would DNA? Rare birds are rare, and as Mike states, Gull ID is difficult. Probability theory only applies to a group not to an individual. Probability doesn’t change reality. Now onto adding hybrids into the equation … :)

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to Matt Brady. No, I’m saying something different. I’m saying: If you have a bird in California that appears to be an Iceland Gull (i.e., it resembles 99 out of 100 Iceland Gulls, plus the other quantitative assumptions in my original post), there’s only a 4.7% chance that THAT PARTICULAR BIRD is an Iceland Gull. In Pennsylvania, THAT SAME PARTICULAR BIRD has a 99.8% chance of being an Iceland Gull.

      What I’m saying (more to the point, what I prove via cold hard numbers) is that you can, indeed you must, use “probabilities” (rather, inferential procedures) “to work out IDs,” as you say. I’m talking about INFERENCES regarding INDIVIDUALS, but you’re talking about ESTIMATES regarding POPULATIONS. Apples and oranges. Apples and hamburgers, really.

      You also say: “You have to ID birds based on what they look like, not by what percentages indicate.” Well, you’re in good company. That model was in place for centuries. But the Bayesian revolution has swept it aside. If you know that some (even a very small amount) of Thayer’s look like Icelands, and vice versa, then those “percentages” are of supreme importance to the identification process.

      The shrike example is probably spurious, inasmuch as no adult Brown Shrike is going to overlap in characters with any adult Loggerhead Shrike. But this is a very, very real problem with other taxa. In California, such examples include, but are not limited to: Thayer’s/Iceland Gulls, Cassin’s/Blue-headed Vireos, White-faced/Glossy Ibises, Willow/Alder Flycatchers, Sooty/Short-tailed Shearwaters, and many more.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to Ben Coulter. You raise a really important point, and perhaps I should have been more explicit about it. For the sake of argument, let’s say we DO have prior knowledge (that word “prior” is way loaded, as some readers will readily recognize) about the number of Iceland and Thayer’s gulls in California. That could come from a variety of sources, including, but not limited to: winter population data from credible sources in California, census info on the breeding grounds, understanding of dispersal patterns in the two species, you name it. Given those numbers, you construct what is known as a “prior [population] probability,” and then you get to work about making an inference regarding a particular individual.

      You mention hybrids. In this regard, Thayer’s and Iceland are a problematic example. (I just happened to have good photos of those two species, and I conveniently ignored the hybrid issue in the analysis that followed.) So, then, just change the worked example (probabilities for Thayer’s and Iceland) to species pairs for which there is assumed to be little hybridization, perhaps Cassin’s/Blue-headed Vireos, perhaps Willow/Alder Flycatchers.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to Derek. Good points. As to discrete vs. continuous variables, note that it’s fine to create discrete (or “categorical”) variables out of continuous variables. You can simply say, “99% of Iceland Gulls are this pale,” or, “98% of Thayer’s Gulls are not this pale,” and so forth. In terms of doing the math, the procedure is the same as saying, “99% have blue eyes,” or “98% have brown eyes,” or what have you.

      Still, I appreciate that you’re uncomfortable with something as imprecise as “pale.” I sense that you want something more “morphometric,” e.g., rectrix lengths in millimeters or, even better, DNA data. I agree, those are great (albeit continuous!) kinds of data for this sort of analysis. Note, though, that “paleness” can, in fact, be applied in such an analysis. A really nice example of this approach is Steve Howell’s “Shades of Gray” article in the February 2003 issue of Birding.

      Finally, this is off the mark: “Probability theory only applies to a group not to an individual.” The point, rather, is that you can indeed make INFERENCES about an INDIVIDUAL. Indeed, we do this all the time. Here’s a grim example: If a doctor tells you that you have symptoms shown by, say, 99% of individuals with a rare disease, you’re most certainly going to find yourself making decisions (inferences) about medications, procedures, etc., as they pertain to you as an individual. Very important point in this human/medical example: Even though 99% of individuals with the disease show the symptoms, you might have only a 4.7% chance of having the rare disease. Oh, yes: This discussion applies to individuals, sometimes in life-and-death ways.

    • Matt Brady

      But how can the same bird be two different things, with two different scientific names, in two different places? That seems like it violates first principles of logic, and is therefore an impossibility that no amount of statistics, Bayesian or otherwise, can fix. It’s either one species or another, it can’t be both. Unless of course there is an inherent problem with the two species (the Thayer’s/Iceland Gull example is problematic because of this – that’s why I moved on to shrikes). Besides, no matter how small the chances are that a particular individual of a species is at a particular place, that becomes irrelevant when the event happens. Moving away from the Gull example, when you have a Blue-headed Vireo in California it doesn’t matter if it’s a rare event or not, because it happened. If, because you worry that it may be just a bright Cassin’s that happens to look exactly like a Blue-headed and you pass it off as the more common species, it then becomes impossible for you to ever see a Blue-headed Vireo in California, which is, of course, ridiculous. If it were true that you could never really be certain of what you were seeing, then how would you know that anything were anything? Everything could just be masquerading as something else. The Vireo example becomes more problematic when you have a bird that exhibits features of both species, of course, but when you have a ‘classic’ Blue-headed Vireo and worry that it could be a Cassin’s, then you’re beginning to ignore logic.

    • RalphEmerson

      And we wonder why other folks call us “bird geeks”?

    • Charles Swift

      Ted – Correct me if I’m wrong but I think there is a use for this approach which BRCs might want to consider. Let’s say a dark-backed gull shows up in Idaho (where all dark-backed gulls are rare) and most indications are it is a Lesser Black-backed Gull, but 1 or 2 BRC members balk because Kelp Gull is not completely eliminated by the documentation. Although it might be hard to assign probabilities, we could say based on recent history that Kelp Gull has a very much lower probability of occurring in Idaho than Lesser Black-backed. Using the Bayesian approach to the above example, shouldn’t we be able to say there is a very high probability that the bird is in fact a Lesser Black-backed and not a Kelp Gull? Coincidentally I’m taking a STAT 401 right now and we just covered this so I may try to work this out for practice! Thanks for bringing this topic up again.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to Matt Brady’s latest: Spot on! You raise a perfectly valid question. More to the point, you highlight a problem with how I presented the argument in the first place. (Note to all of y’all: Thanks for the great feedback. Although I’m definitely a “believer” when it comes to a Bayesian inference, I’m not necessarily an effective “evangelist.” It’s great that y’all are keeping me honest–and helping me to hone my own thinking.)

      Okay, so let’s take this into the realm of Cassin’s and Blue-headed Vireos. In this scenario, a bird at the dreaded Cassin’s/Blue-headed nexus wanders to California at a time of year when Cassin’s outnumber Blue-headeds by about 100:1. (Let’s call that the third week of September?) The other numbers (98%, 99%…) are exactly as in the Thayer’s/Iceland example. The California Bird Records Committee (CBRC) does the right thing, as it always does, and “rejects” the record (p=0.047), saying it was probably just a Cassin’s Vireo, and not a rare Blue-headed.

      So far, so good. Now here’s where I messed up. I shouldn’t have dragged in that scenario about having perfect knowledge of the bird’s genetic identity relative to one of its clutch-mates. Let’s do this differently. Here goes. Ready for it?–Somebody bands the bird in California.

      The next year, a birder sees the bird in a region of South Dakota in which Blue-headed Vireos outnumber Cassin’s Vireos by 10 to 1. (Call it 9 to 1 to preserve the Thayer’s/Iceland numbers.) On seeing the bird, that birder reasonably says that the bird has a 99.8% chance of being a Blue-headed.

      Now here’s where things get quite interesting. At this point, the birder notices the band and figures out that the bird had been in California the year before. The birder in South Dakota says, “Oh, well, I have huge respect for the CBRC, as all birders do, and they called this bird a Cassin’s, so that’s what it is.”

      Problem is, the folks with the CBRC also find out about the South Dakota sighting. “Wait a minute,” they say. “A reasonable, statistically savvy birder in South Dakota initially calculated a 99.8% probability that this bird is a Blue-headed.”

      Then the bird flies away, never to be seen again. No genetic analysis, no nothin’.

      You see the problem, right?

      To use Mike Patterson’s metaphor, we surely have on our hands a case here of “Schroedinger’s Vireo,” hovering in this indeterminate probability space somewhere between “probably Cassin’s” and “very likely Blue-headed.” When the bird was in California, there was a 95.3% probability it was a Cassin’s Vireo; but when it was in South Dakota, there was a 99.8% chance it was a Blue-headed Vireo. And there’s no way to resolve that discrepancy.

      That’s a very cool result.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to Charles Swift: That’s a superb example. Wish I’d thought of it… ;)

      Given a gull’s mantle color, you can, in theory, assign a probability that it’s a Slaty-backed, or Yellow-footed, or Lesser Black-backed, or whatever. But then you also have information, in theory, on the probability of occurrence in Idaho of each of those.

      Now, in practice, your information is going to be pretty crude (“Lesser Black-backed is more likely than Kelp.”) But that’s a start! Folks shouldn’t dismiss it. It should be a part of a record committee’s deliberations. In a sense, probability of occurrence is a quantitative morphometric variable, along with tarsus length, mantle color, and DNA.

      Again, great example.

    • Derek

      The vireo example is a perfect illustration of the flaws with this “revolution” and a good reason why we should hope the Bayesians never complete a coup. Putting aside the (evidently) inherent infallibility of the CBRC for a moment, Ted is basically saying that a bander using whatever knowledge available assigns an ID of Blue-headed Vireo to a bird. Ted’s Bayesian CBRC dismisses the bird based solely on population dynamics though it is apparently more like that 1% of Solitary Vireos in California at the time that “are” Blue-headed. Then the vireo shows up in SD where it is “supposed” to be a Blue-headed. One CBRC member whips out his slide rule exclaiming “Now the bird is 1.047 times as likely to be a Blue-headed!!! Quick to the Bayes Cave to amend the records!” The final CBRC member held over from the non-Bayesian Dark Ages points out that since assumptions about relative population frequency are either based on colloquial evidence or scientific data which invariably are published years after the study date, all of their decisions have been made from error-filled numbers. Of course, the rest of the clan promptly assassinates him for such logic.

    • http://profile.typepad.com/serpophaga Adam

      I don’t think Bayesian inference is needed to demonstrate the “cool result” as shown in the Blue-headed Vireo example. Correct me if I’m wrong, but here it goes.

      Basically, you’re assigning probabilities that any given individual of the Blue-headed/Cassin’s pair taken out of either pool, is 1 in 100 in CA, versus 100 to 1 in S Dakota. Put another way: a birder reaches into a very large bag of September Vireos in California, a large bag he is quite sure contains 100 Cassin’s and 1 Blue-headed Vireo. Without looking at the bird, he can say, reasonably, that each individual bird he pulls (with replacement) is likely a Cassin’s.

      Likewise, a birder in S Dakota has a bag of Vireos, and his bag has 100 Blue-headeds and 1 Cassin’s. He reaches in, and without looking, says it’s likely a Blue-headed.

      More to the point, if the birder in S Dakota reached into the CA bag of Vireos, or the CA birder reached into the S Dakota bag, with the knowledge of the bag’s composition, they would alter their expectations (even if they unexpectedly pulled the same Vireo out of either bag!). This is entirely reasonable. If I see a Solitary Vireo in S Dakota, and someone asks me at gunpoint whether it’s a Blue-headed or Cassin’s, I’d say Blue-headed. I’d do the opposite in CA.

      The part of this that seems to be the sticking point for people is as follows. You’re saying the 1 bird in 100 in the S Dakota bag, and the 1 bird in 100 in the CA bag, could be the same bird in either bag (which it could be, as above, but only until you receive further information, as below).

      A major problem: these seem to be more-or-less blind probabilities ignoring empirical observations (say what you want about the strides made in inferential statistics: most birders will look at a bird and use observations to determine identity, not posterior probabilities). For example, in the bags of Vireos: if I pull a Vireo from the CA bag, and don’t look at it, I’m going to say it’s a Cassin’s, and I’ll be quite certain I’m right. Same for the S Dakota bag with Blue-headed.

      Now, if I pull it out, then LOOK at it, I’m going to say it’s a Cassin’s IF IT IS a Cassin’s. If I pull it out, regardless of the prior probability, and it is a Blue-headed, however improbable, I’m going to say it is a Blue-headed (this is assuming several things, including my own competency at Vireo ID).

      So, previous to any observation of the birds in the bags, and knowing only the general composition of the respective bags (i.e. are we in S Dakota or CA), I can say of ANY INDIVIDUAL in the bag that there is roughly a 1/100 probability it is a Blue-headed in CA, or roughly a 1/100 probability it is a Cassin’s in S Dakota, even if the same bird is in both samples.

      I don’t see anything strange here, maybe just mildly counter-intuitive. The major point to make, is that the Vireo “collapses” (pardon reference to physics) into either species (ignoring possible hybrids, or birds which we can’t positively ID, both healthy possibilities in the gull and vireo examples) once we pull it out of the bag and look at it.

      Bottom line: if you pull out a Vireo in S Dakota, and it’s a Blue-headed, it’s a Blue-headed. Unless it’s a Cassin’s. Switch that around and it fits California.
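
      The bag picture can be turned into a small calculation. What follows is only a sketch: the function name and the 95% observer accuracy are assumptions made for illustration, not figures from this thread. It takes the bag composition as the prior and asks how far one confident look at the bird should move it.

```python
from fractions import Fraction


def posterior_blue_headed(n_blue_headed, n_cassins, accuracy):
    """P(Blue-headed | observer calls it Blue-headed) for one bird from a bag.

    `accuracy` is the assumed chance that the observer names either
    species correctly (a hypothetical, symmetric error rate).
    """
    prior = Fraction(n_blue_headed, n_blue_headed + n_cassins)
    true_call = accuracy * prior                # Blue-headed, called Blue-headed
    false_call = (1 - accuracy) * (1 - prior)   # Cassin's, miscalled Blue-headed
    return true_call / (true_call + false_call)


accuracy = Fraction(95, 100)  # hypothetical observer skill, not from the thread
california = posterior_blue_headed(1, 100, accuracy)    # bag: 1 Blue-headed, 100 Cassin's
south_dakota = posterior_blue_headed(100, 1, accuracy)  # bag: 100 Blue-headed, 1 Cassin's

print(f"California:   {float(california):.3f}")
print(f"South Dakota: {float(south_dakota):.4f}")
```

      With a perfect observer (accuracy of exactly 1) the result is 1 in either bag, which is the “collapse” described above. With the assumed 95%, the same call comes out at roughly 16% in California and above 99.9% in S Dakota, so the bag still matters after the look.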

      As for the Thayer’s/Iceland gulls…

    • Derek

      I don’t see anything “cool” from the probability “discrepancy”. They give an estimation of the relative frequency of two populations of Solitary Vireos. Nothing more. And the numbers themselves have absolutely nothing to do with what the vireo “really” is. I just find this tunnel-visioned approach utterly fantastic. Bird ID made from Ted’s point of view is inherently inaccurate because the data used to base the ID is inaccurate. The reason that no example can be made to illustrate how great a Bayesian approach is, is that it is fundamentally flawed to begin with. Yes population dynamics can be used to make a guess at bird ID. But probability is ultimately irrelevant to reality. Furthermore, this approach takes the focus away from the individual bird itself. And that is where I think most of us birders want our focus to be. That’s where the fun lies. Even with empids and larids flying unabated throughout the countryside.

    • oliver h

      @Derek.

      To say probability is ultimately irrelevant to reality is pretty silly imho. This article is really trying to get us to realize this simple point: Life (including birding) is full to the brim of false dichotomies and logical fallacies. Life is not as cut and dried, black and white, as many of us try to make it. Probabilities are used all the time for that very reason. You can never be 100% certain about anything. Good science doesn’t ever prove that A is the correct answer…it instead uses statistics and probabilities to show that it’s most likely not B…or C…or D…etc.

    • Ned Brinkley

      One of the most refreshing essays on this set of problems in some time! I think that many contributors to the Frontiers of Field Identification listserve have made similar arguments about situational assumptions in the identification of Thayer’s/Iceland in various parts of the continent (and world). I follow the use of statistics here, but I feel it’s somewhat misleading – there are unverifiable assumptions about occurrence frequency of these “taxa” that underlie these calculations. So I think you can make the same argument without the statistics. And the argument certainly can be applicable to many, many taxa, birds and otherwise. Of course, many birders – if not most – would agree that their field identifications of birds are not absolute or infallible but assertions of probability based on phenotype. That said, I feel fairly confident that yesterday’s Belted Kingfisher was a satisfactorily solid field identification. But yesterday’s American Black Duck? It looked like one. I saw the bird’s plumage well. I didn’t see its genes. Probably has some Mallard, though – many of them here do – even though it’s probably not possible to detect those genes by looking at plumage in the field. I logged it as an American Black Duck. Can I live with the probability that it was not “pure” but I still recorded it as a species? I suppose so. Until perhaps we have checklists that have categories like “Anas-type duck, phenotypically indistinguishable from American Black Duck but in area of extensive hybridization with Mallard”.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Derek says: “Yes population dynamics can be used to make a guess at bird ID. But probability is ultimately irrelevant to reality.”

      (First, a historical note. Heisenberg & Co. discovered that reality is probabilistic. All our cherished concepts of here and there, now and then, black and white, good and evil–serious thinkers and empiricists tossed those things out the window close to 80 years ago.)

      As to “population dynamics,” I think it’s important to view probability of occurrence as a subjective field mark, just like any other field mark. Let’s say a rare flycatcher shows up in our mist-nets. We have data. Stuff like:

      1. Rectrix length and wing formula. Based on what we know of variation in these characters, and based on what we know of the effects of feather wear and age, we say that our bird in the net has such and such a probability of being such and such a species.

      2. DNA. Based on what we know from previous studies, we say that our bird in the net has such and such a probability of being such and such a species.

      3. Vocalizations. We have a sonogram, let’s say. Based on what we know of vocal variation, we say that our bird in the net has such and such a probability of being such and such a species.

      4. Population. Based on what we know of the population (size, dispersal, geographic distribution, etc.), we say that the bird in our net has such and such a probability of being such and such a species.

      All of the preceding are variable. They are subject to variation. As factors (parameters) that contribute to the identification process, they are subject to the laws of probability. We arrive at a guess (or inference) as to the bird’s identity (Alder Flycatcher in Utah, let’s say), and we are more or less confident about it.

      Derek seems not to like the population parameter (forgive me if I’m guilty of faulty mindreading) because it is subject to variation and error. Well, so are all those other parameters (tail, wing, DNA, call…).

      Now what of a seemingly slam-dunk ID, like Ned Brinkley’s Belted Kingfisher? Here, again, I’ll resort to a physics metaphor. The probability that the bird is not a Belted Kingfisher is vanishingly small–like the probability that an electron which I observe right here and now will pop up in the Galaxy in Andromeda the next moment. But, according to physics, that could happen.

      Now, practically speaking, that would never happen, so we turn our sights to rather more interesting problems, such as the location of an electron within an atom. That IS interesting. And the birding analogy would be such “fuzzy” taxa as Alder/Willow, Blue-headed/Cassin’s, Iceland/Thayer’s, and so forth.

      Along with the uncertainty associated with the bird ID parameter of population estimates (Derek), we have all the uncertainty associated with vocalizations, plumage, and perhaps DNA. If you bird at the intersection (not just geographically speaking) of these taxa, as I do, you quickly come to appreciate these taxa for their overall fuzziness. The best you can do is assign a probability, even in cases that some might want to be slam-dunk: The “good” Blue-headed suddenly starts giving Cassin’s-like vocalizations; the Alder that looked “good” based on in-hand measurements looks more like a Willow when you study photos in the field; the Iceland that sure looked “good” in photos proves to have the wrong DNA; and so forth.

      Derek also says: “Furthermore, this approach takes the focus away from the individual bird itself. And that is where I think most of us birders want our focus to be. That’s where the fun lies. Even with empids and larids flying unabated throughout the countryside.”

      Certainly, I enjoy pondering individual birds. But I do so against a backdrop of what I know about other birds in the population. I do so with a knowledge of, and appreciation for, the intrinsic variation in such bird ID parameters as classical morphometrics, song-learning, genetics, and, yes, population status. The best I can do is to make inferences about the individual. I’m never really certain.

      At some level, honestly, birders’ different approaches reflect their different worldviews and ideologies. I confess: I strongly resist the essentialist, or typological, worldview, which demands that we assign every phenomenon to a category. I happen to believe that an alternative worldview–which I refer to, rather broadly so, as “probabilistic”–is more “correct” than a typological, put-the-bird-in-a-box worldview; but I also accept that there is an ideological aspect to my judgments. Conversely, and obviously, the same ideological danger is there for those who prefer to view the world as deterministic, rather than probabilistic. Always question your assumptions: That’s the powerful lesson the probabilistic worldview has taught us.

    • http://profile.typepad.com/rterri2 Rterri2

      Where to start…

      In my mind, there are several types of bizarre arguments interspersed throughout this blog post, so I’m going to attempt to segregate and think about them in categories.

      “a species is not just a group of morphologically similar individuals, but a group that can breed only among themselves, excluding all others.”
      -Ernst Mayr
      Though several valid objections have been raised to the Biological species concept, it is generally the one that Ornithologists like to use, and it is the most functionally applicable species concept for identification, so let’s go with it for now.
      Let’s not forget that a species is not a name we assign to it. What we see, think we see, or try to see is simply the process of learning (generally through reading a book, though sometimes by actually studying breeding populations) about the nature of populations, and attempting to assign a name to them. That is identification. Which population an animal came from is completely independent from the observer.
      So, let’s give an example of how not to and how to identify a bird.
      You could, for example, step outside here in Baton Rouge. Before you do that, you could search all the eBird data for this date and location, and generate probabilities and percentages for each species (though they would be useless if you don’t incorporate variance, something I’m pretty surprised no one has picked up on, which I will get to later), and go out with your eyes closed and start listing. First chip you hear is this, then that (by probability), keep going until you have enough birds to make integers for everything. The problem is that even though you have your eyes closed (and, say, don’t know bird calls very well), that chip in front of you is a Pine Warbler. Whether you know it, say it, believe it or not.
      Or, you could attempt to learn as much as you can about phenotypic variation and species limits in most of the birds you could reasonably expect to see, base your identifications off that knowledge, and accept that you may not be able to completely confidently identify some birds.
      Though I do certainly acknowledge and appreciate attempts at working out difficult identifications, I think they should be rooted in Biology. While Floydian (I’m not sure Mr. Bayes would approve of this approach) statistics give the impression of rigor and thought, it is exactly these aspects in which it is lacking. It does the opposite of statistics and the opposite of science, and uses uncertain assumptions to make statements about reality with certainty.
      So, while we are on statistics, I have two main things to point out. The first lies between Biology and statistics, and that is that to make anywhere near the sort of inference that Floydian statistics attempt to make, you actually need to know a hell of a lot about your organism. What genetic interactions there are between the breeding populations, what the inter- and intraspecific phenotypic variation is between the breeding populations, and, importantly for the Thayer’s/Iceland Gull example, which populations you are looking at in your study area. For all of these, Thayer’s/Iceland Gull is about the worst example possible to attempt to get a hard number out of. Even though we all see lots of these birds, they are incredibly poorly studied. No one really knows the species limits. No one knows the wintering ranges. We see phenotypic variation and vague patterns in the wintering grounds, but without Biological meaning the best we are doing is making very generalized guesses about relatively unknown populations. Absolutely not the stuff of these “hard numbers” referred to by Floydian statisticians.
      The best-known population of birds I can think of is the Darwin’s Finches of Daphne Major, of which every (or nearly every) individual has been banded, bled, and tracked since 1973. This is a system where you might be able to attempt to put numbers to things. You know where babies came from, what their parents looked like, where they went, etc…, but, here’s my big point, and the big loop of bizarre logic upon which Floydian statistics is based: Why? If you know enough about a species or group of species to be able to put probabilities on their non-breeding distributions, chances are you know enough about them to just flat-out identify them. Maybe you’ll have to ask the Grants, but I doubt they would advocate the use of Floydian statistics in identification of their birds.

      Another point on statistics is variance. This is a big point, too. Bayesian statistics are not just a way to make up numbers and base reality around them. All statistics are attempts to reflect reality, and the variance is just as important as the mean to these numbers. Since Floydian statistics make no discernment between naked numbers and means, I’ll assume they are means (Floydian statisticians using them as such in their calculations), which, by the most basic laws of Mathematics, must have variance.

      Keeping in mind that this statistical argument is by nature fallacious, let’s see what happens when we add uncertainty.
      p(Iceland) = 0.001
      Here we are in California. Are we sure that one in a thousand ICGU/THGU types are Icelands? Exactly? Not one in one thousand and one, or one in nine hundred ninety-nine? How can you possibly know that? How about we say it’s somewhere between one in 300 and one in 10,000; the true mean is probably in there somewhere.
      Then p(Iceland) = 0.0001 – 0.0033
      And p(Thayer’s) = 0.9967 – 0.9999

      Do we have any idea how dark Iceland can be and how pale Thayer’s can be? Given an almost complete lack of study of birds of known parentage, I sorta doubt it. So how about we say between half and all but a tenth of a percent of Icelands are pale, and between 1% and 25% of Thayer’s are pale. I actually think these variances should be much larger, because, remember, when we are looking at birds in the winter we actually have NO IDEA where they came from, but I’ll narrow it down to cut the Floydian statisticians some slack.

      p(pale|Iceland) = 0.50 – 0.999
      p(pale|Thayer’s) = 0.01 – 0.25
      By Bayes’ Theorem:
      p(Iceland|pale) = p(Iceland∩pale)/p(pale), where
      p(Iceland∩pale) = [0.0001 × 0.5 = 0.00005] – [0.0033 × 0.999 = 0.0032967] (note that this is a min-max range, not a minus sign; the low end pairs the scarcest, darkest Icelands with the palest Thayer’s, and the high end does the reverse)

      p(pale) = p(Iceland∩pale) + p(Thayer’s∩pale) =
      = [0.00005 + (0.25 × 0.9999) = 0.250025] – [0.0032967 + (0.01 × 0.9967) = 0.0132637]
      Then:
      p(Iceland|pale) = [0.00005/0.250025 = 0.0002] – [0.0032967/0.0132637 = 0.2486]

      So, the chance of your pale Thayer’s type in California being an Iceland, assuming validity of the logic and given generously small variance, is between roughly 0.02% and 25%.

      So profound it makes my posterior likelihood hurt.

      -Ryan Terrill
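
      The interval arithmetic in the comment above is easy to get wrong by hand, so here is a minimal sketch that recomputes it. It assumes the ranges as stated in prose (Icelands between one in 10,000 and one in 300; 50% to 99.9% of Icelands pale; 1% to 25% of Thayer’s pale) and a two-species pool. The function and variable names are hypothetical. Because the posterior rises or falls steadily with each input, checking every combination of the endpoints is enough to find the extremes.

```python
from itertools import product


def posterior(prior, p_pale_iceland, p_pale_thayers):
    """P(Iceland | pale) by Bayes' theorem for a two-species pool."""
    joint_iceland = prior * p_pale_iceland
    joint_thayers = (1 - prior) * p_pale_thayers
    return joint_iceland / (joint_iceland + joint_thayers)


# Endpoints taken from the prose ranges; treated here as assumptions.
prior_range = (1 / 10_000, 1 / 300)
pale_iceland_range = (0.50, 0.999)
pale_thayers_range = (0.01, 0.25)

# Evaluate all eight endpoint combinations and keep the extremes.
values = [
    posterior(*combo)
    for combo in product(prior_range, pale_iceland_range, pale_thayers_range)
]
low, high = min(values), max(values)
print(f"{low:.4%} to {high:.2%}")
```

      Under these assumptions the range runs from about 0.02% to about 25%, a spread of three orders of magnitude.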

    • Paul Clapham

      Doctors are struggling with the same problem. Instead of rare gulls they have rare diseases, and they have tests for those rare diseases which aren’t 100% accurate. So then, if they understand probability correctly, they find themselves in the position of saying something like this to their patient:

      “You came out positive on the test, but it’s only 99% accurate, so there’s a 4.7% chance you have the disease.”

      A lot of their patients can’t understand that. Heck, a lot of the doctors can’t understand it either. But there it is.
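
      The 4.7% figure only follows once a prevalence is fixed, and none is given here. As a sketch, assume the disease affects 1 person in 2,000 and that “99% accurate” means both the sensitivity and the specificity are 0.99. Both are assumptions chosen for illustration, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


prevalence = 1 / 2000  # assumed; the comment does not state a prevalence
ppv = positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99)
print(f"{ppv:.1%}")
```

      The healthy majority produces far more false positives than the sick minority produces true ones, which is why a 99% test can leave the patient at under 5%.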

    • Chris E

      Ted, You’re putting together two claims here.

      The first one, which is interesting and useful for birders to think more about, is that probabilities can work together in ways unexpected and even counterintuitive to those who haven’t thought about probability. This is a great discussion for birders to have, and it would be interesting to discuss it further.

      The second claim is that reality is what we believe. That is rather more contentious. Writing as a philosophy professor, I would not agree that “Reality is situational. Scientists, philosophers, and ethicists have been converging on that worldview for more than a century now.”—if what you mean by that is that if probabilities suggest we should conclude something is a Thayer’s Gull, then it IS a Thayer’s gull. The probabilities simply suggest that that’s what we should BELIEVE it is, what we should conclude it is, what we should submit to eBird. But if everyone else in the ABA were to suddenly form the belief that you’re a witch and that you’ve been following the Devil’s orders and eating babies, that does not entail that you have been. It’s important to allow that they might be wrong. Distinguishing beliefs from reality is an important part of sanity.

    • Derek

      With respect, this is getting a bit absurd. Probability as a branch of mathematics tells us the likelihood of an event based on a set of parameters; it doesn’t tell us what “is”. Ted’s gull example as originally stated is akin to the gambler’s fallacy. A coin flipped turns up heads 99 times in a row. So, the next flip is “probably” heads. A Thayer’s/Iceland Gull is a Thayer’s 99 times out of 100 in California, so this pale gull is a Thayer’s. Both are blatantly false lines of reasoning. And both ignore whatever intrinsic properties the coin/gull exhibit. Ted’s examples of subjective field marks are fine. Yes, I fully agree that all of those have some measure of error. But why choose historical or colloquial population estimates as the field mark to lionize? Unless I am wrong these population numbers are based on observations using all the other field marks as their foundation. Particularly when that is the least reliable of anything Ted has mentioned. The article as written doesn’t advocate the holistic approach that most birders (at least the ones I know) use and Ted mentions in his replies. Make no mistake, Ted says an Iceland Gull that is known to be an Iceland is in actuality a Thayer’s in CA, or vice versa in PA. In his vireo reply, he says a Blue-headed whose other field marks indicate Blue-headed is properly dismissed as Cassin’s until it shows up elsewhere. To me, this is preposterous. I’m not advocating throwing out population dynamics in bird ID, far from it. I think most birders and, hopefully, most bird records committees take this into consideration. But what I cannot advocate is using this methodology at the expense of other means. I don’t mind putting the bird-in-a-box view aside. But I can’t support using only a Bayesian probabilistic framework to frame that box when I do decide to fence one in. Bird ID should be multifaceted in approach and consider all parameters/probabilities. If that leaves some birds unidentifiable, so be it; I have done that many times in the field and see no problem with it. I would ask Ted to go back to his original article and example, and to my initial reply. Ted is viewing the Iceland Gull in California. He determines that is a Thayer’s based on location. A biologist standing next to him says hey we just DNA tested that bird and it is an Iceland. Does he change his determination? If yes, then apparently the DNA fieldmark is more trusted. If no, then what additional information would it take? DNA measurements + in hand physical measurements + photos of the underwing? I think the conceptual problem with Ted’s line of thinking is that location is given highest standing at the expense of other knowledge. In other words Ted is saying his subjective field mark is better than all of the other subjective field marks. And that is what I am not agreeable with.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Ryan: Let’s ditch the numbers. Check out the following:

      1. Some small percentage of individuals of Species A can resemble individuals of Species B. And some small percentage of individuals of Species B can resemble individuals of Species A. You okay with that?

      2. In region X, Species A outnumbers Species B. And in region Y, Species B greatly outnumbers Species A. You okay with that?

      3. We have some individual P, resembling Species A, that shows up in region X. We conclude that P very likely pertains to Species A.

      4. We have some individual Q, resembling Species A, that shows up in region Y. Even so, we conclude that Q probably pertains to Species B.

      You okay with all of the preceding? If so, then let Q=an Iceland-like gull in California, let A=Iceland Gull, let Y=California, and let B=Thayer’s Gull.

      Or, as Paul Clapham says, “You came out positive on the test, but it’s only 99% accurate, so there’s a 4.7% chance you have the disease.”

      In other words, “The bird looks like an Iceland, but only 99% of Iceland Gulls resemble your bird, so there’s a 4.7% chance the bird is an Iceland.”

      As to dragging in variation and hybridization and such, well, sure…that’s life, that’s reality. (And it’s all noted in the original post.) But let’s not lose sight of the basic, and exciting, result: If an event is rare (rare bird, rare disease, whatever), appearances can be deceiving. I think birders understand both parts of the preceding (rare birds, tricky IDs). But the all-important linkage between the two (via Bayes’ Theorem) is the fascinating part, and I think it’s the part that’s eluding a lot of us.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Derek asks: “Ted is viewing the Iceland Gull in California. He determines that is a Thayer’s based on location. A biologist standing next to him says hey we just DNA tested that bird and it is an Iceland. Does he change his determination?”

      Yes, absolutely. I have more information now. In Bayesian lingo, I have just “updated” my probability.

      But here’s the critical point: It goes both ways!

      DNA data, as you surely know, is subject to the laws of probability: you have p-levels, confidence intervals, etc. with DNA, just as you do with any other data. As I’m sure you know, various *likelihood* (hint, hint) methods are used to create phylogenetic trees from DNA data. Those trees, and the relationships and identifications they imply, are probability statements.

      So let’s say a gel jock has determined that a bird is an Iceland Gull, based on its DNA. Now, I waltz in, and tell him the bird is from California. THAT MAKES A DIFFERENCE. The gel jock updates his probability, based on this info. The probability that the bird is an Iceland has just declined.

      Derek says, “If yes, then apparently the DNA fieldmark is more trusted.”

      Again, it goes both ways. In your scenario, I update my probability based on the new (posterior) information about DNA. But it can go the other way: We might update the DNA-based inference based on new (posterior) information about geography.
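
      The “both ways” claim can be checked with a few lines of arithmetic. This is a sketch under loud assumptions: the two likelihood ratios are invented for illustration, and DNA and location are treated as independent pieces of evidence given the bird’s true species. Under those assumptions each piece of evidence multiplies the odds, so the order of arrival cannot matter.

```python
import math


def update(probability, likelihood_ratio):
    """Multiply the odds by a likelihood ratio and convert back to a probability."""
    odds = probability / (1 - probability)
    odds *= likelihood_ratio
    return odds / (1 + odds)


start = 0.5               # no information yet: Iceland and Thayer's equally likely
lr_dna = 50.0             # hypothetical: this DNA result is 50x likelier for an Iceland
lr_california = 1 / 1000  # hypothetical: a California record is 1000x likelier for a Thayer's

dna_then_place = update(update(start, lr_dna), lr_california)
place_then_dna = update(update(start, lr_california), lr_dna)

# Same answer either way, up to floating-point rounding.
assert math.isclose(dna_then_place, place_then_dna)
print(f"P(Iceland) after both pieces of evidence: {dna_then_place:.3f}")
```

      If the DNA result and the location were not independent given the species, the simple multiplication would no longer hold and a joint model would be needed.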

      Derek says, “If no, then what additional information would it take? DNA measurements + in hand physical measurements + photos of the underwing?” Yes, all those things are important.

      Derek says, “I think the conceptual problem with Ted’s line of thinking is that location is given highest standing at the expense of other knowledge. In other words Ted is saying his subjective field mark is better than all of the other subjective field marks.”

      Rather, I’m emphasizing location, because I think it’s underappreciated (heck, it’s totally ignored) in many morphometric analyses. The conditional parameter of interest, for me, is location. But it could just as well have been, say, call note. Exchange “Utah/Pennsylvania” in the Alder/Willow example for “pip/wit,” and it’s the exact same argument.

      Think of it this way. You have a totally silent vagrant empid in the hand. You measure it, and you let it go. IF you had had that info on call notes (“pip” or “wit”), especially if it were quantitative (sonogram of the call), that would influence your determination of the bird’s ID. Next: You’re handed a specimen of a gull, but you don’t know where it’s from. What I’m saying is, it DOES matter where it’s from. California vs. Pennsylvania DOES make a difference in Thayer’s vs. Iceland. Yet I believe most birders discount geography as a hard/quantitative/empirical/morphometric field mark. And I think that’s a mistake.

    • Derek

      Ted, actually I would disagree with the “gel jock” altering his conclusion. Unless there is some difference in the DNA between a California Iceland and a Pennsylvania Iceland that is quantifiable in the test results, the DNA test is exclusive of location. Yup, no doubt the DNA test has p-values. At least we know them and they are small. The p-values of population estimates are either unknown, unreported, or huge. You can’t point to that as a flaw in our line of thinking without first acknowledging that it is wildly more problematic in yours. Furthermore, you can’t throw away numbers as in Ryan’s analysis because the declaration that “this gull has only a 4.7% chance of being an Iceland” is much more attention grabbing and controversial than a more reflective “this gull has somewhere between a <1 and 50% chance of being an Iceland”. The last 4 sentences of your reply are the most, if only (albeit in my opinion only), useful point to consider. I think it is perfectly fair to assert location may be underappreciated. If this leads you or others to do groundbreaking field work in species population distribution … wonderful. I and the rest of the birding community will applaud your efforts. Then we will all be able to use population frequency more adequately and confidently in our own determination. But I believe it is foolhardy to implore us birders to emphasize that facet ABOVE those other things we “know” to a better degree of certainty. This is the point that most of those replying have tried to impress upon you. The Iceland Gull CANNOT change its real identity by moving to California. The Bayesian population argument of the Blue-headed vireo resulted in a likely incorrect action by the hypothetical CBRC which somehow you wanted us to embrace. As birders or listers all we ask of ourselves is to make the best determination possible when endeavoring to make an ID. And that’s all we ask of BRC’s. Your argument has eschewed that idea of “best” to emphasize what you felt was underappreciated.

    • James

      Fascinating thread. It seems worth noting that Derek’s argument from the gambler’s fallacy is itself a false line of reasoning. In fact it proves Ted’s point. This is because in the gambler’s fallacy, the chance that the coin will turn up heads remains 1/2, regardless of the previous 99 heads (unless the coin is rigged of course). But the chance that the bird that resembles a Thayer’s gull actually is a Thayer’s gull or not does depend on location. This is because the population distribution of a given gull species is in and of itself an intrinsic character of that species, determined by genetics and environment just as the more comfortable morphological characteristics are. Of course distribution is more variable, and subject to greater dispute. But it certainly does not simply take shape completely randomly. In both cases weird flukes can happen, Iceland gulls can show up unexpectedly and we in theory could get 99 heads out of a hundred flips. But just as we would be wrong to ignore the intrinsic characteristics of the two-sided coin and say that the next coin flip will be a heads, we would be similarly wrong to ignore the intrinsic tendency of Thayer’s gulls to inhabit certain areas, and unless the coin is rigged or the gull is introduced, in both cases going with the probabilities is a good bet. Anyways, interesting stuff.
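
      The coin half of this argument is simple to simulate. The sketch below flips a simulated fair coin many times and looks only at flips that come right after a run of heads. The streak length, sample size, and seed are arbitrary assumptions, and the function name is hypothetical. It says nothing about gulls; it only illustrates that a fair coin has no memory.

```python
import random


def heads_rate_after_streak(n_flips, streak, seed):
    """Fraction of heads among flips that immediately follow `streak` or more heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    run = 0          # current count of consecutive heads
    followers = 0    # flips that came right after a qualifying streak
    heads_after = 0  # how many of those followers were heads
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followers += 1
            heads_after += is_heads
        run = run + 1 if is_heads else 0
    return heads_after / followers


rate = heads_rate_after_streak(n_flips=400_000, streak=5, seed=0)
print(f"Heads rate right after 5+ heads in a row: {rate:.3f}")
```

      The rate stays close to one half however long the preceding streak. The vireo and gull cases differ because location, unlike the coin’s history, does carry information about which species is in front of you.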

    • Derek

      James, interesting but sadly incorrect. In the Gambler’s fallacy, the historical flipping of the coin has no effect on the next flip. What does? The fact that it is two-sided! You actually have to know something about the danged coin!!! James is back to (mathematically) BLINDLY choosing the Gull. You apply some measure of criteria to the bird that eliminates say a much more likely Heermann’s Gull, but stop short of continuing those measures to separating Thayer’s from Iceland. If James and Ted want to say a random hypothetical Thayer’s type Gull in California is more likely Thayer’s than Iceland, super. Using this as an extension to a specific individual is problematic. The population distribution of a bird is based on a bunch of observers saying this bird is here, this bird isn’t. And they use these other characteristics which get tossed to the side so easily to do so, morphometric data, color, shape, DNA, etc. In other words population distribution is NOT INTRINSIC to an individual. At least how we are using it here. Is population distribution a characteristic of birds? Sure. Is it worth considering? Sure. Is it based on other things enabling us to ID a bird? Yup. Is it so golden a rule we can avoid anything else? Ummm …. no. Consider this: Ted’s belief system says that this Gull is a Thayer’s. Ted’s belief system is based on a bunch of observers saying that every year at this same time there are 1000 Thayer’s for every Iceland. The Gull in question is in every way typical of every other Iceland Gull in the world not in California. In fact it is typical of that 1 in 1000 Iceland Gull that Ted has used to base his calculations. In fact, this Gull IS the very Gull (or one of them) upon which Ted’s argumentative calculations are based (for the pseudo-mathematicians out there I use IS without loss of generality). Do you see the point? Ted uses a system founded on the identification of this Gull as an Iceland but then whips out a mathematical equation to show the identification is incorrect!!!!! Fantastic contradiction. How can the bird be good enough for Ted at the beginning, but not good enough at the end? And, by extension, how can anyone be expected to give serious credence to this system as Ted has presented it?

    • http://profile.typepad.com/tfloyd Ted Floyd

      Derek: You keep going back to what I humbly think is the straw man of geography as some sort of ueber-field mark. Um, I didn’t say that. Rather, I’m saying it’s (highly) important, along with, inter alia, DNA, call notes, wing chords, and so forth. And I’m saying it’s subject to variation, as is the case with DNA, call notes, wing chords, and so forth.

      I’m surprised by your continuing assertion that location (geography) is subject to far more variation (error) than traditional morphometric data and DNA. I don’t think that’s the case at all. In many instances, I think it’s just the opposite.

      Think for a moment of a vagrant Blue-winged/Golden-winged Warbler to California. The DNA-based data for separating the two species is highly variable. Same thing with wing chord measurements. Same thing with call notes. But something we do have is a very nice quantitative record of their occurrence in California. We really can say something like “There are 2.75 times as many records of Golden-winged as there are of Blue-winged.” (I don’t know the exact ratio, but you see the basic point.) I’m not so sure I’d want to base a Blue-winged/Golden-winged ID in California on DNA! But time of year and location might be quite important indeed.

      Or let’s use the example of Kelp and Lesser Black-backed Gull from Idaho. If you look at the record for Colorado, Utah, Wyoming, Nevada, etc., you have a great record for the relative occurrences of those two species. You really can say something like, “There are 460 Lesser Black-backed Gulls from Colorado, but only 1 Kelp Gull from that state.” Why would you discount that important piece of information in evaluating an extralimital “dark-mantled” gull?

      Another example: Let’s say you’re evaluating a record of an extralimital Aleutian Goose. We have great data (U.S. Fish & Wildlife Service) on the increase in that bird’s population. That should go into evaluating the likelihood of a vagrant in 2011 vs., say, 1981.

      Let me try a final approach to this. Suppose I show you a photo of a borderline Blue-headed/Cassin’s Vireo. Now, just to be nasty, I’ve PhotoShopped out the bird’s head. You ask to see it. That extra info obviously improves your inference about the bird’s ID. Next, you ask when and where the photo was taken. I tell you it’s from central Pennsylvania in July. And that extra info also (I would surely think!) improves your inference about the bird’s ID.

      Location (geography) IS a field mark.

      And Bayes’ theorem provides a powerful means whereby the inferential power of location can be incorporated into bird identification.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Derek:

      Hummmmm……

      Alright, then let’s do this with a medical diagnosis.

      The doctor tells you that your lab results are consistent with the lab results shown by 99% of people with a serious medical condition whose treatment is costly and involves nasty side effects.

      But because of the rarity of that disease (only 1 in 1,000 people have it), he also tells you that there’s only a 4.7% chance that you have the disease.

      You get that, right?

      So what’s so hard about this:

      A particular bird shows characters shown by 99% of Iceland Gulls.

      But given the rarity of Iceland Gulls in your area (1 in 1,000, say), you determine that the bird has only a 4.7% chance of being an Iceland Gull.

      You also said: “How can the bird be good enough for Ted at the beginning, but not good enough at the end? And, by extension, how can anyone be expected to give serious credence to this system as Ted has presented it?”

      Perfect! You said it, brother. That IS the very essence of the Bayesian revolution. We “update” our probabilities as we go along.

      I realize that Bayesian inferential methods are not without criticism. But they’re being very seriously considered by a lot of folks with pretty sterling credentials in philosophy, math, statistics, the social sciences, the natural sciences, and so forth.

      But your opposition is a bit strident and assertive. You write “Fantastic contradiction,” as if that’s a bad thing. Um, a whole lot of us find great power and beauty in the ability of Bayesian inferential methods to overturn centuries of dogma about how the world ought to be–but isn’t.

      Here’s a brief introduction:

      http://en.wikipedia.org/wiki/Bayesian_inference

    • http://profile.typepad.com/tfloyd Ted Floyd

      P.s. Derek, you’re missing something absolutely critical. You write, “The Gull in question is in every way typical of every other Iceland Gull in the world not in California.”

      No.

      I never said that.

      Rather, I said it is typical of 99% of Iceland Gulls and 2% of Thayer’s Gulls.

      If it were typical of 100% of Iceland Gulls, then all your arguments are, of course, correct. But changing 100% to 99% (and changing 0% to 2% for Thayer’s Gulls), **AND** adding in relative abundance, changes everything.

      If the only info we have is what I’ve presented above, then we reject the hypothesis that the bird is an Iceland Gull.

      QED.

      –Ted
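
For readers who want to check the 4.7% figure, here is a worked version of the calculation, offered as a sketch rather than a definitive treatment. It assumes the three numbers used in this thread: the bird's characters are shown by 99% of Iceland Gulls and 2% of Thayer's Gulls, and Iceland Gulls make up 1 in 1,000 of the candidate birds locally. The symbols I (Iceland), T (Thayer's), and M (the observed marks) are introduced here for illustration only.

```latex
\[
P(I \mid M)
  = \frac{P(M \mid I)\,P(I)}{P(M \mid I)\,P(I) + P(M \mid T)\,P(T)}
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.02 \times 0.999}
  = \frac{0.00099}{0.02097}
  \approx 0.047
\]
```

The same arithmetic yields 4.7% in the medical example only if about 2% of people without the disease also test positive, a rate that example leaves unstated.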

    • http://profile.typepad.com/naswick Nate Swick

      Derek says – “Is it so golden a rule we can avoid anything else? Ummm …. no”

      This isn’t what Ted is saying, as he himself mentions in the comment above.

      As I understand it, Bayesian (or Floydian, whatever) statistics is a tool to try to understand the possibility of a rare bird occurring after, or even while, all the other, more conventional, methods of determining identification have been considered and found inconclusive, rather than as an alternative to those methods.

      It wouldn’t require a birder to disregard a slam-dunk Iceland Gull in California (if such a thing really exists) when one is right in front of you and everything else points that way, but it might help us better consider the dissonance between our birder’s desire to put a name to everything and the birds that make that goal a difficult one.

      To paraphrase Steve Howell in his Gulls guide, the number of unidentifiable birds never reaches zero. This has been a great discussion!

    • Derek

      Ted, what I am saying is that all of our knowledge of population is based on some means of identifying the bird that does not involve population. Your initial presentation stated that monozygotic twin gulls were different in different places. Absolutely pathetic. The absurdity of this proposition was pointed out … and you agreed. Now get this, friend … you come into my office with this rare fatal disease test result. I tell you the test is 99% accurate and give you the disease frequency. You say, oh great, I have only a 4.7% chance of actually having it. I say we have all these other tests that can help narrow things down a bit. In fact that is exactly what a SCREENING TEST does. You say no bother, walk out, and are dead in a month. Or hey, maybe you’re fine, who cares. You used probability to make an inference which had no bearing on reality. Is that clear enough? Do you get that, friend? I have no problem with Bayesian inference. You can use it on any variable field mark you choose. I have no problem with using population dynamics as a field mark. Did you not read my previous post, did you miss that one, friend? Now you are saying that my analysis of your Iceland is perfect for the Bayesian revolution. How? Maybe you didn’t quite grasp English or mathematics. None of the variables pertinent to the bird changed. So why “update” your probabilities? No new information was garnered. All you did was use a poorer method of analysis in the end. Why do I say poorer? Because the population approach is ultimately based upon those other field marks. As such, the error in the population number is a compounded error from those of the other field marks. The error doesn’t get smaller. That is my point. If that went over your head, maybe you can google a wikipedia article on error calculation. Your reference on Bayesian inference is wonderful by the way. I’ll be sure to check out wikipedia for all my scientific musings from now on.

      And since the posting of a wikipedia article was obviously a petty intellectual shot across the bow and completely unwarranted, I will retort. The biggest problem I have with you in the realm of this discussion is that you are throwing around Bayes and Einstein and Schroedinger in some attempt to add veracity purely by association to your take on how underappreciated population is. So, since you mastered wikipedia articles on all of the above and can use Google, we should assume this guy REALLY knows what he is talking about. And since you are such an expert in all manners of science, surely you must accept the difficulties with putting any theory, even ones you support, into practice. What I and the rest of the responders have done is show some of those problems. Albeit our collective understanding clearly pales in comparison to yours. If you want to say that your monozygotic twin gulls are really different in different places, your choice. If you enjoy the theoretical fun of the “fantastic contradiction” presented by using the same Iceland to both found and disprove your stance, good on you mate. But don’t in good conscience ask us to ignore the logical inadequacies when approaching our hobby.

      In response to your specific points: the Blue-winged/Golden-winged problem – if you don’t let me see the bird, and all you tell me is that it is a Blue-winged or Golden-winged in California with Golden-winged-positive DNA, I probably would be inclined to say Golden-winged unless the DNA was just so bad as to be statistically useless. And really, how much of those wing chord measurements and vocalizations are muddied by extensive hybridization and back-crossing?

      The Kelp Gull: Yup, the likelihood would go into my thinking. But probably not enough to override strong physical evidence. Maybe it takes more physical evidence to convince me to call it a Kelp Gull, as opposed to a possible Kelp Gull, the farther out of range it may be. This line of thinking is what I initially credited you with as the real purpose of the article.

      The Aleutian Goose: Yup, no doubt population expansion makes a vagrant more likely … but if the individual goose doesn’t look, sound, test, etc. like an Aleutian I am not going to call it one.

      The vireo: Yup, I want to know all the information I can. If you deny me the head, make the photo fuzzy, whatever, so that all that can be discerned is that it is probably in the Solitary Vireo complex, then of course I would agree it is probably a Blue-headed. But again this is rather the same as blindly picking a bird. Whatever means of separation I could make based on other data, that I would deem necessary and sufficient to ID a rarity … you have denied me.

    • Derek

      Ted, fine. The Iceland Gull is typical of 99% of Iceland Gulls. Nothing is typical of 100% of Iceland Gulls. And yes, you can’t use Bayesian inference when a probability calculation is zero … you did read wikipedia after all. My point doesn’t change if the Gull is typical enough to be thought of as Iceland aside from geography, and typical enough that your observers count it as an Iceland in the field. You simply can’t discount that unchanged evidence based on calculations ascertained from that very methodology.

    • Derek

      Nate, I agree with your point. But ignoring the slam dunk is exactly what Ted asked us to do. A bird that is 99% typical to Iceland, monozygotic with an Iceland found in Pennsylvania, seems as good a slam dunk as any Iceland that will ever be found in California.

    • Patricia

      Gentlemen and ladies :) … let’s put this bickering aside, as it is straying from the important take-home message that I believe Ted wants to get across. An awareness of bird population can make us all better birders. From an absolute standpoint Derek is correct. Logic tells us that a genetically identical pair of birds cannot be different in different places … unless the bird was at Three Mile Island in PA and mutated :) Likewise, if a set of criteria is good enough to determine the identity of a bird and that data is then used to contribute to our understanding of population, that same population data cannot then go back and dismiss the original identity … again, that is illogical. Let us remember that everyone will weigh some things differently: some like GISS, some really need DNA, some like population. All of them have errors. Let us also remember that Bayesian inference can be put to the test on any variable we choose. It did seem that Ted was going against the grain and not exhausting the analysis of other more traditional, if not better, field marks. So thank you, Ted, for bringing population data farther into our collective birding awareness. Thank you, Nate, for trying to offer some mediation. Thank you, Ryan, for bringing some measure of uncertainty into the mathematics. Thanks to Derek for trying to keep the discussion logically and mathematically rigorous. And thanks to everyone else for contributing. I don’t think Derek and Ted are actually too far apart. Ted, maybe in his enthusiasm for getting his populationist point of view out there, did so a bit hastily and used a bad example. And maybe Derek is too focused on the rigors of the application of Ted’s approach to practical birding. Either way, both of them make us better birders for considering their thoughts. And that is all we really want … to be better. So maybe it’s time to shelve this discussion, boys. Both points are made. We can now use them as we see fit.

    • Ryan Terrill

      At the risk of dragging out this discussion, I feel like the core of what you are attempting to say is “field marks and geography are useful in field identification”. I’m definitely on board with that.
      Past that, I think you are stretching statistics, biology and logic in a misleading way. Pulling a number out of your cloaca and then making a statement about reality from it (and especially calling them cold, hard numbers) isn’t just incorrect, it’s irresponsible. Ornithology has arguably the best interaction between amateur enthusiasts and professional scientists, and those looked up to (like professional ornithologists, records committee members and periodical editors) have, in my mind, a very real responsibility to tell the truth to the birding community. I get your main point, but come on. Just because you think something is true (like your 4.7% hypothesis, which is what it is, nothing more) does not make it reality! I can calculate probabilities from fake numbers about the chance I have a beer in my hand right now… Pretty good, actually, 87.34%…and… Still no beer. Damn it.

      Apologies for any typos – I’m writing this on my phone

      -Ryan

    • Ryan Terrill

      Also, no reason to walk me through the “logic” of the “calculation”. I had to work it out myself in order to redo the calculation, using numbers that actually fit perfectly into your example. So yeah, I’m okay with that.
      I’m actually a little surprised at the patronizing tone, too. I don’t think I said anything mean, and you seem to have kept your cool in previous discussions.

      -Ryan

    • http://profile.typepad.com/tfloyd Ted Floyd

      Hi, all. Let me try this one more time. Consider the following logical progression. Let me know if you think I’m making a mistake at any point. Thanks. Here goes:

      1. I show you a series of high-quality photos of a wood-pewee. From the photos, you get a good read on the color of the lower mandible, the pattern of the wing bars, wing formula, rectrix length, tail posture, the extent and shape of the breast patch, the overall color, etc. The images were taken by an automatic camera, so there is no hope of getting any additional information about, say, vocalizations, behavior, etc.

      2. I ask you what you think the bird is. Given everything you know about wood-pewees, you’re pretty confident it’s an Eastern Wood-Pewee.

      3. Now I tell you where the bird is from. I tell you it’s from Pennsylvania. Now you go from “pretty confident” to “very confident.” But suppose I told you the bird is from California. Now you would go from “pretty confident” to “not at all confident.”

      4. In theory, we could do this analysis quantitatively. Although it would require a large sample (possible, as there are many wood-pewee specimens in museums) and heavy computing power (no problem at most universities or museums), you could, in theory, do a discriminant function analysis, or something like it, and say that your bird matches 99% of Eastern Wood-Pewees, but that 2% of Western Wood-Pewees also match your bird.

      5. Although this seems to be a hang-up for some (I’m not sure why), you could quite easily say that Eastern Wood-Pewees are outnumbered in California by Western Wood-Pewees by at least 1,000 to 1. (eBird data, biological inventories, expert opinion, BBS data, etc.) Similarly, and with analogous datasets, you could say Westerns are outnumbered by Easterns in Penna. by at least 1,000 to 1.

      6. Thus, you conclude it’s likely a Western Wood-Pewee if you’re in California (only a 4.7% chance of Eastern), but it’s likely an Eastern Wood-Pewee if you’re in Penna. (99.8% chance).

      7. Now check out the gull photo from Penna. (photo by Geoff Malosh of swimming bird) in my original post. Quite conceivably, such a photo might be presented to records committees in both Calif. & Penna.

      8. Isn’t it possible that a records committee in Calif. would call the bird a Thayer’s, but a records committee in Penna. would call it an Iceland?

      9. Now let’s say it really is the same bird. Or it’s two birds, but they’re identical twins.

      10. Even though the records committees have made no errors in logic or judgment or mathematics or biology, together the two committees have determined that the bird is two different things.

      Everything okay in the preceding?

      Thanks, –Ted
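
Steps 4 through 6 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior_eastern`, the constant names, and the reading of "1,000 to 1" as a simple 1-in-1,000 prior are hypothetical choices made here, and the 99% and 2% match rates are the made-up figures from step 4, not measured data. It also assumes Eastern and Western are the only two candidates.

```python
def posterior_eastern(p_marks_given_eastern, p_marks_given_western, prior_eastern):
    """Bayes' theorem for two competing species.

    Returns P(Eastern | marks), assuming Eastern and Western are the only
    candidates, so P(Western) = 1 - P(Eastern).
    """
    prior_western = 1.0 - prior_eastern
    numerator = p_marks_given_eastern * prior_eastern
    evidence = numerator + p_marks_given_western * prior_western
    return numerator / evidence


# Illustrative match rates from step 4 (assumptions, not measured data).
P_MARKS_GIVEN_EASTERN = 0.99   # bird matches 99% of Eastern Wood-Pewees
P_MARKS_GIVEN_WESTERN = 0.02   # ...and 2% of Western Wood-Pewees

# Priors from step 5, reading "1,000 to 1" as a 1-in-1,000 rate (assumption).
PRIOR_EASTERN = {"California": 0.001, "Pennsylvania": 0.999}

# Same bird, same field marks; only the location-based prior differs.
results = {
    place: posterior_eastern(P_MARKS_GIVEN_EASTERN, P_MARKS_GIVEN_WESTERN, prior)
    for place, prior in PRIOR_EASTERN.items()
}

for place, p in results.items():
    print(f"{place}: P(Eastern | marks) = {p:.3%}")
```

With these inputs the California figure comes out at about 4.7%, matching step 6. The Pennsylvania figure comes out nearer 99.998% than the 99.8% quoted there, which does not change the qualitative conclusion that the same photo supports different identifications in the two states.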

    • Derek

      I think this is a fine example. Given what I know about separating wood-pewees (very little, admittedly), if the wood-pewee photos and the subsequent computer analysis (assuming Ted’s populations are good … which is fine for this example) are all we have, I wouldn’t fault the records committee for rejecting the record or the birder for leaving the ID as possible Eastern Wood-pewee. But this conclusion is based on two HUGE ASSUMPTIONS. The first assumption, and I could be way off on this, is that the BEST way to separate Eastern and Western Pewees is by vocalization. I am basing this only on one sentence in the Sibley Guide that says these two species are reliably separated only by vocalization. I have no clue what scientific studies have put numbers on this, or even whether any have. The second ASSUMPTION I am making, and this is the big one, is that the population study that gives Ted his relative abundances was made using the BEST separation techniques.

      If large-sample DNA studies (to give the study high power) with a small error rate were the one and only basis, then Ted’s population numbers are very, very convincing and should be treated with high regard. High enough to reach the conclusion that Ted’s bird is not an Eastern Wood-pewee, though of course anything is possible. If the BEST separation technique is vocalization, as suggested by Sibley, because the DNA test is too expensive, doesn’t exist, or is too fraught with error to be useful, then again Ted’s population numbers are strong evidence against this being an Eastern Wood-pewee, though perhaps in reality not quite as good as the DNA study because there is more room for error plus observer error/bias. Nevertheless, we can for argument’s sake claim vocalization is statistically the best. It is better (somehow) than the physical field marks we can ascertain from the photos we are presented. Therefore, again we can reasonably agree that Ted’s bird is not an Eastern Wood-pewee, though it is now possibly more likely.

      If on the other hand, when the California study team went out to canvass, they chose not to use DNA, vocalization, or anything else better than this suite of field marks Ted has given us access to, then the conclusion changes. A 99% cutoff line seems pretty high to use in fieldwork to start to define a population in practical terms, but if that is the cutoff, so be it. With the California population of Wood-pewees defined in these terms, the Wood-pewee is at or above the standard that already defines the California population; thus it is an Eastern Wood-pewee. The historical population numbers cannot then be used, even in Bayesian form, to disregard the Eastern Wood-pewee’s occurrence as a statistical anomaly because it has already, by definition, been included.

      Now the big practical crux that Ted has to account for or overcome is how he defines the population he is looking at. Note this doesn’t have to be as strict as defining a species, which requires some pretty hard-core evidence. But somewhere, somehow, the people who are providing Ted, and all of us, with these population numbers have to make a stand and say this is the way we are going to make the differentiation (and hopefully, from a scientific perspective, let us all know why). In some cases, I would allow, this determination could be based solely on some laboratory data that birders cannot practically use. We then have to rely on things that correlate with that data … morphometric data, color, vocalizations, etc. Sometimes these don’t correlate very well. In some cases, differentiation is made on morphometric data that banders and scientists use, like wing chord measurement, primary extension, shades of gray, etc., that are very difficult for birders to use accurately … but we still try. I have seen analysis of empid photos on respected birding forums where the discussion centered on how long the primary extension was or how fat the eye-ring was.

      But much more commonly, I would guess (I could be absolutely wrong), local population numbers are defined by birders, scientists, and paid observers based on a set of criteria that include vocalizations, morphometric data, size, shape, color … all the things that make up those traditional field marks birders use every day. And rarely are any of them super precise (i.e. the 99% we are using for calculation purposes). I would guess that a California Wood-pewee Population Study commission would have given its trained observers some sort of guidelines to the effect that if it looks like an Eastern Wood-pewee based on physical criteria X, Y, Z to a probability (>50%, maybe even >75%) and vocalizes like an Eastern Wood-pewee, record it as an Eastern Wood-pewee. If it doesn’t vocalize, it has to fit X, Y, Z at 95% or greater before you can call it an Eastern Wood-pewee. I think that is a reasonable example of what might occur. I cannot be certain because I honestly haven’t read many population studies, so I don’t know the methodology.

      That’s where I am coming from. I think Ted’s math works as he wishes in his examples because his base population ratios are obtained by unknown and theoretically unimpeachable means. And that just isn’t reality. If Ted is given ultimate power in his example and he decides that for a bird to count as an Eastern Wood-pewee based solely on physical criteria it has to meet X, Y, Z with 99.999999% certainty, so be it. The areas of bird ID where Ted’s approach based solely on population will really help birders are those species studied extensively and widely with some high-quality laboratory test (DNA, cytology, etc.), or species whose only means of separation birders can’t use practically … slight spectrographic differences in calls, small differences in absolute culmen length, etc. The more variables we use to define our population differences, the less useful it becomes, by basic fundamental mathematics. Thus the real fundamental problem with the practical nature of Ted’s contentions is that populations do not exist in that theoretical vacuum.

      And of course everyone, on both sides I believe, acknowledges that whatever we do or use, absolute certainty is unattainable. Even if probability states that with 99.9999999% certainty this bird is X, its reality could be Y, and that is not determined by probability. Please note, everyone, that nowhere in the preceding did I discount Bayesian inference, its mathematics, or continuous vs. discrete variables. Nowhere in the preceding did I discount population as a field mark. What I am saying, and what every birder or scientist who endeavors to make population a highly important field mark must realize, accept, and account for, is that population as a field mark at its best is representative of some other set of field marks. As such it can only be as good as any of those other field marks … never better.

      Now, going back to Ted’s medical example for a second. If Ted came into my office with the same situation, test results, and disease frequency, and calculates in his head the 4.7% chance … if it is a serious/fatal disease I would think (hope) he would want to know how I know those 4.7% of the population have his disease. Sometimes it is a simple other test, sometimes a complicated one, sometimes I grimly tell Ted that those 4.7% of the people are the ones who died in a month. If it was me, or my daughter was the patient, and there was anything else we could use … let’s do it. Particularly if that is the gold standard that the doctors are using to define that 4.7%. But to do or not do anything is based on a number of factors personal to the patient … so let’s leave that ensuing medical ethics discussion for another day :)

    • http://profile.typepad.com/tfloyd Ted Floyd

      Keeping this in the realm of wood-pewees (which seem to elicit calmer reactions than Thayland Gulls–go figure), I’d like to offer these points of perspective:

      1. Let’s accept, if only for the sake of argument, that the Calif. and Penna. wood-pewee ratios are in fact as stated earlier. I dunno, we’ve sequenced the entire genomes of 25,000 specimens collected from both states…something that would satisfy even Derek!

      2. Let’s also accept that a multivariate analysis of characters (lower mandible, tail angle, breast pattern, and more) can tell us that our bird “clusters” statistically with either Western Wood-Pewee or Eastern Wood-Pewee.

      3. We have the Penna. bird and the Calif. bird as presented earlier, with the Calif. bird having a 95.3% chance of being a Western, and the Penna. bird having a 99.8% chance of being an Eastern.

      4. But we also know, somehow, that they’re monozygotic twins from the same clutch.

      What is interesting to me is that #3 is inconsistent with #4, **even though no mistake has been made in #3.** Two records committees (CBRC and PORC in this case) have done all the right things, and yet they’ve come up with results that are inconsistent with the scenario in #4.

      And you would never, ever, know that there was a problem with #3 (2 p-values, under the assumption of non-twins), unless you knew #4. [Quite interesting to me is that, with scenario #4, you have an indeterminate p-value, but that's for another discussion.]

      Here’s a question: How often does this sort of thing happen?? Sure, it probably isn’t a problem with slam-dunk IDs–Eurasian Hoopoe in Alaska, Belted Kingfisher in Britain, etc. But those are trivially easy.

      But how about all the hard IDs?–empids in nets, Thayland Gulls, worn “Solitary Vireos” in spring, stuff like that?…the stuff bird records committees have to deal with? How many of these birds are we misidentifying, even while employing “best practices” by bird records committees?

      The short answer is, We don’t know. Typically, we only get as far as scenario #3.

      To me, it’s intriguing and wonderful that maybe, just maybe, we’re far more ignorant, and far more often wrong, than we know at present.

      Of course, I favor uncertainty and indeterminacy in this universe…

    • Derek

      Ted, sure, that’s reasonable. Both records committees can reach different, logically sound results even on (unknown) twins. The reaction to the example has less to do with Gulls vs. Wood-pewees than with a physical and logical impossibility vs. a reasonable hypothetical, albeit one that realistically would not occur. The records committees (at least one of them) have erred (on one twin) on the conservative side based on presumably superb data. We just have to be careful how much emphasis to give geography/population. Certainly errors are being me. Are most of those errors incorrectly ID’ing a California Eastern Wood-pewee as a Western … or the other way around? Geography/population can improperly bias us (probably too often) just as easily. There was an interesting discussion on one of the forums (perhaps Frontiers … I don’t recall) of hawk watch Cooper’s vs. Sharp-shinned counts. Volunteer counters were improperly “bending” their counts to fit more with what history told them to think. They weren’t using the “best” means available … actual physical observation. The relative population numbers of any species determined by means other than field observation are certainly small. Just as we must respect that data, we really must know where it comes from and how it was obtained. I also just submitted 1200 Iceland and Kelp Gulls into eBird at Ted’s favorite Colorado gull-watching spot … we’ll see how population affects his gulling now :)

    • Derek

      Previous post should have read “certainly errors are being made”. Apologies for the typo. I don’t really proof or spellcheck much :)

    • James Gilroy

      What a fascinating discussion. Brilliant stuff. I feel like it needs some kind of summary/conclusion though, so I offer my services as a complete outsider who just read through the whole thing.

      Ted’s broadest argument is that we should consider probability (based on location) when identifying birds. The more unlikely a record is, the greater the burden of evidence required to accept it as truth. I don’t think anyone disagrees with this (or ever has).

      Ted’s more contentious argument, however, is that we can actually use Bayes’ theorem to help us in making an identification of a given bird. As various posters have pointed out, there are serious problems with this approach. In fact, it neatly illustrates how Bayesian statistics (which are extremely powerful and useful when used properly) can be unintentionally abused to come up with flawed conclusions.

      The fundamental problem with Ted’s approach is that it confuses inference (what we think something is) with data (what something actually is). In a proper application of Bayesian statistics, we take our prior beliefs (e.g. probability of a pale gull in California being an Iceland), and then test them against real data. The critical point is that the data must be an objective measure of the ‘truth’. It must be independent of our prior information.

      The trouble is, Ted is suggesting that we use our prior information to decide whether or not a given bird is species X or Y. He’s then suggesting (indirectly) that the outcome of this process can be used as data. We can use this ‘data’ to update our prior information. But this is not data, it is inference. The logic is circular. In the absence of real objective data on whether or not the bird is actually X or Y, we have no way of updating our prior information. This means we cannot apply Bayesian statistics. We can use them to generate some guesses, but these will be so vague as to be almost useless given the lack of real data.

      Where does that leave us? I think we’re left with that word Ted has used repeatedly – uncertainty. We simply cannot know whether an Iceland-like bird in California is an Iceland or a Thayer’s, unless it is banded (or has some unequivocal genetic marker). Any attempt to cut through this uncertainty using statistics will be flawed, because we do not have any real data – we only have pseudo-data that is based primarily on our uncertain beliefs. We simply have to accept that we cannot identify each individual with certainty. Hence, rather than saying “there’s only a 4.5% chance that this gull is an Iceland Gull, therefore I think it’s a Thayer’s”, we have to simply say “well, it’s a gull”. Then wander off and look at something less confusing…

    • Simon Mitchell

      As pointed out by others, this argument is a fundamental misunderstanding of the species concept. There are very few examples of birds for which we have no idea of the species limits. The underlying fallacy in the gull example is that the suite of characteristics is a continuous spectrum. Phenotypically in some cases it may be, but gene flow is sufficiently low to keep the two species separate. Imagine both gulls are captured and their mitochondrial DNA analysed. The two birds cannot be identical twins and correctly identified as two different species.
      Or perhaps the different identical twins of the Thayer’s / Iceland Gull occur in California once a year for twenty-five years; that means 4.7% × 25 = 117.5%. This would mean it is more than certain that one of the individuals was actually misidentified using the method described. That is why bird recording is done using empirical evidence and not inductive statistics. What exists in reality is not that which is most likely. Millions of very improbable events occur every day.

      As for Schroedinger’s Gull – Schroedinger was talking about uncertainty at a quantum level, where particles can be in superposition and, until measured, can be mathematically proven to be in two places at once, or possess both clockwise and anticlockwise spin. A Schroedinger’s Gull would have both sets of characteristics (including genes) until you saw it, at which point it would have only one.

    • http://profile.typepad.com/tfloyd Ted Floyd

      Thanks, everybody, for the continuing discussion. I’d like to respond in some detail to something James Gilroy said, and then in yet greater detail to something else he said. I happen to disagree with the gist of James’s recent post, but I also believe that he’s provided a superb distillation not only of his own thinking but also of various other folks’ thoughts. Here goes…

      1. James says: “Ted’s broadest argument is that we should consider probability (based on location) when identifying birds. The more unlikely a record is, the greater the burden of evidence required to accept it as truth. I don’t think anyone disagrees with this (or ever has).”

      Perhaps the enlightened folks in this forum don’t disagree (nor ever have), but I encounter an alternative point of view all the time. Although that alternative point of view is not usually expressed in formal quantitative terms, it is, in my experience, undeniably out there. In a nutshell, I hear people say stuff like (I’m paraphrasing), “That ‘Solitary Vireo’ shows marks consistent with 99% of Blue-headed Vireos.” Again, I recognize that such statements tend to be verbal, not quantitative; but the same idea is unmistakably there. Now what I don’t hear–and this is the key point–is stuff like (I’m paraphrasing), “However, Blue-headed Vireos are so rare in my state that there’s only a ~5% chance that this particular bird really is a Blue-headed Vireo.” In my experience, the second part of that formulation is rarely expressed by birders, including birders on records committees. That is to say, we get to the “consistent with 99% of Blue-headed Vireos” first part of the formulation, **but we don’t get to the very important** “but Blue-headed Vireos are very rare” second part of the formulation. And we conclude, wrongly, that a particular vireo–again, I’m paraphrasing–“has a 99% chance of being a Blue-headed.” That’s as far as most of us get, I believe. (Present company excluded, according to James.)

      To put it in formal terms, I believe most birders, including records committee members, get only this far: p(the bird looks like a Blue-headed, given that the bird really is a Blue-headed) = 0.99. Most birders, though, typically do not get this far: p(the bird really is a Blue-headed, given that the bird looks like a Blue-headed) = 0.047.

      Quite clearly, I’m speaking only from my own experience. And James, I assume, is speaking from his.

      Which brings me to what seems to me to be James’s key point.

      2. James says, “We simply cannot know whether an Iceland-like bird in California is an Iceland or a Thayer’s, unless it is banded (or has some unequivocal genetic marker).”

      Let’s start right there. Let’s start with banded birds and birds with unequivocal genetic markers.

      At a well-funded, well-staffed, long-running banding station in eastern Colorado, we know that 25 out of 25,000 (0.1%) wood-pewees are Easterns. At a sister station in central North Dakota, we know that 22,500 out of 25,000 (90%) wood-pewees are Easterns.

      Similarly, as part of the FGP (“Flycatcher Genome Project”) we’ve found an awesome pleiotropic gene that codes for a suite of characters that make a bird look like an Eastern Wood-Pewee. 99% of birds in a population of Eastern Wood-Pewees carry this gene, and only 2% of birds in a population of Western Wood-Pewees carry this gene.

      At this point, I’ll stop to consider a likely objection: Both of the preceding criteria are circular. I agree. Whether the data are “hard” (e.g., DNA) or “soft” (e.g., eBird), the problem of circularity is unavoidable when it comes to saying what is or isn’t a species. We start off with a theory or hypothesis or ideology or religion (the Biological Species Concept, let’s say), and then we fit the data (facts) to the theory.

      I’ll return to this point, but, for now, let’s move on.

      We have a wood-pewee, we stick it in the DNA-o-matic contraption, and we determine that it has that pleiotropic gene that is carried by 99% of Eastern Wood-Pewees (and 2% of Western Wood-Pewees). But we need to know something else. We need to know where the bird came from. Well, we know that, too. It’s from eastern Colorado. So we conclude that the bird is actually pretty likely a Western Wood-Pewee. Or, to be more formal: Given the *data* (genetics, banding), we can make an *inference* about the individual. Note that, formally speaking, we have arrived at the raison d’etre for Bayesian inference, namely to estimate p(inference|data). (Honest, James, I know the difference; in fact, I’ve twice already in this thread provided a working definition of inference.)
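
      For readers who want to check the arithmetic, here is a minimal Python sketch of the wood-pewee calculation. It uses only the hypothetical numbers given above (banding proportions as the prior; the gene carried by 99% of Easterns and 2% of Westerns). The function name and the assumption that Eastern and Western are the only two candidates are illustrative choices, not part of the original discussion.

```python
def posterior_eastern(prior_eastern,
                      p_gene_given_eastern=0.99,
                      p_gene_given_western=0.02):
    """Return p(Eastern | gene present) by Bayes' theorem.

    Assumes exactly two candidate species, so p(Western) = 1 - p(Eastern).
    """
    prior_western = 1.0 - prior_eastern
    numerator = p_gene_given_eastern * prior_eastern
    evidence = numerator + p_gene_given_western * prior_western
    return numerator / evidence


# Priors taken from the hypothetical banding totals in the text.
colorado = posterior_eastern(25 / 25_000)
north_dakota = posterior_eastern(22_500 / 25_000)

print(f"eastern Colorado:     p(Eastern | gene) = {colorado:.3f}")
print(f"central North Dakota: p(Eastern | gene) = {north_dakota:.3f}")
```

      With the same gene result, the Colorado bird comes out at roughly a 5% chance of being an Eastern, while the North Dakota bird comes out at well over 99%. Only the prior differs between the two calls.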

      You’re right that there’s a problem of circularity with all of this, but it’s not because anybody’s confused about inference vs. data. It’s because we define species in such a manner as to fit the data (genes, morphometrics, what have you) to the model (e.g., Biological Species Concept).

      Finally, I think I ought to address the persistent complaint (coming from multiple fronts, although not necessarily from James) that I’m working from hypothetical numbers (one in a thousand, one in fifty, ninety-nine out of a hundred; “pale” vs. “dark”; Alder-like vs. Willow-like; etc.). Folks, they’re *just examples*. They illustrate key principles. In one of the grandest scientific treatises I’ve encountered, Albert Einstein’s Relativität, the author continually employs such examples as trains traveling at relativistic velocities. In another fine offering in the tradition of Relativität, contemporary writer Brian Greene, in his Fabric of the Cosmos, frequently employs such examples as skateboards traveling at relativistic velocities.

      Um, they’re not real. Fine. I get it. I totally get it. But the examples illustrate key principles, and that’s the important point. To get hung up on what would “really” happen to trains or skateboards moving at relativistic velocities would be as naive as getting hung up on whether Iceland Gulls “really” number 0.1% of the population in California, or whether 99% of “pale” Iceland/Thayer’s Gulls are actually Iceland Gulls. Why, it’s almost as if some people think I’m writing specifically (ha!) about Thayer’s and Iceland gulls. Rather, I’m making the much more general point that it would behoove us birders to shift our thinking to a modern inferential framework, namely, one which attempts to estimate p(inference|data).

    • James Gilroy

      Hi Ted, thanks for the response. And thanks for writing the article in the first place and stimulating this debate – very thought-provoking stuff!

      I know you’ve stated above that you understand the difference between inference and data – I don’t doubt it! – but I feel like you’re still making the same error with this latest example.

      I’ll clarify what I mean by ‘real’ data. I mean data that can unequivocally tell us whether an individual belongs to one taxon or another. In the hypothetical pewee example you just put forward, the genetic data do not differentiate the two taxa particularly well. For identification purposes, genes do not represent ‘good’ data in this case. We might as well be talking about any other overlapping characteristic.

      The big problem with considering overlapping characters as ‘data’ comes when you make an inference about the bird in your hand. It has genes shared by both taxa, but you consider it more likely to be the common taxon in that area. Sensible enough. But what if you’re wrong?

      Let’s say it was actually a vagrant, and you’ve misidentified it. You’ve effectively added a ‘plus one’ on the pile of records of the commoner taxon, whilst the lowly pile of records for the vagrant taxon stays where it is. In doing so, you’ve updated a key parameter of your formula (the expected rate of occurrence of the vagrant taxon), but you’ve pushed it in the wrong direction. Hence, next time you come across an individual with characters intermediate between the two taxa, you’ll consider it even less likely to be the vagrant. The initial error will propagate yet more errors.
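
      The feedback loop described here can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the starting tallies (1 vagrant record in 1,000) and the 99% / 2% character rates are the hypothetical figures used elsewhere in this thread, and the rule "log each ambiguous bird as whichever taxon is more probable" is an assumed bookkeeping rule, not a claim about how any records committee works.

```python
def posterior_vagrant(prior_vagrant, p_marks_given_vagrant, p_marks_given_common):
    """p(vagrant | bird shows the vagrant-like marks), two-taxon case."""
    numerator = p_marks_given_vagrant * prior_vagrant
    return numerator / (numerator + p_marks_given_common * (1.0 - prior_vagrant))


def run_feedback(n_birds, vagrant_records=1, common_records=999,
                 p_marks_given_vagrant=0.99, p_marks_given_common=0.02):
    """Log each vagrant-like bird as the more probable taxon, then reuse
    the updated tallies as the next prior. Returns the prior used for
    each bird. The true identity of the birds never enters the loop."""
    priors = []
    for _ in range(n_birds):
        prior = vagrant_records / (vagrant_records + common_records)
        priors.append(prior)
        post = posterior_vagrant(prior, p_marks_given_vagrant, p_marks_given_common)
        if post > 0.5:
            vagrant_records += 1
        else:
            common_records += 1
    return priors


priors = run_feedback(1000)
print(f"prior for first bird: {priors[0]:.5f}")
print(f"prior for last bird:  {priors[-1]:.5f}")
```

      In this sketch every vagrant-like bird is logged as the common taxon, so the estimated vagrant rate only ever shrinks, even if some of those birds really were vagrants. Nothing in the loop can correct the prior, because no independently verified identification is ever fed in.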

      Obviously, error is unavoidable in any data gathering process, but the problem here is that the chances of error are extremely high. All we have to go on are some vague guesses about likely occurrence rates, and these are derived largely from a bunch of vague and error-prone guesses we made in the past.

      My point, if it’s not clear from the above example, is that if your identification is uncertain (i.e. it is based on probability, rather than a set of unequivocal diagnostic features), you cannot consider that identification as a datapoint. By feeding ‘probabilistic’ identifications into a Bayesian analysis as if they were ‘real’ data, you invalidate the analysis. The data must represent an objective measure of the truth.

      The problem with asking birders to think about extralimital ‘cryptics’ in terms of p(inference|data) is that the ‘data’ in question isn’t really data at all. It’s just a long series of inferences based on very little concrete evidence. What is the true rate of occurrence of ‘Iceland’ Gull in California? Or ‘Blue-headed’ Vireo? We have no way of knowing for sure (on present knowledge). Ted, I know you stated that the numbers don’t really matter, and that the examples are just there to ‘illustrate the principle’, but for me all they do is illustrate that the principle doesn’t really work when applied to real cases. Without real data, it’s just an exercise in dressing up guesses as if they were truth.

      A fascinating exercise, nonetheless!

    • http://profile.typepad.com/tfloyd Ted Floyd

      Reply to James Gilroy: Okay, I get it. “Unequivocal.” Note that, from the very outset, I’ve been talking about character states that are decidedly *equivocal*. And those include genes, of course. Indeed, the fundamental driving force in the history of life on earth is genetic variation. But I digress…

      Anyhow, let’s explore what it means to identify birds on the basis of unequivocal character states. Chances are, those states are not going to be genetic. Let’s keep it simple and say we’re faced with the ID challenge of large, colorful, crested birds at East Coast feeders. Let’s further say we have only two species to deal with: Blue Jay and Northern Cardinal. And let’s say we have the following, absolutely unequivocal, character: color. If a bird is blue, it is unequivocally a Blue Jay; if a bird is red, it is unequivocally a Northern Cardinal. None of this 99% crap. 100% of Blue Jays are blue, 100% of Northern Cardinals are red, and never the twain shall meet.

      Let’s take a look at what that implies:

      Because Blue Jays are always blue and never red, a red bird cannot be a Blue Jay. The only other bird in this ID scenario is a Northern Cardinal, which is always red and never blue. Thus, the ID is truly slam dunk.

      Now here’s the critical point. Let’s say a red bird shows up at my feeder in the suburbs northwest of Denver. (Remember: only 2 bird species in this scenario.) Even though cardinals are accidental in my neighborhood, the bird has to be a cardinal. That is the only possible outcome. The likelihood that this bird is a Northern Cardinal is exactly the same as that for a red bird in, say, Pennsylvania. To be exact, the probability is exactly 100%. We’re as certain in the suburbs northwest of Denver as in Pennsylvania. Location has nothing to do with it.

      We can also frame the preceding scenario in statistical terms. In particular: p(cardinal)=C, p(jay)=J, p(red|cardinal)=1, p(red|jay)=0. Then: p(cardinal|red)=1, for all values of C greater than 0.
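
      As a quick check of that statement, here is a small Python sketch. It assumes the two-species setup above, so that J = 1 - C; the function name and the particular prior values tried are illustrative only. Note that C must be greater than zero, since otherwise the denominator vanishes.

```python
def posterior_cardinal(prior_cardinal,
                       p_red_given_cardinal=1.0,
                       p_red_given_jay=0.0):
    """p(cardinal | red) by Bayes' theorem with only two species."""
    prior_jay = 1.0 - prior_cardinal
    numerator = p_red_given_cardinal * prior_cardinal
    evidence = numerator + p_red_given_jay * prior_jay
    return numerator / evidence


# Try priors ranging from "common" to "accidental".
for c in (0.9, 0.5, 0.001, 1e-9):
    print(f"C = {c:<8g} p(cardinal | red) = {posterior_cardinal(c)}")
```

      Because p(red|jay) is exactly 0, the jay term drops out of the denominator and the prior cancels. That cancellation is the sense in which location has nothing to do with the answer in the unequivocal case.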

      Needless to say, there are cases in which such conditions are met, or nearly so: Northern Cardinal vs. Blue Jay, as here; Belted Kingfisher vs. Greater Roadrunner; Mottled Petrel vs. Strong-billed Woodcreeper; and so forth. But those are “trivial,” in both a mathematical sense and a bird ID sense.

      All the interesting cases are the ones in which the probability of observing some character state is *not* 1 or 0. That is to say, 0 < p < 1.

      Alrighty, so at this point, I would imagine James would find nothing to object to in the preceding. But there’s been a method to my madness. With that jay/cardinal example behind us, I’d like to revisit now something James said, namely, “but for me all they do is illustrate that the principle doesn’t really work when applied to real cases. Without real data, it’s just an exercise in dressing up guesses as if they were truth.”

      In particular, I’m trying to grasp the distinction between “real data” and, I guess, “fake” or otherwise ersatz data. Thus, we might have:

      * real data, based on unequivocal character states, such as “red” and “blue” for Northern Cardinal and Blue Jay, respectively

      * other (fake, ersatz) data, based on equivocal character states, including, but not limited to, genes, wing formula, and, well, in many instances, color.

      I sense that “real” here has nothing to do with the data themselves (the color of a crested bird, the allele of a particular gene, the length of a particular primary), but rather with what statisticians refer to as the “conditional probability distribution” of those characters, that is to say, their relative proportions, or representations, in populations. If it’s an all-and-none (all-or-none not good enough!) distribution, then the data are “real.” Otherwise, they are not.

      And here, it seems to me, is the final, inevitable consequence of the preceding:

      Deterministic Universe = Real
      Probabilistic Universe = Fake

      I prefer a probabilistic universe, one inhabited by “fuzzy” Iceland Gulls, Blue-headed Vireos, and Eastern Wood-Pewees. That’s far more exciting to me than a deterministic universe, inhabited only by unequivocal Blue Jays and Northern Cardinals.

    • James Gilroy

      Ted, I broadly concur with the above (at least the bits I can wrap my puny deterministic brain around!), and I agree that my desire for all things ‘unequivocal’ does leave little room for the ‘interesting’ fuzzy stuff…

      I agree that it’s a good idea to take a probabilistic approach to identification when faced with one of these intractable ID conundrums, especially when you only have ‘equivocal’ features to go on.

      But you can’t apply formal Bayesian statistics to the problem in the way your article suggested. It’s ok to set up a Bayesian analysis with guesses (your ‘prior’ beliefs), but then you have to test those guesses against real data. In this case, you have to know whether or not you were correct in your ‘guess’ that the gull was just a pale Thayer’s, or the pewee was just an odd Western. This is the ‘real’ data I’m talking about – knowing whether your guesses are correct. Without an objective test, the formal Bayesian approach is invalid – it’s just a spiralling series of guesses.

      Certainly, we should weigh up the probabilities when trying to identify a tricky bird, especially if we think it might be something unusual. But we mustn’t be fooled into thinking that we actually know what those probabilities are – if we’re dealing with ‘fuzzy’ taxa, we’re probably always going to be guessing. Nothing wrong with that, as long as we acknowledge it for what it is.

    • Timothy Barksdale

      Not sure this works…

      Real-life situations can prove you wrong. I’ll be back later today to elucidate.

    • http://profile.typepad.com/tfloyd Ted Floyd

      While waiting with bated breath for Tim Barksdale’s elucidation, I’ll briefly reply to James Gilroy’s comment from a few weeks ago. For sure, I understand the need for further testing (technically, updating critical-alpha). Thus:

      1. What is the probability it’s an Iceland Gull?
      2. What is the probability it’s an Iceland Gull, given that (“conditional on”) it’s from California?
      3. JAMES’S POINT: What is the probability it’s an Iceland Gull, given (“conditional on”) additional information (James’s “real” data), e.g., “objective” knowledge of the bird’s parentage?

      Gosh, James and I have practically converged on the same viewpoint. You’ll never get me to come around to “objective” (my flaky probabilistic brain vs. James’s puny deterministic brain…), but we’re getting there.

      Anyhow, as long as (“objective”) tests do exist, then we at least have a guide (i.e., we can test hypotheses) when we can get no further than Step 2, above. And it’s quite often the case that we can get no further than Step 2. For example, we know what the bird looks like (Step 1), and we know it’s from California (Step 2), but we don’t have, say, a corpse or, at the very least, a feather (Step 3).

      I’ll say that James is 100% (oops, 99.9%) correct on the following point. If this exercise is only a constant cycling back and forth between Steps 1 and 2, then it’s all a grand tautology. You do need a foundation (theory, observations, what have you) of what James refers to as “real data.”

      And if you have that, then you’re justified in testing hypotheses–and rendering judgments–based only on Steps 1 and 2.

    • http://www.birdtrack.net Nick Moran

      Firstly let me say that as a British birder, it would be great to see this type of debate going on this side of the pond… particularly when we’re faced with our first record of a ‘probable’ Slaty-backed Gull (‘probable’ in the very loosest sense of the word)! Or perhaps it is and I’m just too busy trying to distinguish Common from Lesser Redpolls!

      Secondly, I’m very short of time / awake-brain-cells just now and I’m struggling to decipher the entire thread, so I wonder if someone will point me to any post that tackles the following issue: What happens if the accepted ID ‘truths’ underlying the kinds of probabilities Ted starts off with turn out to be ill-conceived?

      To illustrate, I spent 5 years living in the UAE, where Siberian Stonechat (Saxicola [torquatus] maurus) was deemed to be the regular-wintering species(/race) of this genus. Any application of probability to the ID of stonechats in the UAE Sept-Mar would surely have perpetuated this notion? Recent observations are raising the strong possibility that it is actually European Stonechat (S. [t.] rubicola) which is the wintering form, while Siberian Stonechat occurs as a passage migrant. Birding is littered with examples of the accepted wisdom turning out to be incorrect as advances in ID are made (and, for that matter, as range / population changes occur).

      In other words, applying probability to ID seems (to me) to be applying an objective ‘conversion factor’ to a subjective/ever-changing data set – a somewhat risky business?
