On Genealogical Proof

As a (comparative) old-timer in computing, and a (definite) newcomer to genealogy, I have been trying to find out what a serious genealogist regards as constituting "proof" (or, more accurately, "an adequate case") that two items or sets of data (e.g. a marriage register entry and a birth register entry, records from two successive censuses, or even two GEDCOM file extracts) refer to the same individual(s), i.e. are "linked". Another way of putting it is that I have been looking to see whether anybody has defined some reasonably strong and effective heuristics, if not an algorithm, for comparing such sets of data. However, my aim is as much to get a better grip on how to do genealogy manually, and to understand what professional genealogists do, as to actually apply computers to it. (My apologies if this issue has already been debated ad nauseam in soc.roots.)

The genealogy books and articles I have read are, to my taste, distressingly vague on this matter. Typically they discuss the need for sufficient evidence (without defining what this means), the importance of taking all available evidence into account (what if some is incorrect?), the need for exercising careful judgement (without explaining how to do so, in any but the most general of terms), and the need to be prepared to reconsider decisions if further, contradictory, evidence comes to light. Occasionally one finds rules of thumb about likely constraints on age, and suggestions about using Soundex-like rules for determining whether two differently-written names refer to the same actual person or family, but that seems to be about it.
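To make concrete the kind of rule of thumb I mean, here is a minimal sketch in Python of the standard American Soundex coding, together with a simple age-plausibility check. The function names and the age limits (15 and 50) are my own illustrative assumptions, not taken from any of the books or articles I have read.

```python
# Standard American Soundex letter codes (vowels, h, w and y have no code).
SOUNDEX_CODES = {
    letter: str(digit)
    for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1)
    for letter in group
}


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated code is kept
    return (code + "000")[:4]  # pad with zeros, truncate to four characters


def plausible_parent(parent_birth_year, child_birth_year, min_age=15, max_age=50):
    """Hypothetical rule of thumb: was the parent of a credible age at the birth?"""
    age_at_birth = child_birth_year - parent_birth_year
    return min_age <= age_at_birth <= max_age
```

For example, soundex("Randell") and soundex("Randall") both give "R534". Agreement of this kind only makes two names candidates for being the same; it says nothing about how much weight the agreement deserves, which is exactly the question I am asking.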

Failing to find anything that I deemed adequate in a brief search of the genealogy literature, I then took a look at both the early, and some of the most recent, literature on "record linking" (or "family reconstitution") as practiced by historians, and sometimes by geneticists. (It is, incidentally, a matter of some surprise to me how little cross-referencing there is between the genealogy and the history literature, in either direction.)

Various historians have, I gather, been using computers for record linking, in a pretty big way, for some time - over twenty years in some cases. In doing so they have provided themselves with actual algorithms or heuristics for doing the linking, more or less automatically. Typically they apply their techniques to an entire set of, say, parish records for one parish, or a small group of parishes, and try to establish all the "provable" family links. However, since in most cases they are doing this in order to undertake demographic studies, they are often satisfied with a record linking scheme which does not introduce bias into the resulting statistics, rather than seeking one that tries to ensure that no false links are made, which is what a genealogist would aim for. But even in this field, the descriptions of, and the justifications given for, the linking techniques used are often ones that, as a computer scientist, I find somewhat lacking.

To date, of the dozen or so references on record-linking that I have examined, the most satisfactory that I have found is a quite early one by Skolnick: "The Resolution of Ambiguities in Record Linkage," in Identifying People in the Past, ed. E. A. Wrigley, pp.102-127, London, Edward Arnold, 1973. The method Skolnick describes is based on the estimation of maximum likelihoods (e.g. taking account of the relative frequency of different surnames in a given population), and was being applied to the linking of records from medieval Italian parish registers, for purposes of genetic research. However, in praising Skolnick, I am still aware that he makes various statistical assumptions, by no means all of which are adequately spelled out. Moreover, I am not claiming that his method is a good computer-based record-linking technique in practice, just that in my opinion the description of his method comes closest to providing, albeit implicitly, a rigorous definition of what might constitute genealogical "proof".
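To illustrate the general idea of letting surname frequency influence the strength of a link, here is a minimal Python sketch. It is not Skolnick's method: it is a generic frequency-based agreement weight of the kind used in probabilistic record linkage, and the function name, the sample counts and the value m = 0.95 (the assumed chance that two records of the same person agree on surname) are all my own illustrative assumptions.

```python
import math
from collections import Counter


def agreement_weight(surname, surname_counts, m=0.95):
    """Log-likelihood ratio (in bits) for two records agreeing on a surname.

    u is the chance that an unrelated record carries the same surname,
    estimated here simply as the surname's relative frequency.
    m is an assumed probability that truly linked records agree.
    """
    total = sum(surname_counts.values())
    u = surname_counts[surname] / total
    return math.log2(m / u)


# Hypothetical surname counts for a small parish (invented for illustration).
parish = Counter({"Smith": 50, "Brown": 30, "Jones": 19, "Randell": 1})

common = agreement_weight("Smith", parish)
rare = agreement_weight("Randell", parish)
```

Under these assumptions, agreement on the rare surname carries far more weight than agreement on the common one, which matches intuition: two records both naming a Smith are much weaker evidence of a link than two records both naming a Randell.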

I would be most interested to receive comments - and pointers to any literature, by either genealogists or historians - on this issue.

Cheers

Brian Randell

        



Brian.Randell@newcastle.ac.uk 15 Jan 1995