Sunday 27 September 2015

When the Digital Age Hinders



The digitisation of historical documents is truly a great benefit to genealogists and to historians. Without it, we would have to travel a lot more (demonstrating more than a little commitment to our chosen field) and spend longer trying to find that essential bit of information for our quest. But are there instances where it works against us?


Newspaper archives are one of the most incredible of these resources, and I use several of them. The British Newspaper Archive (BNA) is a partnership between the British Library and Findmypast to digitise up to 40 million newspaper pages from the British Library's vast collection. When it was launched in 2011, I chose to access the resource via my existing Findmypast subscription rather than take out an independent subscription to the BNA, but my resulting love-hate relationship with their interface has been smouldering ever since.

The other day, I needed to search for references to a “Miss Jesson” (as she was always known in print) in Nottinghamshire newspapers from the period 1850–1899. Mary Jesson was an ancestor who worked as a costume maker in the theatre, and the Nottingham Theatre Royal celebrated its 150th anniversary on 25 Sep 2015. Although I’d written about her before, in A Rich French Actor, my goal on this occasion was to see if I could place her at the theatre when it first opened, on 25 Sep 1865.

I entered the criteria and it came up with 10 hits, including 3 false positives from the 1850s — referring to a different person — and 5 hits from the period 1881–1882 that I already knew about. This left the following two hits from 1865 and 1894:


THIS DAY'S RACING.   “… We wUI only 'add that it is excellent throughout— in fact, superlatively so. The dresses reflect very great credit on Miss Jesson, and Miss Collier is to be congratulated on most of the dances she has arranged. The music has been neatly and appropriately …”

27 July 1894 - Nottingham Evening Post - Nottingham, Nottinghamshire

NOTTS. MICHAELMAS QUARTER.SESSIONS.  “… arranged by J. Ketltno. The Lime Lights by W. Marriott. The Gas arrangements by W. Watchorn. The magnificent dresses by Miss Jesson. The whole produced under the immediate and personal superintendence of Mr. Thomas W. Charles. Doors open at 7; to commence …”

20 October 1865 - Nottinghamshire Guardian - Nottingham, Nottinghamshire


These were really interesting because: (a) 1865 was the date that the Nottingham theatre opened, and (b) by 1894 she had moved to a theatre in London. Hence, these two hits had the prospect of telling me something that I didn’t know, and maybe even overturning what I thought I did know. I was full of excitement!

However, I looked at these editions, and looked again, but could I find those transcribed extracts? Could I heck! You see, those titles were there but the extracts were not. Problem 1: Findmypast has never highlighted the hits when viewing the associated pages, and that can mean having to read a whole broadsheet page of newspaper print, and occasionally more, searching with a fine toothcomb for the words. I admit to having abandoned a number of previous searches during my research, simply because I could not find the alleged references. They have been aware of the problem ever since they hosted the BNA data, but how a major enterprise like that could be conceived without providing for this feature in the design escapes me; other archives provide it.

Eventually, I found that both of these newspaper editions contained a reference to a “Mr. Jesson”, but still nothing remotely close to the extracts. Problem 2: the BNA search engine insists on relaxing your search criteria, automatically including many similar words for you, but provides no way of overriding that. If you want to be very specific and find only the specified word(s), there is no way to do so. In contrast, we would achieve this fundamental requirement in Google by placing something in quotation marks.
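To make the distinction concrete, here is a minimal sketch in Python of the difference between a relaxed match and an exact-phrase match. It says nothing about how the BNA engine is actually implemented; the function names and the sample sentence are my own inventions for illustration, and a real engine would work from an index rather than scanning text.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_any_word(text, phrase):
    """A relaxed match: True if any single word of the phrase appears."""
    words = set(tokenize(text))
    return any(w in words for w in tokenize(phrase))

def contains_exact_phrase(text, phrase):
    """An exact match: True only if the phrase's words appear consecutively."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical page text, invented for this example.
page = "The scenery by Mr. Jesson. Doors open at 7."

print(contains_any_word(page, "Miss Jesson"))      # relaxed: matches on the surname alone
print(contains_exact_phrase(page, "Miss Jesson"))  # exact: no match
```

The point is simply that the second kind of test is what quotation marks ask for, and it is not difficult to provide alongside the first.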

This particular search engine is generally very weak, and has not been well designed. There are no Boolean facilities (specifying OR/AND between words or phrases) and no way of eliminating particular words: problem 3. The latter deficiency is particularly annoying when the search engine includes many words that you didn’t ask for. When Findmypast asked their users for feedback on the changes that they had made last year, several of the requests related to the newspaper searches. Unfortunately, some of the interpretations of that feedback seemed to be quite obtuse. For instance, on 30 Jan 2014, one such user request read:

Newspaper Searches - exclude unwanted records
Able to exclude unwanted records on Newspaper searches.

This looks quite straightforward to anyone who has used a modern search engine, but the response was:

We’re a little stuck here with the idea of what is unwanted and what isn’t. The difficulty is knowing this automatically. We’re unable as a result to offer this kind of service.

The user’s suggestion was declined and this bizarre response was still visible at the time of writing. How could anyone imagine that a user was requesting the “automatic” exclusion of unwanted records?
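What such a request normally means is user-specified exclusion, in combination with the usual AND/OR facilities. The following Python sketch shows the idea under my own assumptions: the function name, the parameter names (`all_of`, `any_of`, `none_of`), and the second sample string are hypothetical, and this is not a description of anything Findmypast offers.

```python
import re

def words_of(text):
    """Return the set of lower-cased words in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean filter: every all_of word, at least one any_of word
    (if any are given), and none of the none_of words."""
    words = words_of(text)
    if any(w.lower() not in words for w in all_of):
        return False
    if any_of and not any(w.lower() in words for w in any_of):
        return False
    if any(w.lower() in words for w in none_of):
        return False
    return True

# Sample hit texts; the second one is invented for illustration.
hits = [
    "The magnificent dresses by Miss Jesson.",
    "The scenery by Mr. Jesson.",
]

# The user, not the software, decides what is unwanted: here, "Mr".
wanted = [h for h in hits if matches(h, all_of=["Jesson"], none_of=["Mr"])]
print(wanted)
```

Nothing here requires the software to guess what is unwanted; the user supplies the words to exclude.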

I tried to help at this point and on 11 Nov 2014 I posted a lengthy analysis of some 8 separate user requests, suggesting that they were variations of a smaller number of common themes, and relating them to demonstrable problems. That post was inaccessible at the time of writing due to a “This UserVoice subdomain is currently available!” error.

So where did the above extracts come from? This was not easy to determine because including too much of the transcribed text returned no hits and including too little returned too many, and the user has virtually no control over the search process. For instance, searching for just the phrase "personal superintendence" and the word “theatre”, both from the 1865 extract, resulted in 18 full pages of hits, and some of these included words that I didn’t want such as “person”, “personally”, and “superintended”.

To compound this, the default ‘search by relevance’ does anything but this — problem 4. A case I had presented to Findmypast in 2014 was still evident when I repeated it for this article: searching for Elizabeth Bond in Nottinghamshire newspapers gives 14 hits, but some of these include intermediate words such as “Sarah”, “New”, “Mary”, and “Woolley of”. In particular, the hit for “Elizabeth Woolley of Bond” (Nottinghamshire Guardian, 10 Sep 1857) is presented before one of “Elizabeth Bond” (Nottinghamshire Guardian, 2 Nov 1866).
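One simple notion of relevance that would give the expected order is proximity: the fewer words that intervene between the search terms, the higher the hit should rank. The Python sketch below illustrates that idea using the two phrases quoted above. It is an assumption on my part about what "relevance" ought to mean here, not a reconstruction of the archive's ranking, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def gap(text, first, second):
    """Smallest number of words between `first` and a later `second`,
    or None if they never occur in that order."""
    first, second = first.lower(), second.lower()
    best = None
    last_first = None
    for i, word in enumerate(tokenize(text)):
        if word == first:
            last_first = i
        elif word == second and last_first is not None:
            distance = i - last_first - 1
            if best is None or distance < best:
                best = distance
    return best

def rank(hits, first, second):
    """Order hits so that the closest pairing of the two terms comes first."""
    def key(hit):
        g = gap(hit, first, second)
        return float("inf") if g is None else g
    return sorted(hits, key=key)

hits = ["Elizabeth Woolley of Bond", "Elizabeth Bond"]
print(rank(hits, "Elizabeth", "Bond"))
```

With this ordering, the adjacent “Elizabeth Bond” is listed ahead of “Elizabeth Woolley of Bond”, which is what I would expect from a sort by relevance.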

I eventually determined that the 1894 extract was actually from “CHRISTMAS IN NOTTINGHAM” (Nottinghamshire Guardian, 30 Dec 1881), although the transcribed extract was shown as “… Unknown …” in that specific case. Although I couldn’t find the 1865 extract manually, it did match one of the other original 10 hits: “AMUSEMENTS, THEATRE ROYAL, NOTTINGHAM” (Nottingham Evening Post, 31 Dec 1881). In other words, both of these hits were red herrings and I had spent some considerable time chasing them.

So isn’t this just a case of mis-indexing? Although such errors are rare, they do happen occasionally. Well, no — I have a recollection of reporting the 1865 case a couple of years ago but I have no proof. Problem 5: Findmypast provide no trackable call number that can be revisited to check on the progress of a software bug, transcription error, indexing error, etc. By now, it’s probably hard for me to disguise who the subject was in my previous article: Customer Service. Also, what’s the probability of such an indexing error occurring twice in a single search? Isn’t it indicative of a systemic error?

What I do have proof of is that I reported a similar error on 30 Apr 2015 because I kept a copy of my text in that case. I was searching for references to the name Frank Whiley during 1900–1949, while researching for Like Father, Like Son, and it had yielded the following hit:

COUNTY COUNCIL AND POLICE OFFICER SUED. “…the Rev. J. W. Busby, said afterwards, " I shall never forget to-day's experience." Those who died in the fire were Mr Frank Whiley (52), an unemployed labourer, of Henry Street, Sneinton, Notts; his wife Rose (49); and his two daughters, Lily (14) and …”

17 February 1923 - Gloucester Journal - Gloucester, Gloucestershire

There is an article in that newspaper with the given title, but not that transcribed extract of a funeral following a tragic fire in Nottingham. That text did appear, almost word-for-word, in a number of national newspapers on 30 Dec 1937, but not in the Gloucester Journal, and certainly not in 1923!

In order to round off this outpouring of frustration, I decided to check which national newspapers did include this same, or similar, text. I searched for the phrase: "Busby said Afterwards", with no other filters, and the results were shocking! Eliminating two false positives from 1931 left the following:

CROWDS IN TEARS. “… unable to restrain their tears, almost drowned with sobs the voices of the two* clergymen. One of them. Rev. J. W. Busby, said afterwards. " I shall never forget today's experience." Ten girls, friends of the family, carried Florrie's coffin and acted …”

30 December 1937 - Western Morning News - Plymouth, Devon

SOBBING MOURNERS INTERRUPT FUNERAL SERVICE. “… unable to restrain their tears, almost drowned with sobs the voices of the two clergymen. One of them, the Rev. J. W. Busby, said afterwards: " I shall never forget to-day's experience." Those who died in the fire were Mr Prank Whiley (52) an unemployed labourer …”

30 December 1937 - Western Daily Press - Bristol, Bristol

BOROUGH PETTY SESSIONS. “… unable to restrain their tears, almost drowned with sobs the voices of the two clergy men. One of them, the Rev. J. W. Busby, said afterwards, " I shall never forget to-day's experience." Those who died in the fire were Mr Frank Whiley (52), an unemployed …”

09 June 1888 - Northampton Mercury - Northampton, Northamptonshire

CROWDS AT FUNERAL OF FIRE VICTIMS. “… Unknown …”

30 December 1937 - Aberdeen Journal - Aberdeen, Aberdeenshire, Scotland

The hit from the Aberdeen Journal, for some unexplained reason, didn’t show a transcribed extract. However, it was a true hit and the relevant extract did show up by using some slightly modified criteria just a few minutes afterwards.

The interesting hit is the 1888 one from the Northampton Mercury, which is another case of an error. The fire wasn’t in 1888, and that “Borough Petty Sessions” article did not contain the alleged extract. While the case I reported in April did not show up, a new one that I had not seen before did.

I can’t believe that I am the only victim here. I thought that I was going to write about just two recent cases that, by some incredible fluke, had appeared in the same search. To realise that I had reported at least one similar case before, and then to encounter yet another case while writing this article, has left me with a complete loss of confidence in this resource. It cannot currently be described as fit-for-purpose with this litany of indexing errors and the weakness of its search engine.

I often record negative searches, or likely-looking hits that I have eliminated, but it looks like I now need to record all the indexing errors in order to avoid wasting my time. How many of my abandoned searches might have fallen into this category without me knowing?
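For anyone wanting to keep such a record, a small structured log would do. The Python sketch below is one possible shape for it, with field and function names that are entirely my own; the sample entry uses the 1894 hit described earlier, whose extract I traced to the Nottinghamshire Guardian of 30 Dec 1881.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class IndexingError:
    """A hit whose transcribed extract does not belong to the listed article."""
    search_terms: str
    listed_title: str
    listed_paper: str
    listed_date: date
    actual_title: str = ""              # blank if the true source is unknown
    actual_paper: str = ""
    actual_date: Optional[date] = None

    def key(self):
        """Identify the bad hit by where the archive claims it is."""
        return (self.listed_paper, self.listed_date, self.listed_title)

def record(log, entry):
    """Add or update an entry in the log (a plain dict)."""
    log[entry.key()] = entry

def is_known_error(log, paper, day, title):
    """True if this listed hit has already been found to be mis-indexed."""
    return (paper, day, title) in log

log = {}
record(log, IndexingError(
    search_terms="Miss Jesson",
    listed_title="THIS DAY'S RACING.",
    listed_paper="Nottingham Evening Post",
    listed_date=date(1894, 7, 27),
    actual_title="CHRISTMAS IN NOTTINGHAM",
    actual_paper="Nottinghamshire Guardian",
    actual_date=date(1881, 12, 30),
))

print(is_known_error(log, "Nottingham Evening Post", date(1894, 7, 27), "THIS DAY'S RACING."))
```

Checking a new hit against the log before opening the page image would at least stop me chasing the same red herring twice.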
