Newspapers are loaded with information, but finding the information can be a challenge.
Unless the newspaper has been microfilmed (and often even if it has been), libraries hold very few copies and these are often not available through interlibrary loan. Even with access to a newspaper, unless the paper has been indexed, searching for the desired information can be a tiresome quest.
Because of their vast amounts of hard-to-access information, local newspapers are often suggested as prime candidates for digitisation. At its best, a digitised newspaper makes all its information searchable and accessible to anyone with internet access.
But newspapers present some distinct challenges on the road to a successful outcome.
Over the past few years our team has created thousands of digital images by scanning original bound newspaper pages, as well as microfilm. One benefit of having access to the original bound volumes is that, unlike many other newspaper digitisation projects, we have been able to scan some of the rarest and most fragile newspapers in the collection.
We have even scanned single pages more than two feet wide!
Our scanning team uses a range of equipment, up to and including A0 scanners, which create very high-quality digital images (1200dpi in 24-bit colour, for the technically minded).
Some of the newspapers already scanned have produced single-page image files as large as 1.6GB! This is due to the very large physical size of the original pages, particularly around the turn of the 19th century.
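As a rough check on that figure: an uncompressed 24-bit colour image stores 3 bytes per pixel, and at 1200dpi every square inch holds 1200 × 1200 pixels. The short sketch below estimates the raw pixel data for a page of given dimensions. The 24 × 16 inch page is a hypothetical example, and real TIFF files will differ somewhat because of file headers, any compression, and the margin captured around the page.

```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int = 1200, bits_per_pixel: int = 24) -> int:
    """Estimate raw pixel data size of a scan (headers and compression ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical large page, 24 x 16 inches, scanned at 1200dpi in 24-bit colour.
size = uncompressed_size_bytes(24, 16)
print(f"{size / 1e9:.2f} GB")  # 1.66 GB
```

A page of roughly two feet by sixteen inches therefore lands in the same range as the 1.6GB files mentioned above.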
The scanned page images are saved in TIFF format for archival purposes, and surrogate (access) copies are produced in JPEG or PNG format. The image files are also run through an optical character recognition (OCR) process, which creates the electronic text.
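A minimal sketch of deriving a JPEG access copy from a TIFF master, assuming the Pillow imaging library. The function name `make_surrogate`, the 4000-pixel cap and the quality setting of 85 are illustrative choices, not the project's actual settings. Because a single large page can exceed Pillow's default decompression-bomb pixel limit, the sketch disables that check, which is only appropriate for trusted in-house scans; decoding a full-size master also needs at least as much memory as its uncompressed size.

```python
from pathlib import Path

from PIL import Image

# Large masters can exceed Pillow's default decompression-bomb limit (~89 megapixels).
# None disables the check, which is only safe for trusted in-house scans.
Image.MAX_IMAGE_PIXELS = None


def make_surrogate(master_path: Path, out_path: Path,
                   max_px: int = 4000, quality: int = 85) -> Path:
    """Write a downsized JPEG access copy of a TIFF master; the master is not modified."""
    with Image.open(master_path) as img:
        # Pillow cannot write modes such as RGBA as JPEG, so anything other than
        # 8-bit greyscale ("L") or RGB is converted to RGB.
        mode = img.mode if img.mode in ("L", "RGB") else "RGB"
        surrogate = img.convert(mode)
    # thumbnail() resizes in place, keeps the aspect ratio and never enlarges.
    surrogate.thumbnail((max_px, max_px))
    surrogate.save(out_path, format="JPEG", quality=quality)
    return out_path
```

Keeping the TIFF untouched and generating surrogates from it means access copies can be regenerated later at a different size or quality without rescanning.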
The OCR process also segments each page into classified zones, which makes searching more precise. Finally, the OCR text is indexed in a large database, which is then transferred to a searchable online resource, www.actofunion.ac.uk.
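To illustrate why zoning helps, here is a minimal inverted index that records, for each word, the page and zone in which it appears, so a search can be narrowed to one kind of zone. The record layout, page IDs and zone labels ("headline", "article", "advert") are hypothetical; this is a sketch of the general technique, not the schema of the project's actual database.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical record layout: (page_id, zone_type, ocr_text).
Zone = tuple[str, str, str]


def build_index(zones: list[Zone]) -> dict[str, set[tuple[str, str]]]:
    """Map each lower-cased word to the set of (page_id, zone_type) pairs containing it."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for page_id, zone_type, text in zones:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add((page_id, zone_type))
    return index


def search(index: dict[str, set[tuple[str, str]]], word: str,
           zone_type: Optional[str] = None) -> set[tuple[str, str]]:
    """Return matching (page_id, zone_type) pairs, optionally limited to one zone type."""
    hits = index.get(word.lower(), set())  # .get() does not add missing keys
    if zone_type is None:
        return set(hits)
    return {hit for hit in hits if hit[1] == zone_type}


zones = [
    ("p1", "headline", "Treaty of Union debated"),
    ("p1", "article", "The Union was debated in Parliament"),
    ("p4", "advert", "Union Street coffee house"),
]
idx = build_index(zones)
print(search(idx, "union", zone_type="headline"))  # {('p1', 'headline')}
```

Without zones, a search for "union" would return every page mentioning the word; with them, a reader looking for news rather than advertisements can filter out the advert hit.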
As well as scanning original paper, we have scanned a significant amount of material from existing microfilm.
The resulting image quality is slightly lower than that of the paper scans, but microfilm has the benefit of being much faster to digitise.
This allows us to make many more pages available, for example on a website, than would otherwise be possible. Microfilm-sourced images are greyscale, at roughly 600 to 1200dpi.
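For a sense of scale, the arithmetic below compares the uncompressed data per square inch of a paper scan (1200dpi, 24-bit colour) with microfilm scans. It assumes the microfilm images are 8-bit greyscale and that both dpi figures are measured against the same page size; neither assumption is stated in the text above, so treat the ratios as illustrative.

```python
def bytes_per_square_inch(dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bytes needed for one square inch of a scanned page."""
    return dpi * dpi * bits_per_pixel // 8


paper = bytes_per_square_inch(1200, 24)     # paper scans: 1200dpi, 24-bit colour
film_high = bytes_per_square_inch(1200, 8)  # microfilm, assumed 8-bit greyscale, 1200dpi
film_low = bytes_per_square_inch(600, 8)    # microfilm, assumed 8-bit greyscale, 600dpi

print(paper / film_high, paper / film_low)  # 3.0 12.0
```

Under these assumptions a microfilm-sourced image carries between one third and one twelfth of the raw data of a colour paper scan of the same page.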