Official documents provide a wealth of information, but the sheer number and diversity of them can make it problematic to obtain exactly the information that you need. Dandy Booksellers has been supplying official publications to libraries, universities and local government, among other bodies, for more than 30 years and so we are aware of the difficulties that researchers can face. As a result we decided some time ago to offer our customers an easy and comprehensive means of querying the vast store of official documents and getting results in a meaningful format.
Once upon a time Dandy Booksellers was simply that – a seller of books. We would sell printed copies of Hansard, Bills, House of Commons / Lords Papers and Command Papers. Our main customers were academic and public libraries. In 2006 we launched Public Information Online (PIO) so that we could offer electronically, via an online subscription, everything that we sold in paper format. Most customers were keen to transition from print to electronic supply, as the price was significantly lower and they could save the space and time otherwise spent handling paper copies.
Having the data stored electronically meant that end users could also search across the whole database for subjects of interest and automatically be directed to the appropriate publication, rather than having first to determine which publication was applicable to them and then read through it to find the required information. This constituted a major time saver for the end user.
New publications were, of course, available in electronic format when they were produced, but producing an archive of earlier material required digitising publications that were only available in printed form. In 2009 we began a major digitisation project. Customers who were keen to switch to a digital-first approach generously made sections of their hard copy collections available to us, and their continued co-operation in doing so has made it possible to build a large and growing archive. Our first phase was to digitise all the House of Lords Papers and Bills going back to 1901, moving on to bound volumes of Hansard for both Houses going back to 1803 and the Standing Committee Debates from when they commenced in 1919.
Digitising these historical documents was clearly a daunting process. The effects of age on the printed page, and in some cases the initial quality of the printing itself, presented a challenge. We were also acutely aware that the exact wording of the documents was of crucial importance, so the typical accuracy of optical character recognition (OCR) at the time was simply insufficient. We also knew that we would have to produce print-capable files that could generate accurate facsimiles of the original documents, albeit ones that were cleaner and easier to read.
The procedure that we painstakingly developed has been able to meet these requirements, but however good your automated processes are, there will always be individual pages where the ravages of time cause unanticipated problems. As a result, every single page that we produce is examined by our digitisation team before it is passed, so that we can catch and correct any such problems.
The process produces a hybrid database, with the page content stored using Apache Solr and the metadata in Microsoft SQL Server, plus high-quality PDF files that can be printed to recreate the original documents as they were first published. These include full colour maps and diagrams, where the colour coding is essential to their use. For example, Figure 1 is a colour plate from the 1958 paper CMND 331 illustrating the report on the explosion at Chanters Colliery, Lancashire.
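The split between page-level content and document-level metadata described above can be sketched in a few lines. This is only an illustration, not PIO's implementation: the two in-memory structures stand in for the full-text index and the relational metadata store, and every identifier, field name, title and page number is a hypothetical placeholder.

```python
from dataclasses import dataclass

# Stand-in for the full-text index: one entry per page (hypothetical fields).
PAGE_INDEX = [
    {"doc_id": "doc-a", "page": 3, "text": "report on the explosion at the colliery"},
    {"doc_id": "doc-a", "page": 4, "text": "plan of the workings"},
    {"doc_id": "doc-b", "page": 1, "text": "explosion risk in mines"},
]

# Stand-in for the relational metadata store: one row per document.
METADATA = {
    "doc-a": {"title": "Example Command Paper", "year": 1958},
    "doc-b": {"title": "Example Committee Report", "year": 1960},
}


@dataclass
class Hit:
    doc_id: str
    title: str
    year: int
    pages: list


def search(term):
    """Find pages containing `term`, then attach each document's metadata."""
    pages_by_doc = {}
    for entry in PAGE_INDEX:
        if term.lower() in entry["text"].lower():
            pages_by_doc.setdefault(entry["doc_id"], []).append(entry["page"])
    return [
        Hit(doc_id, METADATA[doc_id]["title"], METADATA[doc_id]["year"], sorted(pages))
        for doc_id, pages in pages_by_doc.items()
    ]
```

The point of the design is that the text search only has to return document identifiers and page numbers; titles, dates and categories are looked up once per document rather than being duplicated on every page.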
The existence of this comprehensive database has allowed us to expand the range of what we can offer in response to customer needs. For example, when Parliament said it was no longer going to produce printed copies of the Hansard Bound Volumes we were asked whether we could set up a research tool to help with Pepper v Hart type enquiries, and now we offer all the relevant parliamentary documents to do this.
Public Information Online continues to expand its scope in response to customer feedback. Our current digitisation project, in partnership with Middle Temple, is to scan all Command Papers back to 1870. We are also increasing the scope of our coverage of Statutory Instruments (back to 1987), so that their passage can be tracked in the same way a Bill can currently be tracked on its progress through both Houses.
Soon we will launch our partnership with HeinOnline. This will make all of our data available not only on our PIO platform but also on HeinOnline – we believe this will open us up to global markets and we expect our customer base to grow.
Public Information Online currently contains a comprehensive archive of official documents from Westminster and the devolved administrations, amounting to over 300,000 documents, including:
• House of Commons Papers from 2006–07
• House of Lords Papers from 1901
• Command Papers from 1900
• Bills, Bill Amendments and Explanatory Notes from 1901/1919
• Public General Acts and Explanatory Notes from 1900/1999
• Public Bill and Standing Committee Debates from 1919
• Hansard bound volumes (House of Commons and House of Lords) from 1803
• Church Measures from 1920
• Scottish Parliament Papers from Session 3
• Scottish Bills and Acts from Session 3
• Scottish Official Reports (Meetings of Parliament and Committee Debates) from Session 3
• Northern Ireland Assembly papers from 2007
• Northern Ireland Bills and Amendments from 2007
• Northern Ireland Acts and Explanatory Notes from 2000
• Northern Ireland Hansard from 2008
• Welsh Senedd Acts and Measures (including Explanatory Notes) from 2008
• Welsh Bills, Amendments and Committee Reports from 2011
It also includes key non-Parliamentary and Scottish Government publications, including a broad range of titles from the Office for National Statistics and government departments, such as:
• Army List 1969–2014
• Air Force List 1970–2013
• Annual Abstract of Statistics 1935–2018
• British Imperial Calendar 1939–1973
• Civil Service Yearbook 1974–2018 (access to the database to date)
• HMSO Annual Catalogues 1922–1995
• International Agency Catalogues 1986–1995
• Navy List 1974–2014
• Regional Trends 1974 (No. 1) – 2011 (No. 43)
• Social Trends 1970 (No. 1) – 2011 (No. 41)
• United Kingdom Balance of Payments: Pink Book 2001
• United Kingdom National Accounts: The Blue Book 2000 (1973–2000)
Public Information Online is also the home of the Civil Service Yearbook, which has been online-only since 2018, and contains details of ministerial and senior civil service office holders, with contact details where these are available, from the central and devolved administrations, executive agencies and non-departmental public bodies. Earlier editions of the Civil Service Yearbook, and its predecessor, the British Imperial Calendar, dating back to 1939, have been digitised and are included in the PIO archive.
All sections of PIO are updated daily, with most documents available on the day of publication. It has a simple but powerful search engine that can be run across the whole database, or restricted by, for example, Parliament or date to provide a more focused search. It is also integrated into popular search and delivery solutions such as Ex Libris (Clarivate) and EBSCO Discovery Service.
Alternatively, you can use the browsing facilities to get an overview of all activity in a Parliamentary session, or to follow the passage of a Bill from White Paper to Public Act and even beyond to see post-legislative scrutiny and amending Statutory Instruments from 1987 to date.
The Pepper v Hart facility allows you to search for individual clauses or schedules of interest and to follow their discussion right through the legislative process. You can then produce a combined document that includes all the relevant references that you have chosen.
The uses for PIO are limited only by your imagination, and if there are other facilities that you would like to see included then please let us know, whether or not you are an existing customer, and we will see if these can be integrated.
But the best way to see what PIO has to offer is to run through a few typical examples.
Suppose that you are interested in reports about knife crime. You could enter this as a search term in the Title, ISBN, Paper Number field. This returns a list of papers with the words ‘knife crime’ in the title. By default, this is sorted by relevance, although other options, such as chronological or alphabetical, are also available.
If the search returns a large number of papers, you can narrow the scope by Parliament, category, corporate author, date range and so on by using the options on the left of the page. If you know the ISBN or paper number of a particular publication of interest, you could instead search by these in the same place.
Once you have found a paper of interest, just click on the title to open it. The paper viewer window provides you with a number of tabbed options. The Search Original tab lets you read through the entire document in your browser. This includes a search option so that you can jump straight to a section of interest.
If you prefer to read offline, the Download tab lets you download a high-resolution PDF copy of the paper to read in your preferred software.
And the Related Papers tab shows a list of other papers related to the one under consideration.
You will notice from the above that there is also “Home Affairs Committee 7th Report. Knife Crime Volume 2: Oral and written evidence” – PIO is the only place where users will find written and oral arguments brought together in consolidated PDFs.
Other tabbed options may be available depending upon the type of paper selected. For example, if you are considering a Bill, the Associated Papers tab will allow you to track the passage of the Bill from announcement to enactment, including all versions of the Bill, proposed amendments, explanatory notes and Hansard reports of readings.
Figure 7 is the start of the Online Safety Bill results as at 21 September 2023. The full list goes right back to the Draft Bill and its explanatory notes on 12 May 2021.
Our Pepper v Hart service offers a refinement of this facility which is designed to help customers find Hansard references where specific clauses are debated. The basic Pepper v Hart page for the Online Safety Bill is shown in Figure 8.
Figure 8 highlights the type of document being listed and colour codes this to show whether it is from the Lords (red) or Commons (green). You can restrict this list by type of paper using the Category option on the left. For example, if you are only interested in Public Bill Committee stages you can restrict it as shown in Figure 9.
Or to focus on the discussion of a specific part of the Bill, enter the search term in the Text In Body field on the left.
Figure 10 gives you a list of all papers that contain the search phrase. Furthermore, it allows you to produce a single document that contains all of the relevant pages from all of the papers listed. You can, again, restrict by category of document, or use the selection check boxes to include or exclude individual papers.
This combined document can be further modified by including pages before or after the ones on which the text match appears, and by removing any unnecessary pages. The final document can be downloaded as a PDF, which can be printed to provide a high-quality facsimile of the pages as they were originally printed in the source documents.
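The page-selection logic just described (matched pages, optional context pages before and after, and manual removal) can be sketched as follows. The function name and parameters are hypothetical and are not taken from PIO; the sketch only shows one straightforward way such a selection could be computed.

```python
def select_pages(matched, before=0, after=0, removed=(), last_page=None):
    """Return the sorted page numbers to include in a combined document.

    matched   -- pages on which the search text appears
    before    -- number of context pages to add before each match
    after     -- number of context pages to add after each match
    removed   -- pages the user has chosen to drop
    last_page -- final page of the source document, if known
    """
    selected = set()
    for page in matched:
        start = max(1, page - before)  # never go below page 1
        end = page + after
        if last_page is not None:
            end = min(end, last_page)  # never run past the end
        selected.update(range(start, end + 1))
    return sorted(selected - set(removed))


print(select_pages([5, 12], before=1, after=2))
# [4, 5, 6, 7, 11, 12, 13, 14]
```

Using a set means that overlapping context windows around neighbouring matches are merged automatically, so no page is included twice.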
General text searching within all documents is also offered in the standard version of PIO, although in this case there is no facility to combine pages from multiple documents. The general text search supports wildcards, Boolean operators, proximity searching, fuzzy searches and grouping.
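To give a feel for how these features combine, the sketch below builds a query string using Lucene-style syntax of the kind accepted by Apache Solr's standard query parser. This is an assumption made for illustration: the exact syntax accepted by the PIO search box is not documented here and may differ, and the helper names and search terms are hypothetical.

```python
def phrase(text):
    """Quote a phrase, escaping any embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def near(text, distance):
    """Proximity search: the words of the phrase within `distance` positions."""
    return f"{phrase(text)}~{distance}"


def fuzzy(term, edits=1):
    """Fuzzy search: match terms within `edits` character changes of `term`."""
    return f"{term}~{edits}"


def any_of(*parts):
    """Boolean OR, grouped in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts):
    """Boolean AND, grouped in parentheses."""
    return "(" + " AND ".join(parts) + ")"


# Wildcard (knife*), fuzzy (blade~1), proximity ("offensive weapon"~5),
# Boolean operators and grouping in a single query.
query = all_of(any_of("knife*", fuzzy("blade")), near("offensive weapon", 5))
print(query)
# ((knife* OR blade~1) AND "offensive weapon"~5)
```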
We believe one of the main benefits of PIO is its alerting service, which ensures that customers are notified whenever something is published. Alerts can be delivered by email (immediately, daily or weekly) or as RSS feeds. An ‘all fields’ alert can be set up for whatever you need to track – this is invaluable to today's busy researcher.
All you have to do is set up your alert and then rest assured that PIO will keep you up to date. Figure 13 is an example of an email alert for Hansard.
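For readers who prefer to consume the RSS option programmatically, a minimal sketch of filtering a feed by keyword is shown below. It assumes a conventional RSS 2.0 layout with `item`, `title` and `link` elements; the actual structure of PIO's feeds is not described here, and the sample feed, titles and links are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample feed in a conventional RSS 2.0 layout (an assumption).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example alert feed</title>
    <item>
      <title>Example Bill: Committee Stage</title>
      <link>https://example.org/1</link>
    </item>
    <item>
      <title>Example Report on Knife Crime</title>
      <link>https://example.org/2</link>
    </item>
  </channel>
</rss>"""


def matching_items(feed_xml, keyword):
    """Return (title, link) pairs for feed items whose title contains keyword."""
    root = ET.fromstring(feed_xml)
    wanted = keyword.lower()
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if wanted in title.lower():
            results.append((title, link))
    return results


for title, link in matching_items(SAMPLE_FEED, "knife"):
    print(title, link)
```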
Subscription to PIO grants you unlimited access (subject to our fair use policy) to the full database. Some optional elements, such as the Pepper v Hart service, may require an additional subscription. We support multi-user access and both on-site and off-site use. Login can be by IP address, user name and password, Shibboleth, or via the end user's library card.
We have tried to make the PIO interface as easy to follow as possible, but training is available to help your organisation get the most out of your subscription. In some cases, your users may simply be unaware of the facilities offered, or you may have specific search requirements that would benefit from being explained more fully. Several of our subscribers organise regular online sessions for their staff.
Non-subscribers can make use of the basic search functionality of the site to purchase printed copies via our PIO Shop at https://www.publicinformationonline.com/shop
We hope that you have been able to see that PIO offers a convenient way of accessing and searching a wide range of official documents. Data is only really valuable when it can be turned into information, which is what we believe PIO achieves, and from there into knowledge in the minds of your users.
If you have any comments, or would like to learn more, then we would love to hear from you. You can contact us on [email protected], or [email protected] for technical queries, or follow our Twitter feed at @DBooksellers