Summary

Published online by Cambridge University Press:  09 November 2021

We’ve covered a lot in this book, so it is helpful to summarise and pull together the main points.

I hope you’ve enjoyed reading this book and have been able to take something away from it to improve your data quality. I’ve certainly enjoyed writing it – a lot more than I thought I would! – and I have really surprised myself with how much I can actually write about data. There's another tip for you: even if you think you can't – try! You’ll be amazed at what you’re capable of.

The same goes for data normalisation, classification and cleansing. You might think it's a mammoth task that's too big to deal with, but you can start small by focusing on the high-value spend or customers, or the most frequently used data, and work from there.
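One way to picture "start small" is to rank suppliers by spend and work through the shortlist that covers most of the total. The sketch below is illustrative only: the function name, the 80% coverage threshold and the sample figures are all assumptions, not something prescribed in the book.

```python
def high_value_first(spend_by_supplier: dict[str, float], coverage: float = 0.8) -> list[str]:
    """Return the largest suppliers that together reach `coverage` of total spend.

    The 0.8 default is an assumed threshold; choose whatever suits your data.
    """
    total = sum(spend_by_supplier.values())
    # Largest spend first, so the shortlist is as short as possible.
    ranked = sorted(spend_by_supplier.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for supplier, spend in ranked:
        if running >= coverage * total:
            break
        selected.append(supplier)
        running += spend
    return selected


# Hypothetical spend figures for illustration.
spend = {
    "Acme": 500_000.0,
    "Widget Works": 250_000.0,
    "Bolt Bros": 150_000.0,
    "Misc Stationery": 100_000.0,
}
shortlist = high_value_first(spend)
print(shortlist)
```

With these made-up figures, three of the four suppliers account for 90% of spend, so they would be cleaned first and the long tail left for later.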

Dirty data

I’ve shared with you the types of dirty data you might find, and of course there will be some specific examples related to your organisation. In general, though, it will be things like misspelt names, incorrect or misleading descriptions, missing or incorrect codes, no standard formatting of addresses or units of measure, currency issues, incorrect or partially classified spend data and, not forgetting the most popular of all, duplicates.

In terms of the consequences of dirty data, I’ve given examples of what could go wrong and what has gone wrong for some of my clients in the past. Fear not though, you now have spot-checking tips for your data toolbelt.
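A few of the issues listed above can be spot-checked with very little code. The following is a minimal sketch, not the spot-checking method from the book: the field names (`supplier`, `code`) and the sample rows are hypothetical, and it only looks for missing codes, stray whitespace and names that match once case and spacing are ignored.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions for illustration.
rows = [
    {"supplier": "Acme Ltd", "code": "A100"},
    {"supplier": "acme ltd ", "code": "A100"},
    {"supplier": "Widget Works", "code": ""},
]


def spot_check(rows):
    """Return (row index, issue) pairs for a few common dirty-data problems."""
    issues = []
    # Count names after trimming and lowercasing to surface likely duplicates.
    seen = Counter(r["supplier"].strip().lower() for r in rows)
    for i, r in enumerate(rows):
        if not r["code"].strip():
            issues.append((i, "missing code"))
        if r["supplier"] != r["supplier"].strip():
            issues.append((i, "stray whitespace in name"))
        if seen[r["supplier"].strip().lower()] > 1:
            issues.append((i, "possible duplicate"))
    return issues


issues = spot_check(rows)
for index, issue in issues:
    print(index, issue)
```

A check like this only flags candidates; a person still needs to confirm whether two similar names really are the same supplier.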

COAT

What goes better with your data toolbelt than your data COAT? Your data needs to be consistent, organised, accurate and trustworthy, and you need all four because they are interdependent. Make COAT fun so that you can engage non-data people within your organisation in a meaningful way and relate it to any data situation.

Normalisation

When you’re ready to add to your data toolbelt, you can start to normalise your suppliers. The key message here is ‘rinse and repeat’ – don't just normalise once and think that's it. As you are cleaning up the data, you can create new near-duplicates that need to be tidied up, so go back and check a second or third time.
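To see why one pass is not enough, here is a small sketch of "rinse and repeat" applied to supplier names. It is an illustration under stated assumptions: the suffix list, the function names and the sample suppliers are hypothetical, and real supplier normalisation usually needs human review as well.

```python
import re

# Hypothetical list of trailing words to strip; extend it for your own data.
SUFFIXES = {"ltd", "limited", "inc", "plc", "llc", "co"}


def normalise_once(name: str) -> str:
    """One cleaning pass: lowercase, drop punctuation, strip one trailing suffix."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def normalise(name: str, max_passes: int = 5) -> str:
    """Repeat the pass until the name stops changing (rinse and repeat)."""
    for _ in range(max_passes):
        cleaned = normalise_once(name)
        if cleaned == name:
            break
        name = cleaned
    return name


suppliers = ["Acme Co. Ltd", "ACME Ltd.", "Acme", "Widget Works Inc"]

# Group the original names under their normalised form.
groups = {}
for supplier in suppliers:
    groups.setdefault(normalise(supplier), []).append(supplier)

print(normalise_once("Acme Co. Ltd"))  # a single pass leaves "acme co"
print(groups)
```

After a single pass, "Acme Co. Ltd" becomes "acme co", which is a new near-duplicate of "acme". Only the repeated pass brings all three Acme variants into one group.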

Type: Chapter
Information: Between the Spreadsheets: Classifying and Fixing Dirty Data, pp. 151-154
Publisher: Facet
Print publication year: 2021

  • Summary
  • Susan Walsh
  • Book: Between the Spreadsheets
  • Online publication: 09 November 2021
  • Chapter DOI: https://doi.org/10.29085/9781783305049.010