Summary
We’ve covered a lot in this book, so it is helpful to summarise and pull together the main points.
I hope you’ve enjoyed reading this book and have been able to take something away from it to improve your data quality. I’ve certainly enjoyed writing it – a lot more than I thought I would! – and I have really surprised myself with how much I can actually write about data. There's another tip for you: even if you think you can't – try! You’ll be amazed at what you’re capable of.
The same goes for data normalisation, classification and cleansing. You might think it's a mammoth task that is too big to deal with, but you can start small by focusing on high-value spend or customers, or on the most frequently used data, and work from there.
Dirty data
I’ve shared with you the types of dirty data you might find, and of course, there will be some specific examples related to your organisation. In general, though, it will be things like misspelt names, incorrect or misleading descriptions, missing or incorrect codes, no standard formatting of addresses or units of measure, currency issues, incorrectly or partially classified spend data and, not forgetting the most popular of all, duplicates.
In terms of the consequences of dirty data, I’ve given examples of what could go wrong and what has gone wrong for some of my clients in the past. Fear not though, you now have spot-checking tips for your data toolbelt.
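As a rough illustration of a spot check, the sketch below counts two of the problems listed above: missing codes and duplicate names. The record layout (a `name` and a `code` field) and the function name `spot_check` are assumptions made for this example, not something taken from the book.

```python
from collections import Counter


def spot_check(records):
    """Count two common problems: missing codes and duplicate names.

    Assumes each record is a dict with a "name" and a "code" key
    (a hypothetical layout chosen for this sketch).
    """
    # A code counts as missing if it is None, empty or only whitespace.
    missing_code = [r for r in records if not (r.get("code") or "").strip()]

    # Compare names case-insensitively and ignore surrounding spaces.
    name_counts = Counter(r["name"].strip().lower() for r in records)
    duplicates = sorted(name for name, count in name_counts.items() if count > 1)

    return {"missing_code": len(missing_code), "duplicate_names": duplicates}


# Made-up sample data for illustration only.
records = [
    {"name": "Acme Ltd", "code": "A001"},
    {"name": "acme ltd ", "code": ""},
    {"name": "Bolt Inc", "code": None},
]

print(spot_check(records))
```

A check like this only catches exact matches after trimming and lowercasing; near-duplicates such as "Acme Ltd" and "Acme Limited" need the kind of normalisation covered later.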
COAT
What goes better with your data toolbelt than your data COAT? Your data needs to be consistent, organised, accurate and trustworthy, and you need all four because they are interdependent. Make COAT fun so that you can engage non-data people within your organisation in a meaningful way and relate it to any data situation.
Normalisation
When you’re ready to add to your data toolbelt, you can start to normalise your suppliers. The key message here is ‘rinse and repeat’ – don't just normalise once and think that's it. As you are cleaning up the data, you can create new near-duplicates that need to be tidied up, so go back and check a second or third time.
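To make 'rinse and repeat' concrete, here is a minimal sketch, assuming supplier names are plain strings and that one cleaning pass strips a single trailing legal suffix. The suffix list and the function names (`normalise_once`, `normalise`, `group_suppliers`) are hypothetical and chosen only for illustration. The point is that one pass can leave behind a name that still needs cleaning, so the pass is repeated until nothing changes.

```python
import re
from collections import defaultdict

# Hypothetical suffix list; a real one would come from your own data.
LEGAL_SUFFIXES = ("ltd", "limited", "inc", "plc", "llc")


def normalise_once(name):
    """One cleaning pass: lowercase, drop punctuation, strip one trailing suffix."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    words = cleaned.split()
    if len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def normalise(name, max_passes=10):
    """Repeat the cleaning pass until the name stops changing."""
    for _ in range(max_passes):
        cleaned = normalise_once(name)
        if cleaned == name:
            break
        name = cleaned
    return name


def group_suppliers(names):
    """Group the original names under their fully normalised form."""
    groups = defaultdict(list)
    for original in names:
        groups[normalise(original)].append(original)
    return dict(groups)


# Made-up supplier names for illustration only.
suppliers = ["Acme Ltd.", "ACME Limited Ltd", "Bolt Inc"]
print(normalise_once("ACME Limited Ltd"))  # a single pass still leaves a suffix
print(group_suppliers(suppliers))
```

Here a single pass turns "ACME Limited Ltd" into "acme limited", which would not match "acme"; only the repeated passes bring the two Acme records together.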
- Type: Chapter
- Between the Spreadsheets: Classifying and Fixing Dirty Data, pp. 151-154
- Publisher: Facet
- Print publication year: 2021