Hamburger Data Quality: data collection is the start

Author: James Standen

I have noticed that when I go to a fast food outlet, no matter what I get to drink with my meal, it is almost always listed as "Cola" on the receipt. But I didn't order cola. Ever. Usually I get juice, or milk. So every time I order a burger, I'm clearly a source of bad quality data.

I have looked over the counter on many occasions while waiting for my burger and watched the server key in other people's orders. Their fingers flew across the keypad, but only ever hit the cola key (always in the more central location, it seemed). I could actually see the extra wear on the surface of the touch pad. I suspect that the number one reason for keypad replacement in the fast food industry is "cola key not working". I am guessing that employees understand that speed is important; it is fast food, after all. I wonder how much data quality is discussed.

Now, this is hardly a scientific study, and falls clearly in the "anecdotal evidence" column, but when I see this it strikes me that somewhere a data warehouse is probably capturing my drink, and it's being ignored because all the analysts know that it's always cola.

Or, perhaps, there is a complicated ETL job that took hundreds of hours of expensive consulting time to write. It cross-feeds drink information from the inventory system, which tracks the quantities of each syrup required by each location, and then estimates the drinks sold by randomly allocating those percentages across the number of meals sold.

If this were done, you would not have good information about who drank what with what: is orange soda or milk more popular with the cheeseburger? Are the fancy fruit drinks (which have a lower margin) more likely to be ordered by people getting the spicy wrap or the regular? What is the real margin on each meal, taking into account the drink?
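To make the problem concrete, here is a minimal sketch of what such an allocation job might do. Everything in it is an assumption for illustration: the function name `allocate_drinks`, the drink shares and the order counts are made up, and a real ETL job would be far more involved. The point it shows is that every meal type ends up with roughly the same drink mix, so a meal-by-drink report built on this data says nothing about what customers actually ordered.

```python
import random
from collections import Counter


def allocate_drinks(meals, drink_shares, seed=0):
    """Assign a drink to each meal at random, weighted by inventory shares.

    meals: one meal name per order (the keyed-in drink is ignored).
    drink_shares: hypothetical mapping of drink -> share of syrup usage.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    drinks = list(drink_shares)
    weights = [drink_shares[d] for d in drinks]
    return [(meal, rng.choices(drinks, weights=weights, k=1)[0]) for meal in meals]


# Hypothetical numbers, not real sales data.
meals = ["cheeseburger"] * 5000 + ["spicy wrap"] * 5000
shares = {"cola": 0.6, "orange soda": 0.3, "fruit drink": 0.1}

allocated = allocate_drinks(meals, shares)
counts = Counter(allocated)

# Drink mix per meal type: both rows come out close to the inventory shares,
# whatever the customers really drank.
mix_by_meal = {}
for meal in ("cheeseburger", "spicy wrap"):
    total = sum(n for (m, _), n in counts.items() if m == meal)
    mix_by_meal[meal] = {d: counts[(meal, d)] / total for d in shares}
    print(meal, {d: round(s, 2) for d, s in mix_by_meal[meal].items()})
```

Any small difference between the two rows is sampling noise from the random allocation, not a customer preference.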

Or maybe the drink dimension is a special dimension that only shows drink categories at a summarized level, because that's the granularity the inventory system uses.

Messy. Reduces the value of the information, hard to explain to the end users. But what can you do if you don't collect the data at the level of each individual order?

Of course, the point is not that I think this wrong-drink-keyed-in issue is an important one for the fast food industry. The point is that if the information is wrong at the point of capture, we can spend far more effort on the extract, transform and load (ETL) logic than we should need to, with little or no result.

In fact, if we spend enough time on the ETL to make the final data warehouse data appear to be telling us something, it might even be damaging, since the ETL itself might be generating patterns that don't exist, and will lead analysts down dead ends, forever chasing the apparent relationship between Dr. Pepper and curly fries.

And like many issues in business intelligence, and data quality in general, the root problem is one of process and people. Here is what I think the problem is: it is harder to have a thousand data entry people be careful about their data capture than to hire one ETL developer to write some crazy twenty thousand dollar chunk of code. The result is that instead of fixing the problem at its source, in this case right at the point of order, we try to fix it in the database, after the fact.

We need to get the people on the ground who actually experience the event to be motivated to get the data right, right from the start.

Obviously this is true for retail. It's true for the loading dock, it's true for the order desk, and it's true even for self-serve and online processes where the data is coming directly from the customer. It's true for all data. Get it right as soon as it goes in, and you've won a big part of the battle.

In my experience, the key is often to ask only for the information you really want, and when you do ask for it, to make it clear that it must be accurate and to put in place closed-loop processes that ensure it is. Syrup purchases don't match the 100% cola data? Ask why. Include data accuracy as part of the store supervisor's assessment criteria. Obviously, the more the process can be automated with bar codes, radio frequency ID (RFID) tags or other technologies, the better.
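The syrup check mentioned above could be as simple as comparing drink shares from the point-of-sale data against shares implied by inventory usage. The sketch below is only an illustration under stated assumptions: the function name `flag_mismatches`, the 10 percentage point tolerance and all the numbers are hypothetical, and it assumes inventory usage has already been converted into servings per drink.

```python
def flag_mismatches(pos_counts, inventory_servings, tolerance=0.10):
    """Return drinks whose keyed-in share and inventory share disagree.

    pos_counts: drink -> number of times it was keyed in at the register.
    inventory_servings: drink -> servings implied by inventory usage.
    tolerance: hypothetical allowed gap between the two shares (0.10 = 10 points).
    """
    pos_total = sum(pos_counts.values())
    inv_total = sum(inventory_servings.values())
    if pos_total == 0 or inv_total == 0:
        raise ValueError("need at least one sale and one serving to compare")

    flagged = {}
    for drink in sorted(set(pos_counts) | set(inventory_servings)):
        pos_share = pos_counts.get(drink, 0) / pos_total
        inv_share = inventory_servings.get(drink, 0) / inv_total
        if abs(pos_share - inv_share) > tolerance:
            flagged[drink] = (pos_share, inv_share)
    return flagged


# Hypothetical store: the register says almost everything is cola,
# but the inventory says otherwise.
pos = {"cola": 980, "orange soda": 15, "milk": 5}
inventory = {"cola": 550, "orange soda": 300, "milk": 150}

for drink, (pos_share, inv_share) in flag_mismatches(pos, inventory).items():
    print(f"{drink}: keyed in {pos_share:.1%}, inventory suggests {inv_share:.1%}")
```

A report like this does not repair the data. It gives the store supervisor a specific question to ask, which is what closes the loop.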

Data quality starts on the ground. The further from the ground, and the deeper into various operational systems, ETL jobs, staging tables, data warehouses or data marts we try to fix the problem, the harder it will be.

Source: www.datamartist.com/data-quality-get-it-right-at-the-source

Comment by Laura Scarlett on October 12, 2009 at 3:27am
Yes - this issue will never go away, I'm afraid. It reminds me of an old anecdote that I heard when I was working at American Express. One of their merchants is One & Only Hotels - very, very upscale beach resorts. When One & Only analysed their guest information, the most surprising insight was that over 60% of their guests were from Afghanistan, at the time the 7th poorest nation in the world. Very puzzling, until a receptionist let slip that Afghanistan was at the top of the drop-down menu to be completed when guests were checking in. As the receptionists were continually beaten up by Operations to make check-in quicker and quicker, there was an easy short-cut to take. Data quality goes directly back to management: making sure these low-paid front-line staff are enabled and trained properly and don't get conflicting or contradictory targets!
Comment by Harsha Pannu on October 10, 2009 at 1:27pm
Very interesting observation, Vincent. I believe that in this case the prices of all the drinks (colas, non-colas and juices) would be equal, which is why customers have not taken it up with the staff. From the data perspective it is incorrect, and it does not depict the real product movement from the store either. However, if we look at the holistic picture, it suggests that cola is positioned so well in the minds of people (staff and others who haven't taken note of it) that drink/thirst => cola.