What is Data Wrangling
Your company knows the importance of data at this point, but it may need help understanding what data is essential. Your company probably has sales feeds from your POS systems, labor reports from your HR vendor, or monthly recap files from a shipping partner. Each source is a valuable puzzle piece, but you can see the whole picture when you bring everything together! Every Excel, PDF, and Word document has the potential to unlock a critical understanding of how your company operates that you don't know.
Imagine watching Yellowstone, and the camera pans to a wide shot where you see wild horses running across the plains. They don't have any direction, but you can see that they collectively work together. Each of them has so much power and potential if it is trained and taken care of. Your data is the same. The isolated data points everywhere can bring so much insight, but you don't know how to harness it. That is where data wranglers come in. I have always wanted to be a cowboy, and I imagine myself riding in, trusty lasso in hand, pulling all that wild data into the barn and breaking it in! Sometimes, I feel silly sitting at my computer coding in SQL with a cowboy hat on, but it makes me feel good. As your data roams the company share drives, unused and untrained; all it has is the potential to be great. Your data needs someone to come in, organize it, centralize it, and train it to get used to it. The proper term for data wrangling is data engineering. Data engineers gather your data, pull it into a central environment, clean it, and normalize it. Then, they create accessible data for end users to understand and join other data.
It is crucial to have a central location that analytics can access. By itself, labor data tells you how much you are spending. But how much should you be spending? What is that peak level of efficiency? Adding more labor will help you close more sales, but there are diminishing returns at some point because your traffic is capped. With too little labor, you leave money on the table as customers walk out because not enough people are on the floor to help them. What if you used customer counts from your sales data and found the optimal point of labor efficiency? Combining these two data sets brings exponential value to each data.
Cleaning and normalizing data is a crucial step in preparation, and many data warehousing companies need to take the time to do it. I have worked with companies that were hired to come in and bring data into a new data warehouse. I recommend having your people with corporate knowledge oversee the project, but outside consulting firms can set up data warehouses and do the initial loads cheaper and faster, which is a good use of the budget. The data warehousing companies will load in all the old files and set up automated pipelines to the data sources you identify early on. They often "lift and shift," meaning they move data from one place to another without additional due diligence. Some nuances can be overlooked in the cleaning and normalizing stage.
Store numbers are stored as alphanumeric values instead of integers. Data will join together, but when you sort your data, the store numbers come out as 1,10,100 instead of 1,2,3. If a store number is missing, a space can be entered in an alphanumeric column, and the analyst may never know they are missing data in their joins. It's a small, simple thing, but those details matter later.
Your sales data is stored as a date, and your labor data is stored as a time stamp. When you go to join these two sources up, they never join because dates have a time stamp of midnight, while the other data has a specific time of day. Have the company create a date field and a timestamp field in both, and make sure they have the same name so your analysts don't get mixed up!
When a field is empty, we call that blank or define it as NULL. Some data warehousing companies will fill it in with 0, or even worse, put it in space, breaking many processes. A data wrangler realizes that empty and 0 are two different things! When analysts do an average across values, 0s will be treated much differently than empty values.
When the time comes for your company to begin ingesting data into a data warehouse, it is vitally important that you have a data engineer hired and trained on the business side. So often, companies see data engineers as an IT hire; however, if you want to do things right, your data engineer needs to be a middleman who speaks IT and business. A business person making decisions on how the data will be set up will not understand what to look for in data tables to make sure they are efficiently helpful in the future. You will waste time in the future having to double back. An IT hire will bring in data but does not know how to use it. Your CEO will want to get ROI on their data investment, and an IT hire will not understand how the data should work together. Think of a good data engineer as a contractor you hire to build your house. They know carpentry, plumbing, and all the areas that must be correct. You don't want to hire someone who has only done sheetrock to oversee the entire project!
The hottest hire in the data world should be a data engineer with a strong BI background. Someone who knows SQL but can also see the forest through the trees is a critical hire for even an established data-driven company. They can work with different departments and help seek out data to fill the gaps. I have worked in environments where the people in charge don't understand the bigger picture and de-prioritize projects that will impact your analytical capability most. Data is a big engine; sometimes, one extra little piece is needed to start. Here are some examples of when we sought out data or had it given to us, which created significant insights.
Credit card customer mapping – Do you have a credit card that you use in person and another credit card that you use for online shopping? Many people do! Some companies will help you match up credit card customers so you know that John Smith, which shops you in-store, is the same John Smith that shops you online. Now, you can track your customer lifecycle and understand how long it takes them to adopt new online delivery or pickup.
Labor Data – Your company will use a payroll system like Ceridian or ADP to pay your employees. You will want to pull that in to track total hours worked and payroll costs in your data journey. If you are a retail company, you can get closer to individual store EBITDA with this data. You also have access to the start dates of your employees, and you began to analyze the impact of tenure on sales performance. A consulting firm can track turnover and see if it is clients or tenure affecting turnover, or you can find an underperforming trainer.
Federal Reserve Data – Most government economic data is accessible. Suppose your business is tied to the housing market. In that case, you can make monthly automatic pulls from the Federal Reserve to get the latest monthly trends. You can pull in the latest cost of living indexes, interest rates, or national credit card usage numbers. While most of these won't affect your business, they can be helpful for your analysts to tie back performance to national trends and see if you are outperforming the current economic environment.
Engineering your data is called data wrangling because it gathers data living in the wild, which does not benefit anyone. You take all this data, bring it into your stable, clean it, and look after it, and in time, it will bring you genuinely remarkable results.