Data Engineers vs Scientists vs Analysts
During my time in the data field, I have had several titles, but they have all come down to three jobs: data science, data engineering, and data analysis. If you are looking for a job in the current market, most companies say they want a data scientist, but what they usually need is a data analyst and a data engineer. Each job has a different career path; companies and employees must understand what they want and where they want to be before hiring for, or accepting, a title.
Data Analyst - A data analyst is a professional who collects and processes large datasets and applies statistical and business math to uncover trends, patterns, and insights that can inform decision-making within an organization. They use various tools and techniques to interpret data and provide actionable recommendations to stakeholders. In my other blog posts, I write about the importance of having your data ready before you can begin modeling. It is a necessity to have a solid foundation in descriptive analytics. A data analyst usually commands a lower salary than a data scientist because the role generally requires less schooling. Still, the ROI of an exemplary data analyst can be higher than that of a data scientist. Data analysts typically focus on current business needs and help steer the ship on the right course. Solid data analysts have a career path that can lead them into finance leadership, like CFO, or operations leadership, like COO or CEO.
Data Scientist - When I first graduated from college, data science was the sexy job everyone wanted to pursue. The joke was that a data scientist was a data analyst who lived in California. Since then, I have learned that a data scientist has different skills and career paths. A data scientist is a professional who uses advanced analytical, statistical, and programming skills to interpret complex data and extract valuable insights. They combine aspects of mathematics, statistics, computer science, and domain expertise to analyze large datasets and develop models that can predict trends, identify patterns, and inform strategic decisions. A stereotypical view of data scientists is that they are super bright but need help communicating with leadership. Many data scientists become highly paid individual contributors or lead BI teams. If you are on LinkedIn looking for jobs, most companies post that they are looking for a director of Data Science, but be careful when applying for those jobs. Most want you to build all the models or are just looking for a good analyst. These companies don't know the difference!
Data Engineer - A data engineer is a professional responsible for designing, building, and maintaining the infrastructure that allows data to be collected, stored, and accessed efficiently. They ensure that data pipelines are robust and scalable, enabling data analysts, scientists, and other stakeholders to access and use data effectively. A solid data engineer should be a company's first hire when starting its data environment. Data engineers come in two flavors: IT and business. A sound IT engineer can set up pipelines to APIs and create automated tables. A sound business engineer understands the business and efficiently organizes the data for all analysts and scientists. These are skill sets that one person can own, but realistically, you should be looking for two people who each lean into one side. You don't want your business to slow down because the engineer is validating automation daily, and you don't want development to slow down because of ad hoc business requests. The career path for a data engineer typically leads toward Chief Data Officer or Chief Information Officer.
Responsibilities
Data Collection involves gathering data from various sources, such as databases, surveys, and online systems. Your engineers will typically do this, but under the guidance of scientists and analysts. Analysts usually seek internal data such as product, labor, or manufacturing data, while scientists seek external data such as Federal Reserve or weather data.
Data Cleaning involves ensuring the data is accurate, complete, and free from errors or inconsistencies. This is where the meme of Spider-Man pointing at himself comes into play. Ultimately, the data engineer is responsible for ensuring the data is complete and accurate. However, a good analyst or scientist knows that data validation is the initial step in a project.
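To make that validation step a little more concrete, here is a minimal Python sketch of the kind of first-pass checks an analyst or scientist might run before trusting a table. The rows, the column names (order_id, order_date, amount), and the validate function are all hypothetical and made up for illustration; real checks depend on your own data.

```python
from datetime import date

# Hypothetical sample rows; column names are assumptions for illustration only.
rows = [
    {"order_id": 1, "order_date": date(2024, 1, 5), "amount": 120.0},
    {"order_id": 2, "order_date": date(2024, 1, 6), "amount": None},
    {"order_id": 2, "order_date": date(2024, 1, 6), "amount": 75.5},
    {"order_id": 3, "order_date": date(2024, 1, 7), "amount": -10.0},
]


def validate(rows):
    """Count simple data-quality problems in a list of row dicts."""
    seen_ids = set()
    duplicate_ids = 0
    missing_amounts = 0
    negative_amounts = 0
    for row in rows:
        # A repeated key often means a bad join or a double load.
        if row["order_id"] in seen_ids:
            duplicate_ids += 1
        seen_ids.add(row["order_id"])
        # Missing and impossible values are counted separately.
        if row["amount"] is None:
            missing_amounts += 1
        elif row["amount"] < 0:
            negative_amounts += 1
    return {
        "row_count": len(rows),
        "duplicate_ids": duplicate_ids,
        "missing_amounts": missing_amounts,
        "negative_amounts": negative_amounts,
    }


report = validate(rows)
print(report)
```

Any nonzero count in a report like this is a conversation to have with your engineer before the analysis starts, not after.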
Reporting involves creating visualizations, dashboards, and reports to present findings to stakeholders clearly and understandably. Analysts are usually the ones in charge of reporting. If analysts have SQL skills, they can do this all by themselves. Still, most companies need engineers to build views or tables that make reporting automated and efficient.
Performance Measurement involves evaluating the effectiveness of strategies and campaigns by analyzing relevant metrics. This is where analysts and scientists start their turf war. Analysts are good at looking at pre/post with test/control or A/B testing. Scientists can use advanced statistical analysis to do causality measurements or create artificial customer bases to measure against. I was pitched by a company that made a virtual America with a model person for every person in America. They would make predictions on how campaigns would work based on modeled impact. That is cool!
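As a rough illustration of the pre/post with test/control approach, the sketch below computes a simple difference-in-differences: the change in the test group minus the change in the control group. The sales numbers are invented and the diff_in_diff helper is a hypothetical name, so treat this as a sketch of the idea and not a full measurement methodology (it does no significance testing, for example).

```python
from statistics import mean

# Hypothetical weekly sales per store, before and after a campaign.
test_pre = [100, 110, 90, 100]
test_post = [120, 130, 110, 120]
control_pre = [95, 105, 100, 100]
control_post = [100, 110, 105, 105]


def diff_in_diff(test_pre, test_post, control_pre, control_post):
    """Change in the test group minus change in the control group."""
    test_change = mean(test_post) - mean(test_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Subtracting the control change removes the movement that would
    # have happened anyway (seasonality, market trends, and so on).
    return test_change - control_change


lift = diff_in_diff(test_pre, test_post, control_pre, control_post)
print(f"Estimated campaign lift per store: {lift}")
```

With these made-up numbers, the test stores went up by 20 and the control stores by 5, so the campaign gets credit for 15, not the full 20.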
Predictive Analytics uses historical data to make forecasts and predict future trends. This will primarily be your scientists. Analysts will want to do this, but you need a firm grasp of statistics and machine learning models to get the most out of it.
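For a sense of what the simplest possible forecast looks like, here is a small Python sketch that fits a straight trend line to historical values with ordinary least squares and projects it one period forward. The monthly figures and the fit_trend helper are hypothetical, and real forecasting work usually needs far more than a straight line (seasonality, uncertainty estimates, holdout testing).

```python
# Hypothetical history: period number and units sold in that period.
periods = [1, 2, 3, 4, 5]
units = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_trend(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_trend(periods, units)
forecast = intercept + slope * 6  # project the trend to period 6
print(f"slope={slope}, intercept={intercept}, forecast={forecast}")
```

The arithmetic here is easy. Knowing when a straight line is the wrong model is where the statistics background comes in.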
Data Architecture Design involves designing the architecture of data systems and databases to support data collection, storage, and retrieval. This is the responsibility of engineers.
Data Pipeline Development is concerned with creating and managing data pipelines that automate the process of extracting, transforming, and loading (ETL) data from various sources to data warehouses or lakes. Engineers are responsible for this.
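To show the shape of an ETL job, here is a toy Python sketch using only the standard library: it extracts rows from a CSV string, transforms them (drops rows with no amount, normalizes the store name), and loads them into an in-memory SQLite table. The data, the sales table, and the function names are all assumptions for illustration; a production pipeline would add scheduling, logging, and error handling, and would target a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would come from a file or an API.
RAW_CSV = """store,sale_date,amount
north,2024-01-05,120.00
south,2024-01-05,
north,2024-01-06,80.50
"""


def extract(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Drop rows with no amount and normalize types and casing."""
    cleaned = []
    for r in records:
        if not r["amount"]:
            continue  # skip incomplete rows instead of loading bad data
        cleaned.append((r["store"].strip().upper(), r["sale_date"], float(r["amount"])))
    return cleaned


def load(conn, rows):
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions is a design choice that makes each step easy to test and swap out on its own.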
Database Management involves managing and optimizing databases and ensuring they are scalable, secure, and efficient. Engineers are responsible for this.
Data Integration involves collecting data from different sources and creating a unified data view. Engineers are responsible for this.
Performance Optimization is part of an engineer's task. Engineers are responsible for ensuring that data systems are optimized for performance and scalability and that they handle large volumes of data efficiently.
Data Security involves implementing measures to protect sensitive information and ensure compliance with data protection regulations. Although this seems like a responsibility for engineers on paper, it is a huge potential area for mismanagement by all.
Model Evaluation and Tuning involve assessing the performance of models using various metrics and refining them to improve accuracy and efficiency. Building models is never a one-and-done thing. Scientists must continue monitoring their models and update them with new knowledge. Working with analysts who are in tune with the field can be essential.
Exploratory Data Analysis (EDA) involves analyzing data to understand its structure, detect patterns, and formulate hypotheses. This is a 101-level skill for scientists. They can spend days looking at the data and working out what they can use, and under what conditions, before building a model.
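A first EDA pass often starts with nothing fancier than summary statistics. The Python sketch below profiles one hypothetical numeric column: how many values are missing, and how the mean compares with the median. The values are invented for illustration, and a real EDA would go on to look at distributions, relationships between columns, and plots.

```python
from statistics import mean, median

# Hypothetical column of daily order values; None marks a missing entry.
values = [12.0, 15.5, None, 14.0, 98.0, 13.5, None, 16.0]

# Separate the usable values from the missing ones before summarizing.
present = [v for v in values if v is not None]

summary = {
    "count": len(values),
    "missing": len(values) - len(present),
    "mean": mean(present),
    "median": median(present),
    "min": min(present),
    "max": max(present),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up column the mean sits far above the median, which is a hint that a single large value (the 98.0) is pulling the average up and deserves a closer look before any modeling.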
Skills
Statistical Knowledge: Understanding of statistical methods and their applications. This will primarily be scientists.
Technical Proficiency: Familiarity with data analysis tools and software such as SQL, Excel, Python, R, Scala, Java, and data visualization tools like Tableau or Power BI. Analysts need to be good at SQL, Excel, and data visualization. Scientists will be proficient in SQL, Python, and R. For engineers, SQL and Python are an absolute must.
Analytical Thinking: Ability to think critically and analytically to solve complex problems. Everyone should have this skill.
Attention to Detail: Ensuring accuracy in data handling and analysis. Everyone should have this skill.
Communication: Analysts need strong written and verbal communication skills to present findings effectively. This is where they need to shine. They can only succeed if they can get leadership to understand their analysis. Scientists tend to struggle here, and having a good project manager to work with them on presentations can be critical.
Problem-Solving: Aptitude for identifying issues and developing data-driven solutions. Everyone should be strong here.
Business Acumen: Understanding business context and how data insights can drive strategic decisions. Your analysts and scientists need to spend time understanding the business before you ever expect them to do work. It is an excellent idea to have your engineers spend time understanding the company, too, so that what they build is future-proof and analysts can figure out how to join the data together.
Data Mining: Discovering patterns, correlations, and valuable information from large data sets using various techniques from statistics, machine learning, and database systems. It involves extracting meaningful insights and knowledge to help organizations make informed decisions, predict future trends, and understand relationships within their data. While analysts could benefit from this skill, it is usually data scientists who mine data to find new uses for it.
Machine Learning: Implementing algorithms to predict outcomes based on data. You need strong statistics skills to do this; scientists usually have that experience.
Database Management: Using database systems to store and retrieve data efficiently. This is primarily the realm of engineers, but scientists usually create their own data sets to test things, so it is good for them to know.
Data visualization: Creating graphical representations of data to communicate insights. It is both an art and a science and should be a skill of every analyst and scientist.
Data Warehousing: Experience with data warehousing solutions like Amazon Redshift, Google BigQuery, or Snowflake. This is a skill you need for an engineer.
In summary, each profession is crucial in helping organizations make informed decisions by transforming raw data into meaningful insights. They work together to build and maintain the infrastructure that supports data-driven decision-making. Each group blends technical skills with analytical thinking to support data-driven strategies. Refrain from searching for a unicorn that can consolidate all these skills and responsibilities. Realize that you will need a few different people to help build your data culture.