Data Engineering
A quick overview of data engineering: what it is, why it matters and what data engineers do.
What Is Data Engineering?
Data engineering is the discipline that sits between software engineering and data work. It requires both skill sets: a data engineer has to be highly knowledgeable about coding and highly knowledgeable about data.
The data people are highly knowledgeable about data and know a bit about code. The coders know a lot about code and a bit about data. Data engineers need to be highly competent in both.
Why Does Data Engineering Matter?
It matters because it helps both sides (software and data) build better products.
For example, your favorite streaming service recommends shows it thinks you would like based on what you have previously watched. This is called a recommendation system. The data engineer gets the data from the app database, then cleans and manipulates it so it is available to the data scientists, who train and fine-tune the machine learning model. Then the software engineers implement the model in the app, where you use it without even realizing it.
What Does a Data Engineer Do?
A data engineer moves data from one place to another and improves its quality. They build automated systems called data pipelines that move data, build data warehouses, clean data to improve its quality and help set data standards for an organization.
The most succinct way to describe it: they build the systems that extract data from a variety of upstream sources, manipulate the data and deliver it to a variety of downstream recipients.
I have found many instances where the problem with an app was not the code but the data.
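The extract, manipulate and deliver steps described above can be sketched as a tiny pipeline. This is only a minimal illustration using the Python standard library, not how any particular organization does it. The sample data, the watch_history table and the function names (extract, transform, load, run_pipeline) are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an upstream source (made up for illustration).
RAW_CSV = """user_id,title,minutes_watched
1, Space Drama ,42
2,Cooking Duel,
,Space Drama,17
1, Space Drama ,42
3,cooking duel,30
"""


def extract(raw_text):
    """Read rows from the upstream source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Clean the rows: drop incomplete records, normalize text, remove duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        user_id = (row["user_id"] or "").strip()
        title = (row["title"] or "").strip().title()
        minutes = (row["minutes_watched"] or "").strip()
        if not user_id or not title or not minutes:
            continue  # skip records with missing values
        record = (int(user_id), title, int(minutes))
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records, connection):
    """Deliver the cleaned records to a downstream store (here, SQLite)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS watch_history "
        "(user_id INTEGER, title TEXT, minutes_watched INTEGER)"
    )
    connection.executemany("INSERT INTO watch_history VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(raw_text, connection):
    """Run the extract, transform and load steps in order."""
    records = transform(extract(raw_text))
    load(records, connection)
    return records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    for row in conn.execute("SELECT * FROM watch_history ORDER BY user_id"):
        print(row)
```

Of the five sample rows, only two survive the cleaning step: the rows with a missing user or missing minutes are dropped, and the repeated row is loaded once. A real pipeline would usually log or quarantine rejected rows rather than silently skipping them, and would run on a schedule instead of by hand.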
How Does a Data Engineer Fit Into An Organization?
There are four disciplines that all need to work together: software engineers, data engineers, data analysts and data scientists.
The data engineer is the middleman. They don't build the end product, but what they do is essential to it. They are like the truck driver who takes a product from the factory to the retail store so you can buy it.
Depending on the organization, the role could be highly specialized or part of a hybrid role that spans multiple disciplines. In general, the larger the organization, the more specialized someone can be. Smaller organizations need people to wear multiple hats and do both software and data engineering, or data engineering and data analysis / data science.
I am currently in the first of those hybrid roles. I do both software engineering and data engineering, so at times I build automated data pipelines and at other times I build apps.
Tech Stack
Data engineers use much of the same technology as software engineers and pure data folks (analysts and scientists), with some differences. The stack includes SQL, programming languages (Python and R), ML libraries, data warehouses, automation tools, data orchestration, visualization, cloud infrastructure and big data tools (if needed; only the biggest organizations actually need big data). The specific tools someone uses depend on their role at their company.