Have you ever wondered what data engineering is? Here are my thoughts as a data architect who works in data and analytics at a performance marketing company. I will cover some of the core aspects of data engineering, starting with what the role is and how you can get into it, and ending with some notes on data engineering skills and tips.
What Is Data Engineering
A data scientist will sometimes want to paint a picture that tells a story. The data engineer is the person who helps supply the colors necessary to paint that picture at scale and then helps create the frame that holds it on the wall.
If you think about that analogy, you can understand what the data engineer truly is. They are the professionals who build and manage the data infrastructure that supports the enterprise data plan.
When it’s all said and done, the data engineer is a good mixture of disciplines that include computer science, database development, and information technology. They work in both code and infrastructure, both science and engineering, and both data processing and data security.
Where one discipline may end, they often begin.
How To Become A Data Engineer
If I had to become a data engineer tomorrow with limited knowledge and no experience, I would start with research. I would try to understand all the opinions–including my own–of what the data engineer is and does. This would include searching online and checking out books that may cover the topic well.
The next area of focus would likely be applying the knowledge gained from that research on the side. I would start new projects that exercise the different skills needed to become a data engineer. This would include learning new languages like Python, learning new tools such as those for ETL processing, and learning how to take advantage of popular cloud computing services like Amazon Redshift.
I would likely turn these projects into a collection of my best work. This collection would become my data engineering portfolio, which I could use to help sell myself to potential employers. I would keep working on and evolving these data projects, try to solve real-world problems, and really proclaim my passion for becoming a data engineer.
If that still wasn’t enough, the best option that has always worked for me is just getting in the door. I would start small, for example by finding a job I could get hired for within an organization that could eventually employ me as a data engineer. This would allow me to gain the domain experience needed and potentially the chance to work indirectly with related fields. Then I would use that to my advantage to transition into the role I wanted.
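As an illustration of the kind of small side project described above, here is a minimal sketch of an extract, transform, load (ETL) pipeline using only the Python standard library. Everything in it is an assumption made for the example: the sample campaign data, the `campaigns` table, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever warehouse you would actually load into.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file.
RAW_CSV = """campaign,clicks,cost
search, 120 ,30.50
display,,12.00
social,45,9.75
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows missing clicks, cast types."""
    cleaned = []
    for row in rows:
        clicks = row["clicks"].strip()
        if not clicks:
            continue  # skip incomplete rows
        cleaned.append((row["campaign"].strip(), int(clicks), float(row["cost"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaigns "
        "(campaign TEXT, clicks INTEGER, cost REAL)"
    )
    conn.executemany("INSERT INTO campaigns VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run all three steps against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    print(conn.execute("SELECT campaign, clicks, cost FROM campaigns").fetchall())
```

Keeping the three steps as separate functions is a deliberate choice for a learning project: each one can be tested on its own and later swapped for a real source or destination.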
Data Engineering Skills
There are a vast number of skills to consider with data engineering roles. The ones off the top of my head include:
- Advanced Analytics (Forecasting, Machine Learning, etc)
- Big Data Platforms (NoSQL, Data Streaming, etc)
- Databases (RDBMS)
- Data Warehousing & Data Marts (Kimball versus Inmon)
- ETL Development & Processing (SSIS, Informatica, etc)
- Cloud Computing (Amazon Web Services, Azure, IBM)
- Programming/Scripting Languages (SQL, Python, R, Java, Scala, etc)
- Visualization (Excel, Tableau, Matplotlib, Seaborn, etc)
Sub areas may include:
Then of course focusing on a particular domain goes a long way. I chose digital marketing and video games.
Big Data Engineering
I would be doing you a disservice if I did not talk about big data engineering. This is the most common association people make with data engineers out in the field. They are commonly tagged as being specialized in only big data platforms. This simply is not true.
While it’s true that a lot of jobs are labeled big data engineer, big data is still a very vague topic that has more marketing value than real value for most organizations. Data engineers are deployed to solve problems.
Sometimes that problem is tied to what many refer to as big data problems. Whatever the case, big or small, data engineers can work anywhere that takes its data seriously. Don’t fall for the marketing hype that is big data.
Data and Analytics
Most of the data engineers I know work pretty closely with analytical teams. They focus on filling the technology void within the data and analytics teams that are tasked with various data projects. While there may be other technology roles in the organization, data engineers are often assigned to dedicated roles alongside data scientists.
They are tasked with solving the complex data infrastructure and processing problems that data and analytics teams face. This is most commonly where the association between big data and data engineering comes from. They are often faced with those so-called big data problems, which may involve finding better ways to automate, clean, and compute large sets of data.
Focus On Solving Problems
The best data engineers in my experience are the ones who hold true to what being an engineer is all about. They focus on using what’s available to them because they understand that not every organization has the same budget or resources. This means you may face times when you cannot invest in the next bleeding-edge technology some marketing guy is selling you.
Focus on solving problems with what you have. If you’re a Microsoft shop and have invested in SQL Server, then focus on leveraging what you have before you start pitching some hot new scalable DBaaS solution that requires major overhauls.
If a major infrastructural or system-wide change is needed, then don’t be knee-jerk about it. Research, test, plan, and then pitch. Exhaust all your options before making a decision on change and make damn sure it’s the right choice for the business before you deploy.
Work As A Cohesive Team
If you manage to work as a data engineer, you will likely run into similar positions that work in the same or closely related areas as you do. This is not uncommon, especially in large organizations that have database administrators, business intelligence developers, SQL developers, and so on.
Don’t Silo Yourself
You may find that the only thing that separates you is your title and the team or project you work on. Instead of trying to silo yourself from other data professionals, aim to work in collaboration, even across teams and projects. You should not go into this alone, and there is normally a wealth of knowledge you can benefit from by involving your data colleagues.
Disparity Is Bad
The one thing you should focus on avoiding is creating further disparity within your organization. This can be anything from disparate data infrastructures that don’t align with the enterprise data strategy to overstepping your boundaries with existing data infrastructures and systems that were not built by you or your team.
Bridge The Gap
It’s critical to understand that data engineering and the concepts around what you do are not entirely new. Learn to bridge the gap, work together, and build great things that push the business forward.
I don’t think I’m the foremost expert in this field. I don’t even particularly think I’m good at what I do. But one thing I do know is that the people I work with and the business itself have put a lot of trust in me as a person.
While I’m not the smartest person in the room, I try to focus on looking at the big picture and solving problems. Teams rely on me to help them through that problem-solving process, which often relates to data. That’s the value I add, and the return on my investment is learning new things, gaining experience, and doing something I’m truly passionate about: data.