Data science is splitting up, and that’s going to work in your favor. As companies and organizations begin to understand the complexity of working with big data, “data science” is losing its catch-all definition. One aspect of data science that continues to grow in popularity is data engineering. If you’ve struggled to find employment under the general title of data scientist, this article on how to become a data engineer might be for you.
What is Data Engineering?
Data engineers build the infrastructure that data scientists and analysts need to wrangle big data. The massive volumes of data that companies access require careful planning to tease out patterns and prevent data swamps.
Data engineering builds on the concepts of computer engineering by providing virtual structures designed to ease issues with data cleaning and ensure data is available at any point. Pipelines ensure that the flow of data into and out of the system is secure and free of bottlenecks.
Data engineers take that off a data scientist’s plate, allowing a data scientist to make decisions about things like appropriate algorithms and visualization. If the structure is in place, and someone else is troubleshooting the system itself, a data scientist can get back to the business of delivering insights.
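To make the idea of a pipeline a little more concrete, here is a minimal sketch of the extract, clean, and load steps in Python. Everything in it is an assumption made for illustration: the sample CSV, the `users` table, and the function names are hypothetical, and a real pipeline would likely rely on dedicated tools rather than the standard library alone.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: messy CSV as it might arrive from an upstream source.
RAW_CSV = """user_id,signup_date,country
1,2021-01-05,US
2,,DE
3,2021-02-11,
,2021-03-01,FR
"""


def extract(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with no user_id and turn blank fields into None."""
    cleaned = []
    for row in rows:
        if not row["user_id"]:
            continue  # a record without an ID cannot be stored
        cleaned.append({
            "user_id": int(row["user_id"]),
            "signup_date": row["signup_date"] or None,
            "country": row["country"] or None,
        })
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into a (hypothetical) users table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (:user_id, :signup_date, :country)", rows
    )
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, clean, and load in order; return the number of rows loaded."""
    rows = clean(extract(text))
    load(rows, conn)
    return len(rows)


# An in-memory database stands in for a real data store.
conn = sqlite3.connect(":memory:")
loaded = run_pipeline(RAW_CSV, conn)
print(f"Loaded {loaded} rows")
```

Once the data sits in a clean table like this, a data scientist can query it directly instead of handling the messy source file.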
Although data engineers will probably spend time on cross-functional tasks, you could further your career by specializing in one of three categories:
- Generalist: If your dream is to work on a small team or at a startup, following the generalist path is your best bet. Generalists know how to do a bit of everything and can set up architectures for a variety of tasks.
- Pipeline-centric: If your dream company is an established organization with complex data needs, focusing on building pipelines could be your jam. Pipelines are often part of revenue-producing data projects.
- Database-centric: Large companies with massive data and legacy systems need engineers to build data warehouses. Legacy systems with multiple massive data streams will need to be converted to something workable for a fast, data-driven culture.
There’s a huge shortage of qualified data engineers. If you want to take this direction, here’s how to go about it.
From a computer science/coding path
If you’ve already got the computer and coding skills from your undergraduate or graduate degree or you’ve previously attended comprehensive boot camps, it’s all about getting experience. Once you’ve decided on your path from the three above, focus on gaining project experience.
- Attend a boot camp specifically for data engineering or seek out open courseware if you don’t have a boot camp budget. Specific skills in pipelines, architecting distributed systems and data stores, or combining data sources are all parts of this. Skills such as Scala, Python, and Hadoop are essential, but more important are the underlying concepts.
- You can gain experience from entry-level IT positions or transition from data science to data engineering for a small company that doesn’t need scale quite yet. You can also build systems independently and document them through your online portfolio.
- Gain professional certifications. IBM, Microsoft, Google, and Cloudera, for example, all offer certifications specifically in data engineering. If you know your preferred organization works with a particular set of tools, that can help focus your certifications.
- Consider a graduate degree. Data engineering is highly technical, and a certification alone may not be enough to help you stand out. There is a data engineering talent shortage, but companies seem willing to wait for the right candidate or train internally instead of hiring someone who doesn’t quite fit.
From a noncoding path
If you don’t have a computer science degree, but you have some basic familiarity with computer coding and mathematics, the path is a bit longer. You’ll need to understand the fundamentals of computer science first, so start there. Data engineering is a highly technical, advanced field, so you may need to settle in for the long haul.
- Check whether local boot camps or sprints are available in your area. In my area, for example, the Nashville Software School frequently offers three-week “jumpstarts” to get you started in both development and data science.
- You’ll need at least an intermediate familiarity with advanced skills such as Python, Java/Scala, SQL/NoSQL, cloud platforms and computing, and architecture options like Hadoop. Explore open courseware to help get you there. Look into your local community college or university, too.
- You’ll want to get on board as quickly as possible with a specific project. Finding a passion project, such as this person’s journey with OkCupid, could help jumpstart those real-world skills and keep you interested for the long haul.
- Set up your GitHub profile and look for hackathons, volunteer opportunities, internships, local meetups, and anything else that can help you gain real-world experience, build your network, and stay interested.
- Begin exploring positions in data science or even data engineering at small companies or startups with simple needs. As you gain experience processing and building for business value, you can move on to larger companies with more complex data needs.
Fulfilling the Data Engineer Role
Data engineers are in high demand, but you’ll still need to stand out from the crowd to find a position. As data scientists move deeper into the world of AI and complex algorithms that process the almost infinite amounts of messy data we have, a data engineer provides the environment in which a data scientist can operate.
Consider whether you’re looking for a position in a large company or prefer the fast pace of a startup. Also, think about whether you want to be the only data engineer on deck or if you prefer working with a team of people who can provide support, inspiration, and drive for massive data projects. This can help you narrow down your field and target only the types of organizations with your dream position.