Hello all, my younger brother wants to build a career as a data engineer, and I don't know much about this field. Can anyone help me understand how to become a data engineer, which skills are necessary, and what the salary and job responsibilities look like?
Hi @kumararjun678904
How old is your younger brother? He should study computer science. Many universities offer data science courses, and he could complete a master's degree in that field. Availability and costs depend on the country and the university.
He can become a self-taught data scientist by studying the topics on his own, but a degree will certainly help him land a job.
Well, job responsibilities differ by profession. But usually a data scientist works on large sets of data and, by analyzing, computing, and processing them, interprets the results to create actionable plans, in areas such as weather prediction, space programs, astronomy, etc.
Remember the picture of the black hole? It wasn't captured with a camera but with an array of 8 giant radio telescopes known as the Event Horizon Telescope, and it took more than a decade. Many data scientists worked on that project.
Now, coming back to your question: what are the necessary skills? Let's check the roadmap below.
Roadmap:
- Become proficient at programming (Python and Scala)
- Learn automation and scripting (cron, shell scripting)
- Learn databases (SQL)
- Data processing techniques (Kafka, Spark)
- Workflow scheduling
- Cloud computing (AWS, Azure)
- Infrastructure (Docker and Kubernetes)
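To give a rough feel for the "workflow scheduling" item, here is a minimal sketch in plain Python of running tasks in dependency order, which is the core idea behind workflow schedulers. The task names (`extract`, `transform`, `load`) and the `run_workflow` helper are made up for illustration and are not taken from the DataCamp article or from any specific scheduling tool.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task after the tasks it depends on and return the run order.

    tasks: dict mapping a task name to a zero-argument callable (hypothetical).
    dependencies: dict mapping a task name to the set of tasks it must wait for.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


# Illustrative tasks that only record that they ran.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# "transform" waits for "extract", and "load" waits for "transform".
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order = run_workflow(tasks, dependencies)
print(order)
```

Real schedulers add timing, retries, and monitoring on top of this ordering step, so treat this only as a way to see the basic concept.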
Note: I'm not a data scientist. All the steps are described in detail on this website: The Path to Becoming a Data Engineer - Datacamp
He is 22 years old. When I searched Google for data engineer responsibilities, this is what I found:
- Gathering large sets of data that align with business requirements.
- Designing and developing new systems and infrastructure that make the extract, transform, load (ETL) process more efficient, faster, and more secure.
- Using the latest technologies and products, such as cloud computing and machine learning, to keep data systems and data pipelines up to date.
- Identifying errors in the current system and configuring a new, effective solution.
- Applying statistical methodologies, algorithms, and data structures for data procurement and analysis.
- Administering the procurement, design, testing, storage, and management of new and existing data pipelines.
- Working with interested parties, such as stakeholders, on data-oriented queries and issues.
- Improving internal systems and processes, for example by automating manual work, speeding up data delivery, or making business operations more scalable.
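To make the extract, transform, load responsibility and the "identifying errors" point a bit more concrete, here is a small sketch using only the Python standard library. The sample data, the `orders` table, and the function names are all invented for illustration; a real pipeline would read from actual source systems and load into a proper data warehouse.

```python
import csv
import io
import sqlite3

# Made-up raw input; the second row has a bad amount on purpose.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,not_a_number
3,alice,5.01
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and set aside rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), row["customer"], float(row["amount"])))
        except ValueError:
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)

# Simple analysis step with SQL: total amount per customer.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
print(f"rejected rows: {len(rejected)}")
```

Keeping the rejected rows instead of silently dropping them is one simple way to make data errors visible so they can be fixed at the source.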
If you are in India and looking for institutional education, there are many options:
https://www.greatlearning.in/iiit-hyderabad-seds
The list is not exhaustive.