The path from Data Engineer to Data Science Engineer may not be easy. But with constant effort and dedication, one can become a Data Science Engineer.
When I started down the path towards this new area, there were hundreds of questions that made me anxious about the decision I had made. But there are a lot of free and paid tutorials online that can help ease the initial pressure of learning the jargon of the Data Science world.
My interest began when Andrew Ng launched the Deep Learning specialization and I wanted to give it a try. After I finished the specialization, I wanted to dig deeper into the AI/Machine Learning world. A certification or specialization is of no help until you use it to build something from scratch. But it's a great way to learn the high-level constructs and get familiar with the terminology. It helped me know which words to search for on the web when I am researching a specific use case.
Sometimes plain Python on a standalone machine can help you understand a concept, because it isolates the core idea from the challenges of distributed computing. Spark has abstracted away many of these challenges.
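To make the standalone-machine point concrete, here is a minimal sketch in plain Python of a word count split into a "map" step and a "reduce" step. It only mimics the shape of a distributed job on one machine; the function names (`map_partition`, `merge_counts`, `word_count`) and the sample sentences are made up for illustration and are not Spark APIs.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Count words within one chunk of lines (the 'map' side)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial counts into one (the 'reduce' side)."""
    return left + right


def word_count(lines, num_partitions=2):
    # Split the input into chunks, roughly as a cluster would spread
    # the data across workers. Here every chunk runs in this process.
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    partials = [map_partition(chunk) for chunk in partitions]
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data, just for illustration.
sample = [
    "spark makes big data simple",
    "python makes small data simple",
]
result = word_count(sample)
```

Changing `num_partitions` should not change the result, which is the core idea to take away before worrying about shuffles, failures, and the other things a real cluster has to handle.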
Data science engineering also involves understanding the underlying platform where the algorithms are supposed to run, such as Hadoop/Spark or cloud computing frameworks. Knowing the strengths and weaknesses of the ecosystem is a must before designing any pipeline.
I started with the courses available on Coursera. Here is the list of resources I used, and I'll keep adding more as and when I find great videos or online materials. Please feel free to add anything you have found useful in the comments below:
YouTube - Databricks/Spark Summit:
If you have trouble installing tools on your PC or Mac, Databricks provides a free Community Edition portal. You can sign up for it and try it out. The YouTube videos above show how easy it is to try the Spark APIs using an interactive Databricks notebook.