I'm the founder of the Insight Data Science Fellows Program, and the new Insight Data Engineering Fellows Program we just launched above. With the Data Science program, which helps PhDs transition to industry, we're at 70+ alumni working as data scientists at companies like Facebook, Square, LinkedIn, Airbnb, etc.
This new Data Engineering Program is NOT restricted to PhDs, and is open to all professional engineers or BS/MS graduates. It's still free, just like the Data Science Program, and is designed for people who want to leverage their existing software engineering skills to transition to a career in data.
Do you teach people about "hygiene", e.g. data provenance, versioning, and how to design schemas? I work on "big data" stuff at one of these major companies, and IMO the state of things is pretty sad. A typical pipeline involves a bunch of files strewn about a distributed file system, or a pretty messy database, especially when multiple teams are involved.
The tools (I use) don't encourage good practices or have good defaults. You have to put in extra effort and write proper metadata, etc.
I think things are just new so this kind of issue doesn't get much attention yet. Curious to see if anyone has written anything about it. I guess academics and government and people who have to keep data around for a long period of time will have thought more about this.
The program is built entirely around professional data engineers from the mentor companies coming in to share their best practices with the group, which the Fellows then work to implement in their projects. A number of mentors have told me they will focus on the topics you mentioned. That said, I would love to get your take on this too. I would love it if you dropped me a line at jake@insightdataengineering.com with any suggestions. Thanks!
Wow this is really awesome and I'd like to participate so I'll ask a couple questions.
I'm in my last semester of university for CS (starting graduate school in the Fall), what kinds of 'pre-reqs' would you suggest in terms of languages/programming paradigms/statistics knowledge?
What is the weekly number of hours we should be prepared to commit?
If you haven't already, I would recommend taking an intro to databases course. A machine learning course would also be helpful, if you have time to take it before you finish. Other than courses, I would try to build some weekend projects that demonstrate your ability to write clean, modular code.
Regarding hours: Insight is really intense, so while the official hours during the six-week program are M-F 10am-6pm, most Fellows stick around pretty late each evening. The peer-to-peer learning aspect of the program is one of its biggest strengths, and you get the most out of that when you can be around the office collaborating with others as much as possible.
I see that the application deadline is April 14 for a program starting June 2. When do the accepted candidates get notified? This would be pretty important for an applicant outside of the Bay Area.
For anyone who applies before the end of this weekend, we'll be in touch by mid-week next week with decisions on next steps in the process. Final decisions should be made about 1.5-2 weeks after that. We'll move equally quickly for applications that come in next week through to the April 14 deadline.
I'm seriously thinking about applying (actually to the Data Science program). I'm currently looking to start in the SF Bay Area as a data scientist (or data analyst, if need be) in late May to June.
But I have a question -- I've only advanced to the masters level at this point; I recently graduated with an MS in biostatistics. "PhD" and "postdoc" are written all over the site. Should I even consider applying?
Finally -- what's the best way to contact you? Should I email the email address under "Contact"? Or is there a preferred alternative?
The Insight Data Science Fellows Program is currently for PhDs only. If you have enough engineering experience, I would suggest applying to the Data Engineering program, which is open to anyone. If not, then drop me a line at jake@insightdatascience.com and I can see what I can do to help.
Thanks for the response; I'm definitely applying to the Data Engineering program. My engineering background has been more on-the-job than from formal coursework, so I was a bit subdued by the list of engineering disciplines in the "Accepting Applicants From" section (though I now see "Scientific Research" as one of the fields, yay!). I really hope I can participate.
Great to hear. We love skills learned on-the-job. That list is just meant to cast a wide net, so people feel welcome to apply from various backgrounds. The main take-away is that we want people who have the right fundamental skill set, and are not too concerned about which formal discipline they learned it under.
Happy to answer any questions here.