In 2014, the University of California, Berkeley School of Information launched its Master of Information and Data Science (MIDS) program. The multidisciplinary curriculum was designed to be completed in 20 months via live online classes ranging from Applied Machine Learning to Field Experiments.
Why did the I School develop the MIDS program?
The Bay Area technology community, our alumni and members of our advisory board were all saying data science is the future and you should consider it. On our end, we felt “data science” had not yet been clearly defined, and we wanted to play a leadership role in defining it.
We are educating people who can work with data, starting with asking good questions and understanding research design. They need to be able to find and pull together data in a form that makes it usable, store it in databases, and then apply either statistical tools or some of the new computational tools to analyze it. They will have to clearly communicate the story that the data tells to nonspecialists using words or visualizations. And they should be aware of the privacy and ethical issues associated with using data.
We have explicitly made our degree program much broader than statistics or computer science. Much like an I School, it is a mix of the social and the technical.
We launched this degree in January 2014 after a three-year gestation. Recently our chancellor came to us and said, “We think all freshmen should have exposure to data and data literacy.” This confirmed our insight that this was going to be important.
How do you differentiate between a data scientist and a data librarian?
A data librarian has a special set of responsibilities around stewardship and curation that other data scientists do not have. These responsibilities include defining standards, storing data, ensuring data stays in a usable format, and organizing data in a way that makes it more accessible. And it may be a bit of an uphill battle.
UC Berkeley recently got a grant to set up the Berkeley Institute for Data Science. They wanted to create an infrastructure on campus to support data science in the sciences, and we put in a proposal to work on data curation, stewardship and policy issues around privacy and ethics. While we believe these are critical issues, they still have not become part of the mainstream discussion among the scientists on campus.
Tell me about your students and where they will end up.
We have been flooded with applications and currently accept fewer than one third of applicants. Ninety-eight percent of our 126 students are working, and they represent a wide range of industries. The class tends to skew toward male students (80 percent); we think that might have something to do with the technical skills requirement, as we expect students to come in with quantitative competence and some programming skills.
We envision the field of data science as a bell curve. At one end are the PhD statisticians, computer scientists and physicists who are on the leading edge creating new algorithms. LinkedIn, Google and other technology companies are hiring these types of PhD data scientists or computer science graduates. At the other end are data analysts doing the simpler tasks. We are preparing our students for the big bell in the middle, which might include healthcare, finance and any number of other industries. Many of our students are not looking to change careers, but rather to rise within their existing organizations.
How important is the location of this degree (online and offered by a Bay Area powerhouse)?
We benefit from being in the Bay Area because a lot of the new data science tools and technology are developed here — there’s an ecosystem of data science. And we benefit from the Berkeley brand.
Physically, the School of Information is bursting at the seams, so we decided to make it an online degree. That makes sense as well for most of our students, who wish to rise within their current companies. There is also a natural tie-in to the curriculum: using technology to attain a technologically oriented degree.
Is this interest in data science a flash in the pan, or is it here to stay?
There is a great deal of unwarranted hype out there about “big data” and “machine learning.” That said, I believe every organization — from the library to the public sector, from multinational corporations to mom-and-pop shops — is going to be transformed by the ability to use data in interesting and new ways. I’m in the camp that believes this is a big change.