Master of Science in Data Science


The Master of Science in Data Science offers students an in-depth educational experience focused on data science as it pertains to their unique interests. The program builds on the four-course Certification of Professional Achievement in Data Sciences program. Students interact with diverse faculty members and fellow students, have opportunities to conduct research, complete a capstone project course, and engage with industry.

Students will be given the opportunity to select an elective track; the tracks draw on the six centers within the Institute and also include an Entrepreneurship track. This allows students to home in on their particular interests and skill sets.

We currently accept applications for the fall term only. Applications for Fall 2015 admission will open in late September: [Apply Here]

The application deadline for our programs is February 15th. To learn more about the admissions application requirements, click here.

If you would like to learn more, or if you still have questions about the admissions application process or the academic opportunities offered through the Data Sciences Institute, please sign up for one of our regularly scheduled online information sessions or refer to our Frequently Asked Questions.

Our curriculum is 30 credits total.

      1. Prerequisites: MATH V1101 and V1102 or the equivalent. A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.

      1. Methods for organizing data, e.g. hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating point arithmetic and stability of numerical algorithms. Eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

      1. This course covers the fundamentals of statistical inference and testing, and gives an introduction to statistical modeling. The first half of the course focuses on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, the likelihood ratio test, and Bayesian inference. The second half of the course provides an introduction to statistical modeling via introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Throughout the course, real-data examples will be used in lecture discussion and homework problems.

      1. An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics will include fundamental tradeoffs in computer systems, hardware and software techniques for exploiting instruction-level parallelism, data-level parallelism and task-level parallelism, scheduling, caching, prefetching, network and memory architecture, latency and throughput optimizations, specialization, and an introduction to programming data center computers.

      1. An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods and problems relevant to big data.

      1. This class introduces the data processing and algorithmic skills, as well as the design principles, necessary to explore and present datasets computationally and visually. These include command line tools, the use of state-of-the-art languages and software, an algorithmic understanding of how to work with large datasets (including parallelism and the map-reduce framework), interactive visualizations, exploratory data analysis as a means to generate and test hypotheses, and the basics of data exploration and visualization.

      1. This course provides a unique opportunity for students in the MS in Data Science program to apply their knowledge of the foundations, theory and methods of data science to address data science problems in industry, government and the non-profit sector. The course activities focus on a semester-length data science project sponsored by a local organization. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.

      1. The elective courses for the proposed M.S. in Data Science will draw upon existing graduate-level courses at Columbia University. In addition to advisor approval, elective course selection will be subject to course prerequisites, course availability, and the cross-registration procedures of the school or department offering the requested courses.

The M.S. program may be completed in two semesters of full-time intensive study or on a part-time basis.

500 W. 120th St., Mudd Room 524, New York, NY 10027    212-854-5660
©2014 Columbia University