
Programming and Data Science for Biologists (PDSB) introduces students to fundamental computational skills and concepts for working with large biological datasets. The course surveys several programming languages (Python, R, Julia) and provides in-depth training in one of them, Python. We will cover tools for collaboration and version control (git, GitHub) and show how they can be used to host and share code, data, and websites. Reproducibility is a core focus throughout the course, and we will learn tools (Jupyter) and practices that support it. We will learn to organize and structure data for statistical analyses (DataFrames, arrays, datatypes), and explore tools for scientific analysis (scipy, pymc3, scikit-learn, keras) and visualization (matplotlib, toyplot, bokeh). Exercises and assignments will introduce students to large empirical datasets used in the biological sciences, from studies of genomics to biodiversity. The latter half of the class is organized around individual projects, in which students will be guided to design a command-line program and/or API for performing a specific type of analysis. Computer programs are ubiquitous in biology, but few biologists receive formal training in designing and writing software. This course addresses that gap with an in-depth introduction to computational techniques and algorithms commonly applied to biological datasets.
