Think back to your high school biology class: did you enjoy it? Were there things about it that stressed you out? Speaking for myself, when I was a high school student in Wyoming, I took a field-based biology class in which we collected samples from various ecosystems around our town and brought them back to our lab to process and write reports on our findings. I absolutely loved every bit of that experience, but I struggled with the quantitative side of the class. Any time math or statistics were required in our reports, I shut down immediately. While the instructor was great, I didn't feel supported in learning these quantitative data skills, and that contributed to my eventual change of major at the University of Wyoming from Wildlife and Fisheries Biology to a less quantitative area of study. While I am very happy with the path I've chosen, a part of me always wonders what my life would be like today if I had learned biology, and the quantitative skills needed for biological studies, in a more supportive environment!
It was serendipitous, then, for me to learn about the Pre-College Program in Computational Biology developed and offered here at CMU by Dr. Phillip Compeau and Dr. Joshua Kangas of the Computational Biology Department in the School of Computer Science. This program truly captures what a positive high school biology experience should look like, giving high school students experience in modern research labs with sophisticated hands-on experiments and computational biology methods. Compeau and Kangas have a passion for exposing students to real research in computational biology. They give students a sense of agency in their work by designing the lab exercises so that students can make choices in the design of experiments and then see the effect of the choices they made. This differs greatly from most high school biology labs, in which experiments are based on a protocol and, if the steps are followed correctly, the experiment will be successful. Those protocols don't always capture what real research in a lab looks like! Research can be messy and serendipitous, and this program opens students up to that beautiful process.
This program began in July 2019 and has since run two successful cohorts through the curriculum, one in 2019 and one in 2020. The 2019 program gave students the opportunity to analyze data they collected in the field, which is truly a transformative research experience to go through! On day one, the program began with some bootcamp exercises and activities to get students up to speed on programming and working in a wet lab, and the second day sent students out on a boat! This was a day-long trip on Pittsburgh's three rivers (Allegheny, Monongahela, Ohio) to collect water samples, with the goal of better understanding the bacterial composition of these rivers. The DNA within these river water samples was sequenced, and students spent the remainder of the program learning how to engage with the sequencing data using computational biology methods (you can see the entire 2019 program syllabus here!).
So what about 2020? Compeau and Kangas were creative in the face of this year's major challenges and ran an all-online program! They had to completely rethink the process and decided to create an intensive data analysis experience for the students. This involved sending the students some preparatory materials on programming to get everyone up to speed, and then running a hackathon built around test cases and coding challenges (such as building evolutionary algorithms), leading to an enormously rewarding learning experience for students! As in 2019, this summer's program gave students the opportunity to engage with topics at the forefront of computational biology, including assembling the coronavirus genome and mapping an evolutionary tree of the coronavirus. While participating in hackathon-style experiences remotely presents its own set of challenges, Compeau and Kangas facilitated active learning in small groups using collaborative tools: the teletype package for Atom (allowing for collaborative coding), Zoom, and Discord for community building among the students.
Many aspects of the program intersect with the data literacy skills we teach at CMU Libraries, including the importance of quality data collection and documenting your code! In the 2019 program, some of the samples collected on the boat didn't pass quality control, which reinforced for students the importance of collecting good data in the field. Further, in the 2020 program, many students realized the importance of writing code with the expectation that someone else will read it and run it, which includes commenting your code and structuring it cleanly. As we say in our data management workshops: documenting the choices you make while coding is both self-care and community-care, keeping your research process more organized for yourself as well as for those who will look at and use your code in the future!
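To make that advice a bit more concrete, here is a minimal sketch of what commented, cleanly structured code can look like. It is not taken from the program's materials: the function name `gc_content` and the example sequence are hypothetical, chosen only to illustrate a docstring, a clear error message, and a comment that explains a choice.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in a DNA sequence that are G or C.

    Hypothetical example for illustration; not code from the program.

    sequence: a non-empty string of DNA bases, in upper or lower case.
    """
    if not sequence:
        # Fail early with a clear message instead of dividing by zero below.
        raise ValueError("sequence must not be empty")

    # Normalize case so that "acgt" and "ACGT" are treated the same way.
    sequence = sequence.upper()

    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


if __name__ == "__main__":
    # A made-up sequence, used only to show how the function is called.
    print(gc_content("ATGCGC"))
```

Someone opening this file months later can tell what the function expects, what it returns, and why the case normalization is there, without having to ask the person who wrote it.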
What's next for this program? Dr. Compeau and Dr. Kangas have many goals to grow and sustain this important program, including continuing to recruit students, particularly from underrepresented groups and from the Pittsburgh area. They also hope to stay connected with students after the program in order to build a community around these computational biology learning experiences. Currently, Compeau and Kangas stay in touch with the students and are even working on publications with many of them, providing excellent exposure to the world of scientific publishing. Are you interested in learning more, or following this program as it progresses? Check out the program summary here! Also, CMU Libraries' very own Dr. Huajin Wang is our liaison librarian to the Computational Biology Department, and can help you find teaching, research, and learning resources related to this subject area! She can be contacted here.