The first Michigan State University Cyberinfrastructure (CI) Days event was held on October 25-26, 2012.
CI Days presented an opportunity for faculty and students to understand the benefits that cyberinfrastructure can bring their scholarly pursuits, to see what others are doing with cyberinfrastructure, and to learn what resources are available on campus, across institutions, and nationally.
More importantly, this was an opportunity for the MSU community to share information about cyberinfrastructure resources and their use in research. The event featured:
- Nationally renowned research leaders from multiple disciplines of study addressing how advanced technologies enable their scholarly work;
- A panel of National Science Foundation directors discussing the future of CI-enabled discovery and learning;
- Posters showcasing CI-enabled research at MSU; and
- A resource fair featuring CI resources available to MSU researchers.
All events were held on campus in the Biomedical and Physical Sciences building. The conference began on Thursday, October 25 at 5 p.m. with a reception and poster session. A day of workshops, presentations, and a resource fair followed on Friday, October 26.
What is CI?
The National Science Foundation defines CI as a collection of advanced technologies and services to support scientific inquiry. This includes:
- Computing clusters and high performance computing systems;
- Data management, data integration, and data storage systems;
- High speed networks;
- Data mining and data visualization;
- Collaboration and security tools; and
- The people who design, build, and run these systems.
CI Days was coordinated by the Institute for Cyber Enabled Research (iCER) and IT Services Research Support. CI Days was jointly sponsored by the Vice President for Research and Graduate Studies and the Vice Provost for Libraries and Information Technology Services. CI Days is based on an initiative originally supported by the National Science Foundation in 2010 to support the implementation and use of cyberinfrastructure at research institutions.
Keynote Addresses
Some Assembly Required: Understanding and Enabling Collaboration in 21st Century Teams
Recent advances in Web science provide comprehensive digital traces of social actions, interactions, and transactions. These data provide an unprecedented exploratorium to model the socio-technical motivations for creating, maintaining, dissolving, and reconstituting teams, whether for research, business, or social causes. Using examples from research in team science and massively multiplayer online games, Contractor will argue that network science serves as the foundation for the development of social network theories and methods to help advance our ability to understand the emergence of effective teams. More importantly, he will argue that these insights will also enable effective teams by building a new generation of recommender systems that leverage our research insights on the socio-technical motivations for creating ties.
Dr. Noshir Contractor is the Jane S. & William J. White Professor of Behavioral Sciences in the McCormick School of Engineering & Applied Science, the School of Communication, and the Kellogg School of Management at Northwestern University. He is the Director of the Science of Networks in Communities (SONIC) Research Group at Northwestern University. He is investigating factors that lead to the formation, maintenance, and dissolution of dynamically linked social and knowledge networks in a wide variety of contexts including communities of practice in business, translational science and engineering communities, public health networks, and virtual worlds. His research program has been funded continuously for over a decade by major grants from the U.S. National Science Foundation with additional current funding from the U.S. National Institutes of Health (NIH), Air Force Office of Research Support, Army Research Institute, Army Research Laboratory, and the MacArthur Foundation.
Dr. Contractor has published or presented over 250 research papers dealing with communicating and organizing. His book titled Theories of Communication Networks (co-authored with Professor Peter Monge and published by the Oxford University Press) received the 2003 Book of the Year award from the Organizational Communication Division of the National Communication Association. He is the lead developer of Cyberinfrastructure for Inquiring Knowledge Networks on the Web (C-IKNOW), a socio-technical environment to understand and enable networks among communities, as well as Blanche, a software environment to simulate the dynamics of social networks.
Openness, Reproducibility, Interactivity: A Biased View on the Relation Between Science and Computing
Computers have become so central to the scientific enterprise that today hardly any new result can be found that does not critically depend on computing. And yet there are serious issues in our computational praxis: abysmal standards of rigor lead to work that even its authors can't reproduce by the time papers reach the printing press; using parallel resources is typically difficult and inefficient; we have "black box science" where key steps are locked inside proprietary products; etc.
Trained like most scientists (i.e., computing is something you learn "as you go"), Dr. Perez has spent the last decade working with one leg in research problems in applied mathematics and the other in open source software development. In this talk, he will try to bridge those two worlds and introduce some of the lessons he has learned in open source work, focusing on the scientific ecosystem that has developed around the Python programming language. Through his view as lead of the IPython project, he will show how this ecosystem allows us to make progress on the above issues: we now have open source tools that can sustain the life-cycle of a scientific idea, from individual exploration, through collaboration, large-scale production work, publication, and education.
Dr. Fernando Perez received his Ph.D. in Physics from the University of Colorado-Boulder (CU), working on questions regarding the topological structure of the QCD vacuum using Lattice Gauge Theory techniques. He continued his postdoctoral research at the CU Applied Mathematics Department, developing algorithms for the application of linear operators in multiple dimensions, for the fast and accurate solution of partial differential equations. Since 2008 he has worked as a research scientist at the Helen Wills Neuroscience Institute at the University of California-Berkeley. His current research focuses on mathematical questions and the development of computational tools for neuroimaging.
As computing has been a constant thread of his research, he has become actively involved in the development of tools for better scientific computing using high-level languages, in particular Python. He is the creator and lead of the IPython project for interactive and high-level parallel computing, as well as contributor to a number of major scientific Python projects. He regularly teaches workshops on modern practices for scientific computing.
When not glued to a computer, Dr. Perez tries to spend as much time as possible with his wife outdoors hiking and backpacking, as well as climbing. For more information, see fperez.org.
Morning Breakout Sessions
Better Data Analysis for Biologists: Help Build the Factory, Not Just the Tools
A common pain point for biologists is computationally intensive data analysis, which is usually done by individual scientists or by local, ad-hoc lab experts. Cyberinfrastructure makes it possible to do data analyses that are reproducible, efficient, best-practice, and that leverage compute resources.
Increased crop yield in changing, suboptimal environments is key for future food security. To better understand and predict plant growth and yield, plant biologists study gene-environment interaction patterns - the genotype-phenotype prediction problem. Field biologists like Dr. Stapleton and her students typically have more expertise in phenotyping and experimental design than in efficient, scalable data analysis.
Biologists learn about new and better data analyses through a variety of channels, from bioinformaticist Twitter feeds to published papers that are many years out of date. This leads to much wasted effort, as new analyses are often done late, often in response to peer reviews of manuscripts.
Let's change the scale of how we do data analysis in biology! In plant biology, we have collectively contributed to building the cyberinfrastructure "factory" that powers individual data analysis tools. Dr. Stapleton will talk about lessons learned and future developments in biology cyberinfrastructure.
Dr. Ann E. Stapleton is an Associate Professor at the University of North Carolina-Wilmington (UNCW). Her primary research focus is on the genetics of abiotic and multiple stress responses in plants. She has a substantial program in interdisciplinary computational biology tool and infrastructure development, in collaboration with computational scientists at UNCW and a national consortium led by the University of Arizona and Cold Spring Harbor Laboratory.
nanoHUB.org Powered by HUBzero® - A Platform for Collaborative Research and Dissemination with Quantifiable Impact on Research and Education
nanoHUB.org served a community of 181,000 users in the past 12 months with an ever-growing collection of 2,700 resources, including over 230 simulation tools. nanoHUB.org is driving significant knowledge transfer among researchers and speeding transfer from research to education. The open-source HUBzero software platform, built for nanoHUB, is now powering many other hubs. This presentation will provide an overview of nanoHUB processes and impact, and provide a vision for a Federation of HUBs for the advancement of science and engineering.
Dr. Krishna P.C. Madhavan is an Assistant Professor of Engineering Education at Purdue University. He is also the Education Director for the US NSF-funded Network for Computational Nanotechnology. His work focuses on large-scale data analysis and interactive visualization for personalizing learning and understanding academic impact. He also focuses on the design of instrumentation for large-scale engineering and science cyber-environments. He won a CAREER award for work on personalizing learning with engineering cyber-environments. Dr. Madhavan was the Chair of the IEEE/ACM Supercomputing Education Program 2006 and was the curriculum director for the Supercomputing Education Program 2005.
Afternoon Breakout Sessions
From Images to Data: Scaling and Streamlining Research Workflows
Historically, experimental observations in science have been limited to handwritten logs and field diaries. However, the recent influx of low cost digital cameras allows researchers who rely on visual observations to digitally record experiments. This has increased the amount of data available to researchers and allows researchers to time-shift their observational processes, so that multiple cameras can continually record experiments and researchers can review and re-review the data at their leisure. Such low-cost cameras can gather a tremendous amount of digital data, but there is no simple, automated method for examining this data and extracting the information necessary for scientific measurements. Thus, researchers typically use man-hours (i.e., hire students) to manually annotate video data frame-by-frame, which is an extremely slow process subject to variations in quality and detail.
The problem is that the amount of data produced by existing digital cameras is many orders of magnitude larger than the scientific observations needed by the researchers. Our research team is developing new methodologies to facilitate the scientific process and make it more affordable to filter large amounts of image data into observations that can be used to test research hypotheses. This talk will present work being done by MSU researchers to improve scientific workflow so that researchers can scale-up faster and minimize their mean time to science.
Dr. Dirk Colbry joined Michigan State University in 2009 as a research specialist within the Institute for Cyber Enabled Research (iCER). At iCER, Dr. Colbry helps the MSU community utilize Computational Infrastructure in research, through classroom instruction, one-on-one consulting, and research collaboration.
An alumnus of MSU, Dr. Colbry has a Ph.D. in Computer Science and his principal areas of research include machine vision and pattern recognition (specializing in scientific imaging) and high performance computing. Dr. Colbry collaborates with scientists from multiple disciplines, including Engineering, Zoology, Mathematics, Statistics, and Biology. Recent projects include research in Image Phenomics; developing a commercially-viable 3D face verification system; adapting pattern recognition processes for tire engineering; and exploring uses of face processing to help individuals who are blind in social interactions. Dr. Colbry has taught a range of courses in computer science, including microprocessors, artificial intelligence, compilers, as well as courses in programming and algorithm analysis.
Informatics Support for Clinical Research: Study and Data Management
This presentation will have two components. Dr. Philip Reed will provide an overview of the research informatics and data management capabilities at the Biomedical Research Informatics Core (BRIC). He will focus on solutions to research informatics challenges developed for BRIC projects in Malawi, Nigeria, and the United States. Dr. Teeradache Viangteeravat will describe the process he led for BRIC to achieve certification as a Federal Data Center for the Centers for Disease Control and Prevention. This will cover elements such as the System Security Plan, HHS Privacy Impact Assessment, Business Continuity Plan, and vulnerability assessments that are required to meet standards established by the Federal Information Security Management Act (FISMA).
Dr. Philip Reed, Ph.D. (Psychology), MS (Epidemiology), is Director of the MSU Biomedical Research Informatics Core and Interim Director of the Clinical and Translational Sciences Institute. The two organizations are central components of MSU infrastructure supporting clinical and translational research. In his role as Director of these two units, Dr. Reed oversees the activities of 40 individuals engaged in clinical trial development and support, grant development consulting regarding data management for clinical and translational investigators across several MSU colleges, and the operation of data coordinating centers for research programs at numerous locations in the U.S., Africa, Europe, Mexico, and Jamaica. Dr. Reed personally studies the impact of trauma on the mental health of first responders and is Principal Investigator of the Data Coordinating Center for the Centers for Disease Control and Prevention national study of the epidemiology of autism.
Jonathan Babbage, MS (Computer Science and Engineering), is Unit Information Systems Manager of the Biomedical Research Informatics Core. Mr. Babbage oversees the development, administration, and integration of a variety of services offered to researchers through the informatics core. He is also responsible for the deployment of mobile offline data collection devices in Malawi that enable field workers to scan barcodes, take photos, and collect GPS coordinates. These data are synchronized into a remotely administered Clinical Research Management System. Mr. Babbage works to extend this type of informatics service to a variety of researchers on campus. He integrates informatics problems into his ongoing Computer Science education through Ph.D. research focused on classification/clustering and model generation.
National Science Foundation Panelists
Dr. Joan Ferrini-Mundy is an Assistant Director of the National Science Foundation (NSF) for Education and Human Resources, a position she has held since February 2011, and is responsible for the leadership of the NSF Directorate for Education and Human Resources (EHR). She has served the Foundation in a number of capacities since 2007 including as inaugural director (through an Intergovernmental Personnel Act appointment) of the EHR Directorate's Division of Research on Learning in Formal and Informal Settings.
From 2007 through 2009, Dr. Ferrini-Mundy was a member of the National Science and Technology Council's (NSTC) Subcommittee on Education and currently co-chairs the Strategic Plan workgroup of the National Science and Technology Council Committee on STEM Education. She is a member of the Mathematics Expert Group of the Programme for International Student Assessment (PISA), and in 2007-2008, representing the NSF, she served as an ex officio member of the President's National Mathematics Advisory Panel, and co-chaired its Instructional Practices Task Group. From 1999-2011 Dr. Ferrini-Mundy held an appointment at Michigan State University (MSU) where she was a University Distinguished Professor of Mathematics Education in the Departments of Mathematics and Teacher Education, and Associate Dean for Science and Mathematics Education in the College of Natural Science. Her research interests include calculus teaching and learning, mathematics teacher learning, and mathematics and science education policy at the K-12 level. Dr. Ferrini-Mundy holds a Ph.D. in mathematics education from the University of New Hampshire. She was elected a fellow of the American Association for the Advancement of Science in 2011.
Dr. Myron P. Gutmann is Assistant Director of the NSF, where he leads the NSF's Social, Behavioral, and Economic (SBE) Sciences Directorate. The SBE Directorate is responsible for the NSF's research about people and their lives, with broad interdisciplinary connections to research throughout the foundation. He is also Professor of History and Information and a Research Professor in the Institute for Social Research at the University of Michigan. Prior to joining the NSF, he was Director of the Inter-university Consortium for Political and Social Research (ICPSR). Dr. Gutmann has broad interests in interdisciplinary historical research, especially health, population, economy, and the environment. Since 1995 he has led a multi-site research program about population, agriculture, and environmental change in the U.S. Great Plains, which has produced important research results that show how demographic and agricultural change both respond to environmental conditions and shape environmental outcomes such as greenhouse gas production. As Director of ICPSR, he was a leader in the archiving and dissemination of electronic research materials related to society, population, and health, with a special interest in the protection of respondent confidentiality. He has written or edited five books and more than eighty articles and chapters. Dr. Gutmann has served on a number of national and international advisory committees and editorial boards.
Dr. Farnam Jahanian serves as the NSF Assistant Director for the Computer and Information Science and Engineering (CISE) Directorate. He guides CISE in its mission to uphold the nation's leadership in computer and information science and engineering through its support for fundamental and transformative advances that are a key driver of economic competitiveness and crucial to achieving national priorities. Dr. Jahanian oversees the CISE budget of over $600 million, directing programs and initiatives that support ambitious long-term research and innovation, foster broad interdisciplinary collaborations, and contribute to the development of a computing and information technology workforce with skills essential to success in the increasingly competitive, global market. He also serves as co-chair of the Networking and Information Technology Research and Development (NITRD) Subcommittee of the National Science and Technology Council Committee on Technology, providing overall coordination for the activities of 14 government agencies.
Dr. Jahanian holds the Edward S. Davidson Collegiate Professorship in Electrical Engineering and Computer Science at the University of Michigan, where he served as Department Chair for Computer Science and Engineering from 2007-2011 and as Director of the Software Systems Laboratory from 1997-2000. Earlier in his career, he held research and management positions at the IBM T.J. Watson Research Center.
Over the last two decades at the University of Michigan, Dr. Jahanian led several large-scale research projects that studied the growth and scalability of the Internet infrastructure, which ultimately transformed how cyber threats are addressed by Internet Service Providers. His research on Internet infrastructure security formed the basis for the successful Internet security services company Arbor Networks, which he co-founded in 2001. Dr. Jahanian served as Chairman of Arbor Networks until its acquisition by Tektronix Communications, a division of Danaher Corporation, in 2010.
Dr. Jahanian is the author of over 100 published research papers and has served on dozens of national advisory boards and panels. His work on Internet routing stability and convergence has been highly influential within both the network research and the Internet operational communities and was recognized with an ACM SIGCOMM Test of Time Award in 2008. He has received numerous other awards for his innovative research, commitment to education, and technology commercialization activities. He was named Distinguished University Innovator at the University of Michigan (2009) and received the Governor's University Award for Commercialization Excellence (2005).
Dr. Jahanian holds a Master's degree and a Ph.D. in Computer Science from the University of Texas-Austin. He is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), and the American Association for the Advancement of Science (AAAS).