
2014 CI Days

The third annual Michigan State University Cyberinfrastructure (CI) Days event will be held on October 23-24, 2014.

CI Days presents an opportunity for faculty and students to understand the benefits that cyberinfrastructure can bring to their scholarly pursuits, to see what others are doing with cyberinfrastructure, and to learn what resources are available on campus, across institutions, and nationally.

The two-day event features:

  • Special guest speaker presentations 
  • Interactive workshops on a variety of topics
  • Poster session showcasing CI-enabled research at MSU
  • Resource fair featuring various CI resources available to MSU researchers
  • Networking opportunities

What is CI?

The National Science Foundation defines CI as a collection of advanced technologies and services to support scientific inquiry. This includes: 

  • Computing clusters and high performance computing systems
  • Data management, data integration, and data storage systems
  • High speed networks
  • Data mining and data visualization
  • Collaboration and security tools
  • The people who design, build, and run these systems

CI Days is coordinated by the Institute for Cyber-Enabled Research (iCER) and jointly sponsored by the Office of the Vice President for Research and Graduate Studies, IT Services, and iCER. The event is based on an initiative originally supported by the National Science Foundation in 2010 to promote the implementation and use of cyberinfrastructure at research institutions.

Sessions

Thursday, October 23, 2014

Pre-Conference Morning Workshops

Introduction to Python  

Python is a programming language that is now widely used in research computing because:

  • It is relatively straightforward to learn and understand
  • Once learned, it is broadly useful: researchers who know Python can apply it directly in their own disciplines
  • It is widely supported in many disciplines

This tutorial will go over the basic concepts of Python, show some examples of how it is useful, and point participants to resources for further learning.
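To give a flavor of the material, here is a minimal Python sketch of the kinds of basics such a tutorial covers: variables, lists, functions, and loops. The data and names are illustrative only, not taken from the workshop.

    # Variables and lists
    temperatures = [22.4, 21.9, 23.1, 24.0]   # some sample readings

    # Functions
    def mean(values):
        """Return the arithmetic mean of a list of numbers."""
        return sum(values) / len(values)

    # Loops and conditionals
    avg = mean(temperatures)
    for t in temperatures:
        print(t, "above average" if t > avg else "at or below average")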

Please bring a laptop.

Introduction to High Performance Computing at MSU

This workshop will teach participants how to get started with the High Performance Computing Center (HPCC) at MSU. Topics include connecting, working with files, navigating the command line, accessing available software, testing and running programs, writing scripts, and submitting and monitoring jobs.
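To give a sense of what job submission involves, here is a minimal sketch of a batch script in the PBS/Torque style common on clusters of this era. The job name, resource requests, and program name are hypothetical placeholders rather than HPCC defaults; the workshop covers the center's actual conventions.

    #!/bin/bash
    # Minimal PBS-style job script (illustrative values only)
    #PBS -N example_job            # job name
    #PBS -l nodes=1:ppn=1          # one core on one node
    #PBS -l walltime=00:10:00      # ten minutes of wall-clock time
    #PBS -l mem=1gb                # one gigabyte of memory

    cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
    ./my_program input.txt > output.txt

A script like this is typically submitted with qsub and monitored with qstat.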

Please bring a laptop.

Top 10 Productivity Tools in MATLAB / Programming with MATLAB

Top 10 Productivity Tools in MATLAB
In this technical session, we present the unsanctioned but highly acclaimed list of "Top 10 Productivity Tools in MATLAB" – ways to increase your productivity and effectiveness as you use MATLAB to:

  • Explore, analyze, and visualize data
  • Develop, test, and maintain MATLAB algorithms and applications
  • Consolidate and share results with colleagues

Through product demonstrations, we share tips and best practices for:

  • Importing data from a variety of file formats 
  • Creating and customizing plots
  • Analyzing, profiling, and debugging MATLAB code
  • Publishing custom reports
  • Discovering new MATLAB features

Please bring a laptop.

Programming with MATLAB
MATLAB is a high-level language that includes mathematical functions for solving engineering and scientific problems. You can produce immediate results by interactively executing commands one at a time. However, MATLAB also provides features of traditional programming languages, including flow control, error handling, and object-oriented programming (OOP). Attend this seminar to learn more about programming capabilities in MATLAB and to learn how to be more productive working with MATLAB.

Topics covered will include:

  • Basics of the MATLAB programming language
  • Moving from scripts to functions
  • Building robust, maintainable functions
  • Tools for efficient program development
  • Using and authoring objects in MATLAB

Those attending this seminar are expected to have either a rudimentary knowledge of MATLAB or experience with another programming language.

Please bring a laptop.

Pre-Conference Afternoon Workshops

Advanced Python

This workshop builds on the morning "Introduction to Python" session and covers more advanced Python concepts.
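As a taste of that territory, here is a short sketch of two concepts often treated at this level, list comprehensions and generators. These topics are illustrative guesses, not the workshop's actual outline.

    # List comprehension: squares of the even numbers below ten
    squares = [n * n for n in range(10) if n % 2 == 0]
    print(squares)                  # [0, 4, 16, 36, 64]

    # Generator function: yields values lazily instead of building a whole list
    def fibonacci(limit):
        a, b = 0, 1
        while a < limit:
            yield a
            a, b = b, a + b

    print(list(fibonacci(50)))      # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]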

Please bring a laptop.

Advanced Topics in HPC: Making Your Research Go Faster

During this workshop, participants already familiar with the HPCC systems will learn advanced techniques for using the systems more effectively.

Please bring a laptop.

Optimizing Your MATLAB Code

Part 1: Speeding Up MATLAB Applications
We will discuss and demonstrate simple ways to improve and optimize your code that can boost the execution speed of your application. We will also address common pitfalls in writing MATLAB code and explore the use of the MATLAB Profiler to find bottlenecks.

Highlights include:

  • Understanding memory usage and vectorization in MATLAB (a rough Python analogue of vectorization appears after this list)
  • Addressing bottlenecks in your programs
  • Optimizing file I/O to streamline your code
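For readers who do not use MATLAB, here is a rough Python/NumPy analogue of the vectorization idea from the first highlight above. It illustrates the general technique of replacing an explicit loop with a whole-array operation; the seminar itself demonstrates this in MATLAB's own syntax.

    import numpy as np

    data = np.random.rand(10**6)            # one million random samples

    # Loop version: square each element one at a time
    squared_loop = np.empty_like(data)
    for i in range(data.size):
        squared_loop[i] = data[i] ** 2

    # Vectorized version: one whole-array operation, typically far faster
    squared_vec = data ** 2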

Part 2: Parallel and GPU Computing with MATLAB
In part two of this seminar, you will learn how to solve and accelerate computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. We will introduce you to high-level programming constructs that allow you to parallelize MATLAB applications and run them on multiple processors.

Highlights include:

  • Toolboxes with built-in support for parallel computing
  • Creating parallel applications to speed up independent tasks
  • Scaling up to computer clusters, grid environments or clouds
  • Employing GPUs to speed up your computations

Optional part 3: Generating C and C++ Code from MATLAB using MATLAB Coder
In this portion of the seminar, we will demonstrate the workflow for generating readable and portable C and C++ code from your MATLAB algorithms using MATLAB Coder. Using the command-line approach or the graphical project management tool, you can introduce implementation requirements to your algorithms written in MATLAB and then generate readable source code for standalone execution, integration with other software, accelerating MATLAB algorithms, or embedded implementation. 

Please bring a laptop.

Friday, October 24, 2014

Opening Keynote

Assembling DNA Puzzles From Millions of Pieces
Presented by Dr. Adam Phillippy 

Thirteen years after the initial publications, and billions of dollars later, there still remain unknown sequences in the human genome. This is due to a limitation of current DNA sequencing technologies, which can only read short DNA fragments at a time. Thus, the full genome must be computationally reconstructed from millions of these short fragments. Solving this problem is hard, in the computational sense, and touches on many foundational areas of computer science including string and graph algorithms, data structures, parallel computing, and databases. Although genome assembly programs are incredibly elegant, fully reconstructing a human genome, or even that of a small bacterium, has until recently been simply impossible. Only recently has sequencing technology advanced to the point that these problems are solvable. This presentation will give a brief history and overview of the latest developments.
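To make the computational problem concrete, here is a toy Python sketch of the k-mer (de Bruijn) graph idea that many modern assemblers build on. This is a generic illustration, not Dr. Phillippy's software; real assemblers must also cope with sequencing errors, repeats, and enormous data volumes.

    # Build a tiny de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers
    def de_bruijn_edges(reads, k):
        edges = set()
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                edges.add((kmer[:-1], kmer[1:]))   # prefix -> suffix
        return edges

    # Overlapping short "reads" of the unknown sequence ACGTGCA
    reads = ["ACGTG", "CGTGC", "GTGCA"]
    for src, dst in sorted(de_bruijn_edges(reads, 3)):
        print(src, "->", dst)

Following the single path through this graph (AC -> CG -> GT -> TG -> GC -> CA) spells out the original sequence.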

Adam Phillippy is a Senior Principal Investigator at the National Biodefense Analysis and Countermeasures Center in Frederick, Maryland, where he leads a bioinformatics team responsible for both hardware and software development at the center. He received his PhD in Computer Science from the University of Maryland and has previously worked at The Institute for Genomic Research (TIGR) and the J. Craig Venter Institute (JCVI). For his first job at TIGR, he developed software to analyze the genomes of anthrax bacteria collected by the FBI during the Amerithrax investigation. He continues to specialize in DNA sequencing and the development of efficient algorithms for the bioforensic investigation of infectious disease outbreaks.

Morning Breakout Sessions

Bioinformatics in the Corporate World: A New Set of Challenges

Presented by Dr. Vernon McIntosh 

For most biologists, the gap between the ability to generate data and the ability to use that data effectively is well understood. In an industrial setting, that gap is wider in some ways and narrower in others. Dr. McIntosh will discuss, from an industrial research perspective, the complexities of effectively utilizing today's high-throughput data generation technologies. From initial experiment design and execution to supporting expertise, software, and hardware infrastructure, bioinformatics in a for-profit industrial setting is a wholly different beast. This talk will offer a narrative outlining these considerations and, ideally, spark a dialogue about what universities and other public research institutions can offer as support.

Vernon McIntosh joined Cargill Inc. as a Senior Molecular Biotechnologist in 2010. In this role, his primary responsibility has been the metabolic engineering of biocatalyst fermentation platforms for industrial chemicals. Most recently, Vernon has taken on the task of developing Cargill's global genomics and bioinformatics platform and building business for this platform both internally and with third parties. Vernon received his PhD from the University of Tennessee's Center for Environmental Biotechnology in 2010. During the completion of his dissertation, titled An Analysis of Global Gene Expression Resulting from Exposure to Energetic Materials, he developed an expertise in applying machine learning techniques, specifically support vector machines, to biomarker identification and predictive modeling. Vernon and his wife Nichole currently live in Minneapolis, MN with their three dogs, and are active volunteers and financial supporters of the Science Museum of Minnesota.

Broad Data: Why Computing and Humanities Matter to One Another

Presented by Dr. Michael Simeone 

In this talk, we'll start by looking at a few examples of successful projects in the digital humanities that demonstrate transformative contributions made by collaborators in computing disciplines. Focusing on the methodology rather than specific technologies, we'll propose a model for research that is less focused on digital tools and more interested in durable relationships among methodologically compatible collaborators. In the proposed model of Broad Data, digital research in the humanities is part of a wider effort in data-driven research that recognizes multiple kinds of contributions beyond the size of data and statistical calculations alone. We'll conclude with an example of what broad data could look like in research and community practice, where the use of technology resources not only helps individual research projects, but also helps steer future development and policy.

Michael Simeone is the Director of the Nexus Lab for Digital Humanities at Arizona State University. He is also affiliated with the Image and Spatial Data Analysis Division at the National Center for Supercomputing Applications. His research includes cultural studies of science and technology, humanities visualization methods, the use of computer vision in the digital humanities, and decision science systems that bridge engineering and the humanities. He received his PhD in English from the University of Illinois at Urbana-Champaign.

Afternoon Breakout Sessions

Assemble ALL THE THINGS! How to Improve Metagenome Assembly Without Writing a New Assembler

Presented by Dr. Matthew Scholz 

Over the last several years, sequencing technology has enabled sequencing of communities of bacteria at a rapidly increasing rate, and the set of tools for handling these assemblies has been growing (e.g., IDBA, SPAdes). To complement these tools, we developed a post-assembly merging protocol, MeGAMerge. The tool has applications for many metagenomes that are otherwise difficult to assemble, and it can also address new questions by allowing assembly of multiple samples simultaneously.

Matthew Scholz holds a BA/BS in Biochemistry and Drama from the University of Washington, Seattle, where he also worked at the UW network operations center. He received his PhD in Microbiology from the University of Tennessee in 2012. From 2010 to 2014 he worked at Los Alamos National Laboratory as a bioinformatics scientist, working on genome assembly, metagenome assembly, pathogen detection, and RNA-seq. He is currently a research specialist at the Institute for Cyber-Enabled Research (iCER) at Michigan State University (MSU) and is the Director of the newly formed Bioinformatics Center for Education and Productivity (BiCEP) at MSU.

The Place of Analogy in Collaborative, Interdisciplinary Computing, or, What Does Bioinformatics Have to Do With Urdu Poetry?

Presented by Dr. A. Sean Pue 

How can programming humanists talk to programming scientists? This paper will draw on the author's experience of collaborating with computational biologists to write code to determine the meter of classical Urdu poetry. The paper will argue for the central role of analogy as a catalyst for collaborative, interdisciplinary computing. It will also consider how existing cyberinfrastructures, usually focused on the natural sciences and engineering, can be adapted to address humanities "data." Finally, it will suggest how the movement for "open" science can be viewed as analogous to that for "digital" humanities.

A. Sean Pue (@seanpue) is associate professor of Hindi Language and South Asian Literature and Culture in Michigan State University's College of Arts and Letters. He also teaches Global Studies in Arts and Humanities as well as Digital Humanities. His book, I Too Have Some Dreams: N. M. Rashed and Modernism in Urdu Poetry, was published recently by University of California Press. His next project will focus on the role of sound in poetry and involve computational analysis of texts and performances across a number of South Asian languages. He prefers Python to R and blogs at seanpue.com.

Closing Keynote

Merging Infrastructures: Department, College, University, and Beyond 
Presented by Dr. Jennifer Guiliano

Merging Infrastructures: Department, College, University, and Beyond explores the ways in which digital research projects must navigate often-overlapping, but frequently problematic, technical and social architectures in order to complete their work at the department, college, and university scales. Drawing examples from previous and ongoing digital research, this talk will explore how to build coalitions along technical, social, and cultural axes. Importantly, we'll also explore how conceptualizing infrastructure as merely technical, or as the domain of a single organizational group, limits our potential as humanists and researchers.

Jennifer Guiliano is Assistant Professor in the Department of History at Indiana University-Purdue University Indianapolis. She has served as a Post-Doctoral Research Assistant and Program Manager at the Institute for Computing in Humanities, Arts, and Social Sciences at the National Center for Supercomputing Applications (2008-2010), and as Associate Director of the Center for Digital Humanities (2010-2011) and Research Assistant Professor in the Department of History at the University of South Carolina. She most recently held a position as Assistant Director at the Maryland Institute for Technology in the Humanities at the University of Maryland, where she also served as an adjunct instructor in the Department of History and the Digital Cultures program in the Honors College. Dr. Guiliano currently serves on the Association for Computers and the Humanities (ACH) Executive Council (2013-2016), as co-director with Trevor Muñoz of the Humanities Intensive Learning + Teaching (HILT) Initiative, and as co-author with Simon Appleford of DevDH.org, a resource for digital humanities project development.