VSCSE Data Intensive Summer School, June 30-July 2, 2014
Where: Lubar S250, University of Wisconsin-Milwaukee
When: June 30-July 2, 10:00 a.m. to 4:00 p.m.
If you registered but are unable to attend, please let us know so that someone else can take advantage of this opportunity.
Prepare your laptop computer in advance
To save class time, our instructors have asked that everyone review the following links and download and/or install the content before arriving at class on June 30:
- RStudio (development environment for the R statistical programming language): Follow the “download RStudio Desktop” link at http://www.rstudio.com/ide/download
- WEKA (data mining software): Follow the “Download” link on the left-hand side of the home page: http://www.cs.waikato.ac.nz/ml/weka. Download the Stable book 3rd ed. (NOTE: It is not apparent from the organizers' information where this book can be downloaded, but we did confirm that you do NOT need a book. If you find it online, it would be useful.)
- Prior knowledge of R is not required, but we do assume that you have some programming experience and familiarity with basic programming concepts (variables, arrays, loops, branching, etc.). You may find it helpful to acquaint yourself with basic R syntax ahead of time. We recommend reading the first two chapters of the following online introduction: http://cran.r-project.org/doc/manuals/R-intro.html
- A basic understanding of relational databases and SQL would be useful. If you are unfamiliar with SQL syntax, please consider the following tutorials: http://sqlzoo.net and http://www.w3schools.com/sql/sql_intro.asp
- KNIME: On the third day, we will explore KNIME, an easy-to-use visual programming language that is popular in the predictive analytics and text-mining communities. Prior knowledge of KNIME is not required. http://www.knime.org/
Examples and assignments may involve the modification of short, well-documented blocks of code.
Monday, June 30
11:15-12:00: Workflows and data provenance. Ilkay Altintas, director of the Scientific Workflow Automation Technologies Laboratory at the San Diego Supercomputer Center (SDSC).
12:15-1:15: Workflows and data provenance, continued.
1:45-4:00: Workflows and data provenance, continued, with optional break ~ 1:00 p.m.
Tuesday, July 1
10:00-11:00: File systems, hardware and the nuts and bolts of storage. Rick Wagner (SDSC).
11:00-12:00: Working with big data. Amarnath Gupta (SDSC) and Bill West (SDSC).
12:15-1:15: Working with big data, continued. Gupta and West.
1:45-4:00: Working with big data, continued, with optional break ~ 1:00. Gupta and West.
Wednesday, July 2
10:00-11:00: Introduction to predictive analytics and data mining. Natasha Balac, director of SDSC's Predictive Analytics Center (PAC).
11:00-11:30: Overview of data mining tools. Nicole Wolter (SDSC-PAC).
11:45-1:15: Unsupervised learning (PCA and clustering). Paul Rodriguez (SDSC-PAC).
1:45-2:45: Supervised learning (decision trees). Wolter and Balac.
3:00-4:00: Techniques and strategies for big data. Rodriguez.