Week 2: Data Preparation
Preparing Data: Finding raw, public, or organizational data and defining what questions can be answered using the organization’s data. I will provide you with sets of data and the specific analyses desired. You will need to examine the data and decide how it needs to be “reshaped” in order to perform the desired analysis. We will then discuss these analyses on GitHub; I will provide similar examples and ask you to develop and prepare the data further. You can use any software tool you are familiar with, but I will be providing examples and help with R (the original syllabus said Python, but we didn't get to the Python lecture in week one :) ). If you choose your own tools, you are “on your own” to some extent ☺, though tools are covered more fully in week 3. I do encourage you to go through some Python tutorials during weeks 1 and 2: http://docs.python.org/2/tutorial/ (we will be using Python 2.7.3 in this course; Python 3 is another discussion).
1. Part One of “Data Analysis with Open Source Tools”.
2. Chapters 1-3 of “The Anarchist in the Library: How the Clash Between Freedom and Control is Hacking the Real World and Crashing the System”.
This is first and foremost an analysis assignment, and one focused on familiarizing yourself with what R can help you with. A full, working sample is provided on GitHub. You can click this link to download the full zip file. You will then have access to the data under the “Week2” directory.
1. Set your working directory to “Week2”
2. Run “Complete.R”. Examine the comments and the resulting files to familiarize yourself with the data.
Analysis Questions. Write up a short essay with tables or graphs if needed to describe how you would:
Build a network, using the scripts from week 1, from the mention connections in this sample data? From the Reply-To connections? What transformations are required? How would you filter the data? Use the actual data to ground your thinking. Feel free to write or modify the R code samples from the first two weeks to experiment. Some of you will be more comfortable doing this; some will be more comfortable addressing the question conceptually. This is OK.
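To make the "what transformations are required" question concrete, here is a minimal sketch of turning message records into weighted edge lists for a mention network and a Reply-To network. It is written in Python rather than R, and it is not the course's script. The field names (author, reply_to, text), the sample records, and the helper names (mention_edges, reply_edges, weighted) are all assumptions made up for illustration; the real sample data will likely be laid out differently.

```python
import re
from collections import Counter

# Hypothetical records; the real sample data may use different field names.
messages = [
    {"author": "ana", "reply_to": None, "text": "Thanks @ben and @cy for the notes"},
    {"author": "ben", "reply_to": "ana", "text": "@ana happy to help"},
    {"author": "cy", "reply_to": "ana", "text": "Agreed with @ben"},
    {"author": "ana", "reply_to": "ben", "text": "@ben see you next week"},
    {"author": "dee", "reply_to": None, "text": "No mentions here"},
]

# Assumption: a mention is written as "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")


def mention_edges(rows):
    """Return one (source, target) pair per mention; a message may yield several."""
    edges = []
    for row in rows:
        source = row["author"].lower()
        for target in MENTION.findall(row["text"]):
            target = target.lower()
            if target != source:  # filter: drop self-mentions
                edges.append((source, target))
    return edges


def reply_edges(rows):
    """Return one (source, target) pair per message that replies to someone."""
    return [
        (row["author"].lower(), row["reply_to"].lower())
        for row in rows
        if row["reply_to"]  # filter: skip messages that are not replies
    ]


def weighted(edges, min_weight=1):
    """Collapse repeated pairs into counts and keep pairs seen at least min_weight times."""
    counts = Counter(edges)
    return {pair: n for pair, n in counts.items() if n >= min_weight}


mention_net = weighted(mention_edges(messages))
reply_net = weighted(reply_edges(messages))

for (source, target), weight in sorted(mention_net.items()):
    print(source, "->", target, weight)
```

The same message can contribute to both networks, and the two networks need not agree: here "cy" replies to "ana" but mentions "ben". Raising min_weight is one simple way to filter out one-off connections before building the network.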
Submit any issues you encounter to GitHub under this repository. You can also use the Blackboard Discussion area. I will check both. One advantage of GitHub issues for questions related to the repository is that the context of your question is clearly preserved, whereas it disappears after the class ends if you post it to Blackboard. This is a direct link to GitHub Issues for our repository.
I will open a discussion board under our Blackboard Shell regarding the three papers you were assigned to read last week. I expect you to answer the questions and respond to your classmates. Your participation does not need to be long, just thoughtful. Here is the setup for the question that's under the "Discussion" Link in Blackboard Learn:
During week one you were assigned three readings. They are available here: http://seangoggins.net/DS-WeekOne
One of the readings focused on measurements of "success" in online dating. This paper is interesting, in a way, because instead of quantifying measures of success from trace data, it looks at what people say about their experiences, and describes how different online sites have different ideas about what "success" is. The other two papers are longer, and complement each other. Howison et al. focus on the limitations of using electronic trace data to draw conclusions about social groups. Goggins et al. focus the discussion on *how* to ground analysis of trace data (logs) in social science, so that we are answering SOCIAL questions instead of merely TECHNICAL questions about participation.
Reflecting on your own use of social media, online forums, email and other electronic communications, discuss where you think Goggins and Howison make a point you can relate to. Then, reflect on one point about your participation where the authors perhaps "miss" a key point, idea or behavior about how you act online.
Create a thread titled with your firstname+pointIRelateTo.
Create a thread titled with your firstname+pointMissed.
Respond to at least two of your classmates' threads.