IDS Unit 3: Essential Concepts
Data beat anecdotes. In science, we need to closely examine the quality of evidence in order to draw sound conclusions. Anecdotes can contain personal bias, might be carefully selected to represent a particular point of view, and may be completely different from the general trend.
Science is often concerned with the question "What causes things to happen?" To answer this, controlled experiments are required. Controlled experiments have several key features: (1) there is a treatment variable and a response variable, and we wish to see whether the treatment causes a change that we can measure with the response variable; (2) there is a comparison (control) group; (3) subjects are assigned randomly to treatment or control (randomized assignment); (4) subjects are not aware of which group they are in (the experiment is 'blind'), which may require giving a placebo to those in the control group; and (5) those who measure the response variable do not know which group each subject is in. If both (4) and (5) are satisfied, the experiment is 'double blind.'
Randomized assignment is required to determine cause-and-effect.
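As an illustration of randomized assignment, here is a minimal Python sketch. The function name assign_groups, the subject labels, and the seed are hypothetical choices made for this example; a real experiment might use a different procedure, such as drawing names from a hat.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed makes the shuffle repeatable
    shuffled = list(subjects)   # copy so the original list is left unchanged
    rng.shuffle(shuffled)       # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subjects for illustration only.
subjects = [f"subject_{i}" for i in range(1, 11)]
groups = assign_groups(subjects, seed=1)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because chance alone decides who lands in each group, the two groups should be similar on average in every respect other than the treatment.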
Designing an experiment requires making many decisions, including what to measure and how to measure it.
Designing and carrying out an experiment helps us answer specific statistical questions of interest.
Observational studies are those for which there is no intervention applied by researchers.
Experiments are not always possible because of various factors such as ethics, cost limitations, and feasibility.
Confounding factors/variables make it difficult to determine a cause-and-effect relation between two variables.
Surveys ask simple, straightforward questions in order to collect data that can be used to answer statistical questions. Writing such questions can be hard (but fun)!
Another popular data collection method involves collecting data from a random sample of people or objects. Percentages based on random samples tend to ‘center’ on the population parameter value.
Statistics vary from sample to sample. If the typical value across many samples is equal to the population parameter, the statistic is 'unbiased.' Bias means that we tend to “miss the mark.” If we don't do random sampling, we can get biased estimates.
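The idea that sample percentages vary but tend to center on the population value can be explored with a small simulation. This is only a sketch: the population of 10,000 people with 40% answering "yes," the sample size of 100, and the function name sample_percentages are all assumptions made up for this example.

```python
import random
import statistics

def sample_percentages(population, sample_size, num_samples, seed=None):
    """Draw many random samples and return the percentage of 1s in each one."""
    rng = random.Random(seed)
    percentages = []
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        percentages.append(100 * sum(sample) / sample_size)
    return percentages

# Hypothetical population: 10,000 people, 40% of whom answer "yes" (coded as 1).
population = [1] * 4000 + [0] * 6000
percentages = sample_percentages(population, sample_size=100, num_samples=2000, seed=0)

print("typical sample percentage:", round(statistics.mean(percentages), 1))
print("smallest and largest:", min(percentages), max(percentages))
```

Individual samples miss the true 40% in both directions, but their average lands close to it, which is what 'unbiased' means here.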
Understanding bias in survey sampling includes identifying sampling methods that may lead to biased samples, recognizing potential over- or under-representation in samples, and learning to choose more reliable sampling techniques.
There is uncertainty when we estimate population parameters. Because of this, it is better to give a range of plausible values, rather than a single value.
The margin of error expresses our uncertainty in an estimate. The estimate, plus or minus the margin of error, gives us an interval in which we are very confident the true value lies.
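One way to make "estimate plus or minus margin of error" concrete is sketched below. The text above does not say how the margin of error is computed; this example assumes a common approximation for a sample proportion, two times the square root of p(1 - p)/n, which corresponds to roughly 95% confidence. The numbers (40% "yes" in a sample of 100) and the function name margin_of_error are hypothetical.

```python
import math

def margin_of_error(p_hat, n, multiplier=2.0):
    """Approximate margin of error for a sample proportion (assumed formula, about 95% confidence)."""
    return multiplier * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey result: 40% "yes" in a random sample of 100 people.
p_hat = 0.40
n = 100
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)

print(f"estimate: {p_hat:.2f}, margin of error: {moe:.3f}")
print(f"plausible range: {interval[0]:.3f} to {interval[1]:.3f}")
```

Under this approximation, a larger sample gives a smaller margin of error, so the range of plausible values narrows.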
Sensors are another data collection method. Unlike what we have seen so far, sensors do not involve humans (much). They collect data according to an algorithm.
A key feature that distinguishes the way sensors collect data from more traditional approaches is that sensors collect data when a 'trigger' event occurs. In Participatory Sensing, this event is something we humans agree upon beforehand. Every time that trigger happens, we collect data.
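Trigger-based collection can be pictured as a filter over everything that happens: a record is created only when the agreed-upon trigger occurs. The sketch below is purely illustrative; the "snack" trigger, the event log, and the function name collect_on_trigger are hypothetical and do not describe any particular campaign software.

```python
def collect_on_trigger(events, trigger):
    """Return one record for each event that matches the agreed-upon trigger."""
    records = []
    for event in events:
        if event["activity"] == trigger:  # collect data only when the trigger happens
            records.append({"time": event["time"], "activity": event["activity"]})
    return records

# Hypothetical log of one person's day; only snack events should produce data.
day = [
    {"time": "08:00", "activity": "commute"},
    {"time": "10:30", "activity": "snack"},
    {"time": "12:00", "activity": "lunch"},
    {"time": "15:45", "activity": "snack"},
]
records = collect_on_trigger(day, trigger="snack")
print(records)
```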
In a Participatory Sensing Campaign, participants complete the survey questions whenever the trigger occurs. Research questions provide an overall direction for a Participatory Sensing Campaign.
Statistical investigative questions guide a Participatory Sensing Campaign so that we can learn about a community or ourselves. Campaigns should be evaluated before they are implemented to make sure they are reasonable and ethically sound.
Practicing data collection prior to implementation allows optimization of a Participatory Sensing Campaign.
Broadening our idea of what counts as data involves recognizing that many web pages present information that can be turned into data.
XML is a markup language (not a programming language) that we use with our campaigns. We create basic XML "tags," which label each piece of data so that it is stored in a format that both we and the computer can understand.
Converting XML to spreadsheet format helps us better understand and view our data.
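A minimal sketch of that conversion, using only Python's standard library, is shown below. The tag names (responses, response, snack, cost) and the data values are invented for illustration and are not the format of any actual campaign; the sketch also assumes every record contains the same simple, un-nested tags.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML: each <response> is one record, each inner tag is one variable.
xml_text = """
<responses>
  <response>
    <snack>apple</snack>
    <cost>1.25</cost>
  </response>
  <response>
    <snack>chips</snack>
    <cost>0.99</cost>
  </response>
</responses>
"""

def xml_to_rows(xml_string):
    """Turn each record element into a dictionary of tag name -> text."""
    root = ET.fromstring(xml_string.strip())
    return [{child.tag: child.text for child in record} for record in root]

rows = xml_to_rows(xml_text)

# Write the rows as CSV text, a format that spreadsheet programs can open.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["snack", "cost"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Each record becomes one row and each tag becomes one column, which is the same layout a spreadsheet uses.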