Unit 1 Vocabulary
algorithm
a process or set of rules for solving a mathematical problem
bimodal
a distribution which has two peaks
bin widths
the width of each rectangle in a histogram; it shows the interval of values on the x-axis into which the data are grouped
bin(s)
an interval of values on the x-axis of a histogram, drawn as a bar whose height corresponds to how many data points fall in that interval
campaign
an organized effort to gather and collect data
categorical variables
variables whose values are words or labels (categories)
center
useful for numerical variables, the center of the distribution often corresponds to our notion of ‘typical value’
claim
a statement that something is true, which may or may not be supported by data
collect
the process of gathering and measuring information
columns
the vertical groupings of cells in a data table; each column typically holds the values of one variable
conditional relative frequency
the ratio of a joint relative frequency and related marginal relative frequency
console
a pane within RStudio; the place where RStudio waits for you to tell it what to do, and where it shows the results of a command; you type your code directly into the console
data
information, or observations, that have been gathered and recorded
data analysis
tables, graphs, and summaries of the data that are produced to help us find patterns and relationships
data collection
the process of observing and recording data, or of examining previously collected data to make sure it meets the needs of an investigation
data cycle
a guide we can use when learning to think about data
data interpretation
the statistical questions are answered by referring to the tables, graphs, and summaries made in the Data Analysis phase
data point
a single fact or piece of information
dataset(s)
a collection of data
data table
an arrangement of data in rows and columns
data trails
the data collected about us as individuals that could be used to see the patterns in our personal lives
distribution
a function or a listing which shows all the possible values of a variable and how often each one occurs
dotplot
a graphical display of data using dots
environment
a pane within RStudio; where values and objects can be viewed
ethics
a code of behavior, specifically what is right and wrong
evaluate
to think carefully
frequency
the number of times an outcome occurs
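As a minimal sketch of counting frequencies, the Python snippet below tallies how often each outcome occurs. The data and the names `snacks` and `frequency` are made up for illustration and are not part of the course materials.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (made-up data).
snacks = ["chips", "fruit", "chips", "candy", "fruit", "chips"]

# Frequency: the number of times each outcome occurs.
frequency = Counter(snacks)

for outcome, count in frequency.items():
    print(outcome, count)
```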
GPS
stands for Global Positioning System; it is a radio navigation system that allows land, sea, and airborne users to determine their exact location
grouping
when the data are split into categories
histogram
an approximate representation of the distribution of numerical data
images
representations of the external form of a person or thing, such as pictures
input
the value you place into the algorithm
joint (relative) frequency
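a fraction that tells you what share of all members of a group have a particular combination of two characteristics; it is the count in one cell of a two-way frequency table divided by the overall total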
left-hand rule
when a data point falls on the boundary between two bins and so could appear in either one, the observation goes in the bin on the left-hand side
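The left-hand and right-hand rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_index`, the bin edges, and the choice to number bins from 0 are all hypothetical, and values are assumed to lie within the range of the edges.

```python
from bisect import bisect_left, bisect_right

def bin_index(value, edges, rule="left"):
    """Return the 0-based bin a value falls in, given sorted bin edges.

    rule="left":  a value on a boundary goes in the bin to its left.
    rule="right": a value on a boundary goes in the bin to its right.
    Assumes edges[0] <= value <= edges[-1].
    """
    n_bins = len(edges) - 1
    if rule == "left":
        i = bisect_left(edges, value) - 1
    else:
        i = bisect_right(edges, value) - 1
    # Keep the lowest and highest edges inside the first and last bins.
    return min(max(i, 0), n_bins - 1)

# Three bins, each with a bin width of 10: 0-10, 10-20, 20-30.
edges = [0, 10, 20, 30]

print(bin_index(10, edges, rule="left"))   # boundary value placed in bin 0-10
print(bin_index(10, edges, rule="right"))  # boundary value placed in bin 10-20
```

A value that is not on a boundary, such as 15, lands in the same bin under either rule.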
left-skewed
the mean is typically less than the median; the tail of the distribution is longer on the left-hand side than on the right-hand side
marginal (relative) frequency
the row and column totals shown in the margins of a two-way frequency table; dividing one of these totals by the overall total gives the marginal relative frequency
maximum
the largest value
minimum
the smallest value
numerical variables
variables whose values are numbers
observations
data that have been gathered and recorded
organize
to classify and arrange data sets to make them more useful
output
the value(s) that are produced by an algorithm
pane
a rectangular area within RStudio
participatory sensing
an approach to data collection and interpretation in which individuals, acting alone or in groups, use their personal mobile devices and web services to explore interesting aspects of their worlds ranging from health to culture
photo ethics
the principles that guide how we take and share photographs
plot
a pane within RStudio; where plots/graphs/visualizations will be generated
preview
a pane within RStudio; it shows the data in spreadsheet form, where you can see the variables and observations as rows and columns of data
privacy
the right of individuals to have control over how their personal information is collected and used
range
the largest value minus the smallest value
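For instance, the minimum, maximum, and range of a small set of numbers could be computed as follows. This is a sketch in Python with made-up values; the variable names are illustrative only.

```python
# Hypothetical values of one numerical variable (made-up data).
values = [4, 9, 2, 7, 5]

minimum = min(values)            # the smallest value
maximum = max(values)            # the largest value
data_range = maximum - minimum   # the largest value minus the smallest value

print(minimum, maximum, data_range)
```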
record
a collection of data
rectangular or spreadsheet format
a way of storing information as a table of rows and columns, like a spreadsheet
representations
the form in which data are stored, processed, and transmitted
right-hand rule
when a data point falls on the boundary between two bins and so could appear in either one, the observation goes in the bin on the right-hand side
right-skewed
the mean is typically greater than the median; the tail of the distribution is longer on the right-hand side than on the left-hand side
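The relationship between the mean and the median in skewed data can be checked with a short Python sketch. Both data sets below are made up for illustration; each has one value far out in its tail, which pulls the mean toward that tail.

```python
from statistics import mean, median

# Made-up data with a long tail on the right (one unusually large value).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Made-up data with a long tail on the left (one unusually small value).
left_skewed = [1, 17, 18, 18, 19, 19, 20, 20]

# In the right-skewed set the mean is greater than the median.
print(mean(right_skewed), median(right_skewed))

# In the left-skewed set the mean is less than the median.
print(mean(left_skewed), median(left_skewed))
```

The word "typically" in the definitions matters: this pattern holds for these examples, but it is a tendency rather than a guarantee for every data set.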
rows
the horizontal groupings of cells in a data table; each row typically holds one observation
scatterplot
a plot that uses dots to represent values for two different numeric variables
shape
the placement of points in a distribution
side-by-side bar plot
a plot where the bars for each group are placed next to one another, usually in different colors, used to compare things between different groups or to track changes over time
spread
how much the values in a distribution vary, that is, how widely the data are spread out around the center
statistical investigative questions
questions that address variability and can be answered with data
surveys
a research method used for collecting data to gain information and insights into various topics of interest
symmetric
a type of distribution where the left side of the distribution mirrors the right side
two-way frequency table
a table that displays counts for two categorical variables measured on one group
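The joint, marginal, and conditional relative frequencies defined in this list can all be read off a two-way frequency table. The Python sketch below uses made-up counts for two hypothetical categorical variables (grade and how a student gets to school); the names `counts`, `joint`, `marginal`, and `conditional` are illustrative assumptions.

```python
# Made-up two-way frequency table: (grade, transport) -> count.
counts = {
    ("9th", "bus"): 30,
    ("9th", "walk"): 20,
    ("10th", "bus"): 10,
    ("10th", "walk"): 40,
}

total = sum(counts.values())

# Joint relative frequency: one cell divided by the overall total.
joint = counts[("9th", "bus")] / total

# Marginal relative frequency: a row total divided by the overall total.
ninth_total = sum(n for (grade, _), n in counts.items() if grade == "9th")
marginal = ninth_total / total

# Conditional relative frequency: the ratio of the joint relative
# frequency to the related marginal relative frequency.
conditional = joint / marginal

print(joint, marginal, conditional)
```

Here the conditional relative frequency answers the question "of the 9th graders, what share take the bus?", which is the same as dividing the cell count by its row total.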
typical
“mean” or “average”; expected values
unimodal
a distribution which has a single peak
variability
how spread out a set of data is; variability gives you a way to describe how much data sets vary and allows you to compare your data to other sets of data
variables
characteristics of an object or person
visualization
a picture of the data
x-axis
horizontal axis of a coordinate plane
y-axis
vertical axis of a coordinate plane