Unit 1 Vocabulary

algorithm

a process or set of rules followed to solve a problem or carry out a calculation

bimodal

a distribution which has two peaks

bin widths

the width of each rectangle in a histogram, which shows how the data are grouped along the x-axis

bin(s)

an interval of values on the x-axis of a histogram, drawn as a bar whose height corresponds to how many data points fall in that interval

campaign

gather and collect data

categorical variables

variables whose values are words or labels that sort observations into categories

center

useful for numerical variables; the center of the distribution often corresponds to our notion of a ‘typical value’

claim

a statement that something is true

collect

the process of gathering and measuring information

columns

the vertical groupings of a data table; each column holds the values of one variable

conditional relative frequency

the ratio of a joint relative frequency to the related marginal relative frequency
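
As a rough illustration of this ratio, the Python sketch below uses made-up counts for a hypothetical two-way table of grade level and pet ownership. The table, its counts, and every variable name are assumptions chosen only for illustration.

```python
# Hypothetical counts: (grade, owns_pet) -> number of students (made-up data).
counts = {
    ("9th", "yes"): 12,
    ("9th", "no"): 8,
    ("10th", "yes"): 9,
    ("10th", "no"): 11,
}

total = sum(counts.values())  # 40 students in all

# Joint relative frequency: share of all students who are 9th graders with a pet.
joint_9th_yes = counts[("9th", "yes")] / total

# Marginal relative frequency: share of all students who are 9th graders.
marginal_9th = (counts[("9th", "yes")] + counts[("9th", "no")]) / total

# Conditional relative frequency: among 9th graders, the share who own a pet.
conditional_yes_given_9th = joint_9th_yes / marginal_9th

print(joint_9th_yes, marginal_9th, conditional_yes_given_9th)
```

With these counts the joint relative frequency is 12/40 and the marginal relative frequency is 20/40, so the conditional relative frequency works out to 12/20.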

console

a pane within RStudio; the place where RStudio is waiting for you to tell it what to do, and where it will show the results of a command; you type your code directly into the console

data

information, or observations, that have been gathered and recorded

data analysis

tables, graphs, and summaries of the data that are produced to help us find patterns and relationships

data collection

the process of observing and recording data, or of examining previously collected data to make sure it meets the needs of an investigation

data cycle

a guide we can use when learning to think about data

data interpretation

the phase in which the statistical questions are answered by referring to the tables, graphs, and summaries made in the Data Analysis phase

data point

a single fact or piece of information

dataset(s)

a collection of data

data table

arrangement of data

data trails

the data collected about us as individuals that could be used to see the patterns in our personal lives

distribution

a function or a listing which shows all the possible values of a variable and how often each value occurs

dotplot

a graphical display of data using dots

environment

a pane within RStudio; where values and objects can be viewed

ethics

a code of behavior, specifically what is right and wrong

evaluate

to think carefully

frequency

the number of times an outcome occurs
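
A small Python sketch of counting frequencies, using the standard library's `collections.Counter`. The list of snack choices is made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (made-up data).
snacks = ["chips", "fruit", "chips", "candy", "fruit", "chips"]

# Counter tallies how many times each outcome occurs.
frequency = Counter(snacks)

print(frequency["chips"])  # 3
print(frequency["fruit"])  # 2
```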

GPS

stands for Global Positioning System; it is a radio navigation system that allows land, sea, and airborne users to determine their exact location

grouping

when the data are split into categories

histogram

an approximate representation of the distribution of numerical data

images

a visual representation of the external form of a person or thing, such as a picture

input

the value you place into the algorithm

joint (relative) frequency

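a fraction that tells you what share of the members of a group have a particular combination of characteristics; in a two-way frequency table, the count in one cell divided by the total count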

left-hand rule

when a data point falls on the boundary between two bins and could appear in either one, the observation goes in the bin on the left-hand side
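
One way to picture this rule is as a tie-breaker for values that land exactly on a bin boundary. The sketch below is a minimal Python illustration, not a description of how RStudio or any plotting function assigns bins. The function `bin_index` and its parameters are hypothetical, and it assumes equal-width bins that begin at `start`.

```python
import math

def bin_index(value, start, width, rule="left"):
    """Return the 0-based index of the bin that `value` falls into.

    Hypothetical helper: bins are [start, start + width], then the next
    interval of the same width, and so on. A value exactly on a shared
    boundary goes to the left bin under rule="left" and to the right bin
    under rule="right".
    """
    position = (value - start) / width
    index = math.floor(position)
    on_shared_boundary = position == index and index > 0
    if on_shared_boundary and rule == "left":
        index -= 1  # tie goes to the bin on the left-hand side
    return index

# Bins of width 10 starting at 0: bin 0 is 0-10, bin 1 is 10-20, bin 2 is 20-30.
print(bin_index(20, start=0, width=10, rule="left"))   # 1
print(bin_index(20, start=0, width=10, rule="right"))  # 2
print(bin_index(15, start=0, width=10))                # 1
```

Only boundary values are affected by the choice of rule; a value such as 15 lands in the same bin either way.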

left-skewed

the mean is typically less than the median; the tail of the distribution is longer on the left-hand side than on the right-hand side

marginal (relative) frequency

the row or column totals shown in the margins of a two-way frequency table; as a relative frequency, each total is divided by the overall count

maximum

the largest value

minimum

the smallest value

numerical variables

variables whose values are numbers that count or measure something

observations

data that have been gathered and recorded

organize

to classify and arrange data sets to make them more useful

output

the value(s) that are produced by an algorithm
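
To make the ideas of input and output concrete, here is a minimal Python sketch of one simple algorithm, computing an average. The function name `mean_algorithm` and the scores are hypothetical and used only for illustration.

```python
def mean_algorithm(values):
    """A simple algorithm: add up the values, then divide by how many there are."""
    total = 0
    for v in values:            # step 1: add each value to a running total
        total += v
    return total / len(values)  # step 2: divide by the number of values

scores = [80, 90, 100]           # input (made-up data)
result = mean_algorithm(scores)  # output
print(result)  # 90.0
```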

pane

a rectangular area within RStudio

participatory sensing

an approach to data collection and interpretation in which individuals, acting alone or in groups, use their personal mobile devices and web services to explore interesting aspects of their worlds ranging from health to culture

photo ethics

the principles that guide how we take and share photographs

plot

a pane within RStudio; where plots/graphs/visualizations will be generated

preview

a pane within RStudio; a spreadsheet-like view where you can see the variables and observations (index) as rows and columns of data

privacy

the right of individuals to have control over how their personal information is collected and used

range

the largest value minus the smallest value
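
A short Python sketch that ties the minimum, maximum, and range together. The list of heights is made-up data used only for illustration.

```python
# Hypothetical measurements (made-up data).
heights = [62, 70, 58, 66, 74]

smallest = min(heights)          # minimum: 58
largest = max(heights)           # maximum: 74
data_range = largest - smallest  # range: 74 - 58 = 16

print(smallest, largest, data_range)  # 58 74 16
```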

record

a collection of data

rectangular or spreadsheet format

information that is stored in rows and columns, as in a spreadsheet

representations

the form in which data are stored, processed, and transmitted

right-hand rule

when multiple data points can appear in more than one bin, observations would go in the bin on the right-hand side

right-skewed

the mean is typically greater than the median; the tail of the distribution is longer on the right-hand side than on the left-hand side
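
The relationship between the mean and the median in skewed data can be checked with a small Python sketch using the standard library's `statistics` module. Both lists are made-up data chosen so that one has a long right tail and the other a long left tail.

```python
from statistics import mean, median

# Made-up data: most values are small, with one large value in the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Made-up data: most values are large, with one small value in the left tail.
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

# The long tail pulls the mean toward it; the median moves much less.
print(mean(right_skewed), median(right_skewed))  # mean is above the median
print(mean(left_skewed), median(left_skewed))    # mean is below the median
```

This pattern is typical rather than guaranteed, which is why the definitions say "typically".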

rows

the horizontal groupings of a data table; each row holds one observation

scatterplot

a plot that uses dots to represent values for two different numeric variables

shape

the overall pattern formed by the placement of points in a distribution, such as symmetric, skewed, unimodal, or bimodal

side-by-side bar plot

a plot where the bars for each group are placed next to one another, often in different colors, used to compare things between different groups or to track changes over time

spread

how much the values in a distribution vary, or how far they stretch out from the center

statistical investigative questions

questions that address variability and can be answered with data

surveys

a research method used for collecting data to gain information and insights into various topics of interest

symmetric

a type of distribution where the left side of the distribution mirrors the right side

two-way frequency table

a table that displays the counts for two categorical variables measured on one group
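
As a minimal sketch of how such a table can be built from raw observations, the Python example below tallies pairs of values with `collections.Counter`. The observations and the variable names (grade level and a yes/no answer) are made up for illustration.

```python
from collections import Counter

# Each observation records two categorical variables for one student (made-up data).
observations = [
    ("9th", "yes"), ("9th", "no"), ("10th", "yes"),
    ("9th", "yes"), ("10th", "no"), ("10th", "no"),
]

# Count how many observations fall in each (grade, answer) cell.
cell_counts = Counter(observations)

grades = ["9th", "10th"]
answers = ["yes", "no"]

# Print the table with one row per grade and one column per answer.
print("grade", *answers, sep="\t")
for grade in grades:
    row = [cell_counts[(grade, answer)] for answer in answers]
    print(grade, *row, sep="\t")
```

Each cell of the printed table is a joint frequency; adding across a row or down a column gives the marginal totals.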

typical

“mean” or “average”; expected values

unimodal

a distribution which has a single peak

variability

how spread out a set of data is; variability gives you a way to describe how much data sets vary and allows you to compare your data to other sets of data

variables

characteristics of an object or person

visualization

a picture of the data

x-axis

horizontal axis of a coordinate plane

y-axis

vertical axis of a coordinate plane