Unit 4 Vocabulary
census
an official count or survey of a population, typically recording various details of individuals
rule
a set way to calculate or solve a problem
mean squared deviation
tells you how close a regression line is to a set of points; it is the average of the squared differences between the predicted values and the actual values
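A minimal sketch of this calculation in Python; the function name and the sample numbers are made up for illustration only.

```python
def mean_squared_deviation(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared_diffs = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_diffs) / len(squared_diffs)


# Illustrative data (not from the unit): squared differences are 1, 0, and 4.
actual = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 4.0]
msd = mean_squared_deviation(actual, predicted)  # (1 + 0 + 4) / 3
```

Squaring makes every difference non-negative and weights large misses more heavily than small ones.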
mean absolute error
the average amount of error in a set of measurements or predictions; it is the mean of the absolute differences between each measured (or predicted) value and the "true" value
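As a sketch, mean absolute error is commonly computed as the average absolute difference between predicted (or measured) values and the actual values. The function name and data below are illustrative assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between predicted and actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    absolute_diffs = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(absolute_diffs) / len(absolute_diffs)


# Illustrative data (not from the unit): absolute differences are 1, 0, and 2.
actual = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 4.0]
mae = mean_absolute_error(actual, predicted)  # (1 + 0 + 2) / 3 = 1.0
```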
trend
a line, often referred to as a line of best fit, that represents the behavior of a set of data and helps determine whether there is a certain pattern
positive association
when the values of one variable tend to increase as the values of the other variable increase
negative association
when the values of one variable tend to decrease as the values of the other variable increase
no association
when there is no clear pattern between the two variables; the dots on the scatter plot are scattered and no line describes them well
shape
describes the distribution (or pattern) of the data within a dataset
linear
used to describe a straight-line relationship between two variables
model
a mathematical representation that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)
strength of association
how much two variables covary and the extent to which the INDEPENDENT VARIABLE affects the DEPENDENT VARIABLE
line of best fit
a line through a scatter plot of data points that best expresses the relationship between those points
regression line
a line that best describes the behavior of a set of data
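One common way to find such a line is least squares, which picks the slope and intercept that minimize the sum of squared vertical distances to the points. The sketch below assumes that meaning of "best"; the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)  # slope 2.0, intercept 1.0
```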
observed value
the value that is actually observed (what actually happened)
predicted value
the value that the equation of the line of best fit gives for a particular input (what the model projects will happen)
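A small sketch of how an observed value and a predicted value compare. The slope, intercept, and data point are hypothetical numbers chosen for illustration.

```python
def predict(x, slope, intercept):
    """Predicted value from a line with the given slope and intercept."""
    return slope * x + intercept


# Hypothetical line y = 2x + 1 and one hypothetical data point.
slope, intercept = 2.0, 1.0
x = 3.0
observed = 7.5                            # what actually happened
predicted = predict(x, slope, intercept)  # 2 * 3 + 1 = 7.0
residual = observed - predicted           # 7.5 - 7.0 = 0.5
```

The difference between the observed and predicted value (often called the residual) is what error measures such as mean squared deviation summarize across all points.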
correlation coefficient
a statistical measure of the strength and direction of the linear relationship between two variables; it ranges from -1 to 1
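A sketch of one widely used version, the Pearson correlation coefficient, which ranges from -1 to 1. The function name and data are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0]
r_up = correlation(xs, [2.0, 4.0, 6.0, 8.0])    # perfect positive association
r_down = correlation(xs, [8.0, 6.0, 4.0, 2.0])  # perfect negative association
```

Values near 1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.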
market
refers to the live streaming of trade-related data; it encompasses a range of information such as price, bid/ask quotes and market volume
non-linear
describes a relationship that does not follow a straight line; in non-linear regression, observational data are modeled by a function that is a nonlinear combination of the model parameters and depends on one or more independent variables, and the data are fitted by a method of successive approximations
polynomial trends
describes a pattern in data that is curved or breaks from a straight linear trend; it often occurs in a large set of data that contains many fluctuations
classify
the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to
decision tree
a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance outcomes
Classification and Regression Trees (CART)
a predictive algorithm used in machine learning; it explains how a target variable's values can be predicted based on other values
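To illustrate how a tree turns other values into a prediction, here is a sketch that walks a small hand-built tree. A CART algorithm would learn the splits from data; this tree, its feature names, and its thresholds are invented for illustration.

```python
# Each internal node tests one feature against a threshold;
# each leaf node holds the predicted label.
tree = {
    "feature": "height_cm",
    "threshold": 150,
    "left": {"label": "child"},
    "right": {
        "feature": "age",
        "threshold": 18,
        "left": {"label": "teen"},
        "right": {"label": "adult"},
    },
}


def classify(node, row):
    """Follow the splits from the root until a leaf is reached."""
    while "label" not in node:
        if row[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


label = classify(tree, {"height_cm": 170, "age": 16})  # "teen"
```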
nodes
a point of intersection/connection within a data communication network
misclassifications
when a participant is placed into the wrong population or subgroup or category because of some kind of observational or measurement error
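For a classifier, misclassifications are often summarized as a rate: the fraction of observations given the wrong category. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted category differs from the actual one."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Illustrative labels: only the third observation is misclassified.
actual = ["cat", "dog", "dog", "cat"]
predicted = ["cat", "dog", "cat", "cat"]
rate = misclassification_rate(actual, predicted)  # 1 / 4 = 0.25
```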
clustering
the process of grouping a set of objects (or people) in such a way that objects (or people) in the same group are more similar to each other than to those in other groups
cluster
a group of similar things or people positioned or occurring closely together
k-means
aims to partition data into k clusters so that data points in the same cluster are similar and data points in different clusters are farther apart
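A minimal sketch of the usual k-means procedure (often called Lloyd's algorithm) on 2-D points: assign each point to its nearest center, move each center to the mean of its points, and repeat. The starting centers and data are made up, and real implementations typically choose starting centers more carefully.

```python
def k_means(points, centers, iterations=10):
    """Run k-means on 2-D points from the given starting centers (iterations >= 1)."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for cluster, old in zip(clusters, centers):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # the centers stopped moving
        centers = new_centers
    return centers, clusters


# Illustrative data: two visibly separate groups, so k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, [(0, 0), (10, 10)])
```

Here k is the number of starting centers supplied, so the caller chooses how many clusters to look for.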
network
a system designed to transfer data from one network access point to one or more other network access points via data switching, transmission lines, and system controls