Unit 4 Vocabulary

census

an official count or survey of a population, typically recording various details of individuals

algorithm

a set way to calculate or solve a problem

mean squared deviation

tells you how close a regression line is to a set of points; it is the average of the squared differences between the predicted values and the actual values
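
As a rough illustration of this definition, the sketch below averages the squared differences between a few made-up predicted and actual values. The function name `mean_squared_deviation` and the sample numbers are assumptions chosen for the example, not part of the unit.

```python
def mean_squared_deviation(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_differences = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_differences) / len(squared_differences)


# Hypothetical sample data: three actual values and three guesses.
actual = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 4.0]

# Differences are 1, 0, and -2, so the squares are 1, 0, and 4.
msd = mean_squared_deviation(actual, predicted)
print(msd)
```

Squaring keeps every difference non-negative and penalizes large misses more heavily than small ones.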

mean absolute error

the average amount of error in your measurements; it is the mean of the absolute differences between the measured values and the "true" values
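
One way to compute this, sketched under the assumption that each measurement has a known "true" value to compare against, is to take the absolute difference for each pair and then average them. The function name and the numbers are made up for illustration.

```python
def mean_absolute_error(true_values, measured_values):
    """Average of the absolute differences between measured and true values."""
    if len(true_values) != len(measured_values):
        raise ValueError("both lists must have the same length")
    absolute_errors = [abs(m - t) for t, m in zip(true_values, measured_values)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical measurements and their "true" values.
true_values = [10.0, 20.0, 30.0]
measured_values = [12.0, 19.0, 33.0]

# Absolute errors are 2, 1, and 3.
mae = mean_absolute_error(true_values, measured_values)
print(mae)
```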


often referred to as a line of best fit, is a line that is used to represent the behavior of a set of data to determine if there is a certain pattern

positive association

when the values of one variable tend to increase as the values of the other variable increase

negative association

when the values of one variable tend to decrease as the values of the other variable increase

no association

when the points on a scatter plot are scattered with no pattern, so no line describes the relationship between the variables


describes the distribution (or pattern) of the data within a dataset


used to describe a straight-line relationship between two variables

statistical model

a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)

strength of association

how much two variables covary and the extent to which the INDEPENDENT VARIABLE affects the DEPENDENT VARIABLE

line of best fit

a line through a scatter plot of data points that best expresses the relationship between those points

regression line

a line that best describes the behavior of a set of data
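
The definitions above do not say what "best" means. A common choice, assumed here, is least squares: pick the slope and intercept that minimize the sum of squared vertical distances from the points to the line. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real data the points will not sit exactly on the line, and the fitted slope and intercept describe the overall trend instead.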

observed value

the value that is actually observed (what actually happened)

predicted value

the value that the equation of the line of best fit projects for a given input
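
To make the contrast between observed and predicted values concrete, the sketch below plugs one input into an assumed line of best fit, y = 2x + 1, and compares the result with a made-up observation. The line, the observation, and the names are all hypothetical.

```python
# Assumed line of best fit: y = 2x + 1 (hypothetical, for illustration only).
slope = 2.0
intercept = 1.0


def predict(x):
    """Predicted value: what the line projects for input x."""
    return slope * x + intercept


# One hypothetical observation: at x = 3 we actually saw y = 8.
observed_x = 3.0
observed_y = 8.0

predicted_y = predict(observed_x)
# The gap between what happened and what the line projected.
difference = observed_y - predicted_y
print(predicted_y, difference)
```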

correlation coefficient

a statistical measure that calculates the strength of the relationship between the relative movements of two variables
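
The definition does not name a specific coefficient. The sketch below assumes the Pearson correlation coefficient, which ranges from -1 (perfect negative association) to 1 (perfect positive association). The function name and the data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data: y rises with x in one set and falls with x in the other.
xs = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]
falling = [8.0, 6.0, 4.0, 2.0]

r_positive = pearson_r(xs, rising)
r_negative = pearson_r(xs, falling)
print(r_positive, r_negative)
```

Values near 0 would suggest little or no linear association between the two variables.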


refers to the live streaming of trade-related data; it encompasses a range of information such as price, bid/ask quotes and market volume

nonlinear regression

a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables; the data are fitted by a method of successive approximations

describes a pattern in data that is curved or breaks from a straight linear trend; it often occurs in a large set of data that contains many fluctuations


is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to

decision tree

a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance outcomes

Classification and Regression Trees (CART)

a predictive algorithm used in machine learning; it explains how a target variable's values can be predicted based on other values
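
As a minimal sketch of one step in building a classification tree, the code below searches for the threshold on a single variable that best separates two categories. It assumes Gini impurity as the measure of how mixed each side of the split is, which is the usual choice for CART classification. The data and the names `gini` and `best_split` are hypothetical, and a full tree would repeat this step on each resulting group.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold whose two sides have the lowest weighted impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must put something on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: hours studied and whether the student passed.
hours = [1, 2, 3, 6, 7, 8]
passed = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(hours, passed)
print(threshold, impurity)
```

Here the rule "hours <= 3" separates the two groups completely, so the weighted impurity of that split is zero.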


a point of intersection/connection within a data communication network


when a participant is placed into the wrong population or subgroup or category because of some kind of observational or measurement error


is the process of grouping a set of objects (or people) in such a way that objects (or people) in the same group are more similar to each other than those in other groups


a group of similar things or people positioned or occurring closely together


aims to partition data into k clusters so that data points in the same cluster are similar and data points in different clusters are farther apart
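
A common way to carry out this kind of partitioning, assumed here, is to alternate two steps: assign each point to its nearest cluster center, then move each center to the average of its points. The sketch works on one-dimensional data with k = 2. The starting centers, the data, and the function name are made up for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its assigned points.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with two obvious groups, and two starting guesses.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The result can depend on the starting centers, which is why they are passed in explicitly in this sketch.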


a system designed to transfer data from one network access point to one or more other network access points via data switching, transmission lines, and system controls