# Unit 4 Vocabulary

### census

an official count or survey of a population, typically recording various details of individuals

### rule

a set way to calculate or solve a problem

### mean squared deviation

tells you how close a regression line is to a set of points; it is determined by finding the average of the squared differences between the predicted values and the actual values (smaller values mean a closer fit)
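
As a rough illustration of the calculation, here is a minimal Python sketch. The function name `mean_squared_deviation` and the numbers are made up for this example and are not part of the unit.

```python
def mean_squared_deviation(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    squared_diffs = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_diffs) / len(squared_diffs)

# Made-up example data (an assumption for illustration only)
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]

# Squared differences are 1, 0, 1, 4, so the average is 6 / 4
msd = mean_squared_deviation(actual, predicted)
print(msd)  # 1.5
```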

### mean absolute error

the typical size of the error in a set of predictions or measurements; it is the average of the absolute differences between the predicted (or measured) values and the actual ("true") values
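
One way to picture the calculation is the short Python sketch below, which averages the absolute differences between predicted and actual values. The function name `mean_absolute_error` and the data are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between predicted and actual values."""
    absolute_diffs = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_diffs) / len(absolute_diffs)

# Made-up example data (an assumption for illustration only)
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]

# Absolute differences are 1, 0, 1, 2, so the average is 4 / 4
mae = mean_absolute_error(actual, predicted)
print(mae)  # 1.0
```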

### trend

a line, often referred to as a line of best fit, that is used to represent the behavior of a set of data and to determine whether there is a certain pattern

### positive association

when the values of one variable tend to increase as the values of the other variable increase

### negative association

when the values of one variable tend to decrease as the values of the other variable increase

### no association

when there is no apparent pattern between two variables; on a scatter plot, the dots are scattered and no line fits them well

### shape

describes the distribution (or pattern) of the data within a dataset

### linear

used to describe a straight-line relationship between two variables

### model

a mathematical representation that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)

### strength of association

how much two variables covary and the extent to which the independent variable affects the dependent variable

### line of best fit

a line through a scatter plot of data points that best expresses the relationship between those points

### regression line

a line that best describes the behavior of a set of data
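
The sketch below shows one common way to find such a line: least squares, which picks the slope and intercept that make the squared differences between the line and the points as small as possible. Treating "best" as least squares is an assumption here, and the helper name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that fall exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0

# Predicted values are what the line gives for each x
predicted = [slope * x + intercept for x in xs]
```

With real data the points will not sit exactly on the line, so the predicted values will differ from the observed values.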

### observed value

the value that is actually observed (what actually happened)

### predicted value

the value that the equation of the line of best fit projects for a given input (what the model expects to happen)

### correlation coefficient

a statistical measure that calculates the strength of the relationship between the relative movements of two variables
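
As an illustration, the Python sketch below computes the Pearson correlation coefficient, which is one widely used version of this measure (the unit may intend a different one). It ranges from -1 to 1. The function name `correlation` and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y rises with x in the first pair and falls in the second
r_positive = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_positive, r_negative)  # 1.0 -1.0
```

Values near 1 or -1 suggest a strong linear association, while values near 0 suggest a weak one.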

### market

refers to the live streaming of trade-related data; it encompasses a range of information such as price, bid/ask quotes and market volume

### non-linear

a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables; the data are fitted by a method of successive approximations

### polynomial trends

describes a pattern in data that is curved or breaks from a straight linear trend; it often occurs in a large set of data that contains many fluctuations

### classify

to identify which of a set of categories (sub-populations) an observation (or observations) belongs to

### decision tree

a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance outcomes

### Classification and Regression Trees (CART)

a predictive algorithm used in machine learning; it explains how a target variable's values can be predicted based on other values
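
To give a feel for how a tree chooses a split, the sketch below tries every cutoff on one variable and keeps the one with the fewest misclassifications. This is a simplified stand-in and not the full CART algorithm, which usually scores splits with Gini impurity and keeps splitting recursively. The function name `best_split` and the data are made up.

```python
def best_split(values, labels, left_label, right_label):
    """Find the cutoff on one variable that misclassifies the fewest points.

    Points below the cutoff are predicted as left_label, the rest as right_label.
    """
    candidates = sorted(set(values))
    best_threshold, best_errors = None, len(values) + 1
    # Try the midpoint between each pair of neighboring values
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        errors = sum(
            1
            for v, label in zip(values, labels)
            if (left_label if v < threshold else right_label) != label
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

# Made-up data: hours studied and whether the student passed
hours = [1, 2, 3, 6, 7, 8]
results = ["fail", "fail", "fail", "pass", "pass", "pass"]

threshold, errors = best_split(hours, results, "fail", "pass")
print(threshold, errors)  # 4.5 0
```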

### nodes

a point of intersection/connection within a data communication network

### misclassifications

when a participant is placed into the wrong population or subgroup or category because of some kind of observational or measurement error
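
A common way to summarize misclassifications from a classifier is the misclassification rate: the fraction of observations placed in the wrong category. The sketch below is a minimal Python illustration; the function name `misclassification_rate` and the labels are hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted category differs from the actual one."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Made-up labels: only the second observation is placed in the wrong category
actual = ["spam", "ham", "spam", "ham", "ham"]
predicted = ["spam", "spam", "spam", "ham", "ham"]

rate = misclassification_rate(actual, predicted)
print(rate)  # 0.2
```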

### clustering

the process of grouping a set of objects (or people) in such a way that objects (or people) in the same group are more similar to each other than to those in other groups

### cluster

a group of similar things or people positioned or occurring closely together

### k-means

an algorithm that aims to partition data into k clusters so that data points in the same cluster are similar and data points in different clusters are farther apart
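
The sketch below shows the usual two-step loop (assign each point to its nearest center, then move each center to the mean of its points) on one-dimensional data. It is a bare-bones illustration under simplifying assumptions: the starting centers are supplied by hand and the loop runs a fixed number of times. The function name `k_means` and the data are made up.

```python
def k_means(points, centers, iterations=10):
    """Simple k-means for 1-D data; `centers` holds the k starting guesses."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (a center with no points stays where it is)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up data with two obvious groups, and k = 2 starting centers
points = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means(points, centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```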

### network

a system designed to transfer data from one network access point to one or more other network access points via data switching, transmission lines, and system controls