Statistical Analysis in Excel for Environmental Impact
Expert-defined terms from the Professional Certificate in Excel Formulas for Environmental Impact course at London School of International Business. Free to read, free to share, paired with a globally recognised certification pathway.
Analysis of Variance (ANOVA) #
A statistical technique used to compare the means of two or more samples to determine if they are significantly different from each other. It is used to compare the variance within and between groups.
Bar Chart #
A graphical representation of data using bars of varying lengths to compare quantities. In Excel, bar charts can be created using the "Insert" tab and selecting "Column" or "Bar" chart.
Box and Whisker Plot #
A graphical representation of data using a box and whisker to show the distribution of data. The box represents the interquartile range (IQR) and the line in the middle of the box represents the median. The whiskers represent the range of data, not including outliers.
Correlation #
A statistical relationship between two variables where a change in one variable is associated with a change in the other variable. Correlation does not imply causation.
Data Analysis Toolpak #
A set of statistical tools and functions available in Excel that can be used for data analysis. It can be accessed through the "Data" tab and selecting "Data Analysis."
Descriptive Statistics #
A set of statistical measures used to describe and summarize data. Descriptive statistics include measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation).
Histogram #
A graphical representation of data using bars to show the distribution of continuous data. In Excel, histograms can be created using the "Insert" tab and selecting "Histogram" chart.
Hypothesis Testing #
A statistical technique used to test a hypothesis about a population parameter. It involves setting up a null hypothesis and an alternative hypothesis, and using a test statistic and a significance level to determine if the null hypothesis should be rejected.
Linear Regression #
A statistical technique used to model the relationship between two continuous variables. It involves finding the best-fitting line through the data and using the slope and intercept to make predictions about the dependent variable.
Mean #
A measure of central tendency that represents the average value of a set of data. It is calculated by adding up all the values and dividing by the number of values.
Median #
A measure of central tendency that represents the middle value of a set of data. It is calculated by arranging the data in order and finding the value that is in the middle.
Mode #
A measure of central tendency that represents the most frequently occurring value in a set of data.
Normality Test #
A statistical test used to determine if a set of data follows a normal distribution. Examples include the Shapiro-Wilk test and the Kolmogorov-Smirnov test.
Pareto Chart #
A graphical representation of data that combines a bar chart and a line graph to show the relative frequency of different items in a dataset. It is used to identify the most important items in a dataset.
Pie Chart #
A circular graphical representation of data that is divided into slices to show the relative size of different categories.
Population #
The entire group of individuals, items, or events that a study is interested in.
Probability Distribution #
A mathematical function that describes the probability of different outcomes in a random variable.
Quartile #
A measure of dispersion that divides a dataset into four equal parts. The first quartile (Q1) represents the 25th percentile, the second quartile (Q2) represents the median, and the third quartile (Q3) represents the 75th percentile.
Range #
A measure of dispersion that represents the difference between the largest and smallest values in a dataset.
Scatter Plot #
A graphical representation of data that shows the relationship between two continuous variables. In Excel, scatter plots can be created using the "Insert" tab and selecting "Scatter" chart.
Standard Deviation #
A measure of dispersion that represents the average distance of a set of data from the mean. It is calculated by taking the square root of the variance.
T #
Test: A statistical test used to compare the means of two samples. It is used to determine if the difference between the means is statistically significant.
Variance #
A measure of dispersion that represents the average of the squared differences between each value in a dataset and the mean. It is calculated by taking the average of the squared differences between each value and the mean.
Z #
Score: A statistical measure that represents the number of standard deviations a value is from the mean. It is calculated by subtracting the mean from the value and dividing by the standard deviation.
In the context of the course Professional Certificate in Excel Formulas for Envi… #
For example, ANOVA can be used to compare the means of different environmental variables, such as temperature or precipitation, to determine if they are significantly different from each other. Bar charts and histograms can be used to visually represent environmental data, while descriptive statistics can be used to summarize and describe the data. Correlation can be used to determine if there is a relationship between two environmental variables, while linear regression can be used to model the relationship. Normality tests can be used to determine if environmental data follows a normal distribution, which is important for many statistical tests. Pareto charts can be used to identify the most important environmental factors, while pie charts can be used to show the relative contribution of different factors to an overall impact. Quartiles and range can be used to measure the dispersion of environmental data, while standard deviation and variance can be used to measure the spread of the data. Scatter plots can be used to show the relationship between two environmental variables, while t-tests can be used to compare the means of two environmental variables. Z-scores can be used to compare environmental data from different sources or to compare data to a standard or threshold.
Challenge #
Use at least five of the glossary terms to analyze an environmental dataset in Excel. For example, you could use ANOVA to compare the means of different temperature readings, scatter plots to show the relationship between temperature and precipitation, and descriptive statistics to summarize the data. Use the Data Analysis Toolpak to perform the statistical analyses and create visual representations of the data using bar charts, histograms, or pie charts. Interpret the results and explain their implications for environmental impact.