# Data Analysis and Decision Making Glossary

Expert-defined terms from the Professional Certificate in Quality Assurance in Higher Education course at London School of International Business. Free to read, free to share, paired with a globally recognised certification pathway.
## 1. Data Analysis

Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making. It involves the use of statistical and mathematical techniques to analyze and interpret data.

- Descriptive Statistics: Descriptive statistics summarize and describe the main features of a dataset. Examples include mean, median, mode, range, and standard deviation.
- Inferential Statistics: Inferential statistics are used to make predictions or inferences about a population based on a sample of data.
- Data Visualization: Data visualization is the graphical representation of data to help users understand complex data patterns and trends.
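A minimal Python sketch of the descriptive statistics named above, using only the standard library; the exam scores are invented for illustration.

```python
# Descriptive statistics for a small, invented sample of exam scores.
import statistics

scores = [62, 74, 74, 81, 88, 90, 95]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value when sorted
mode = statistics.mode(scores)          # most frequent value
data_range = max(scores) - min(scores)  # spread between the extremes
stdev = statistics.stdev(scores)        # sample standard deviation

print(mean, median, mode, data_range, round(stdev, 2))
```

Each measure answers a different question about the same data: the mean and median locate its centre, while the range and standard deviation describe its spread.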
## 2. Decision Making

Decision making is the process of choosing between alternative courses of action. It involves evaluating options and selecting the most suitable one to achieve a specific goal or outcome.

- Decision Support Systems: Decision support systems are computer-based tools that help decision-makers analyze data and information to make informed decisions.
- Risk Management: Risk management involves identifying, assessing, and prioritizing risks to minimize their impact on decision-making processes.
- Cost-Benefit Analysis: Cost-benefit analysis is a technique used to compare the costs of a decision with the benefits it will provide.
## 3. Regression Analysis

Regression analysis is a statistical technique used to examine the relationship between a dependent variable and one or more independent variables. It helps in understanding how the value of the dependent variable changes when one or more independent variables are varied.

- Linear Regression: Linear regression models the relationship between the dependent variable and independent variable(s) as a linear equation.
- Multiple Regression: Multiple regression examines the relationship between one dependent variable and two or more independent variables.
- Logistic Regression: Logistic regression is used when the dependent variable is categorical (e.g., yes/no, 0/1).
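Simple linear regression can be fitted directly from its closed-form least-squares solution; a sketch with invented data points:

```python
# Ordinary least squares for simple linear regression, y = a + b*x.
# The slope is covariance(x, y) / variance(x); data is invented.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: how much y changes per unit change in x
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # intercept: value of y when x = 0
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

The fitted slope is close to 2, matching the roughly two-unit rise in y per unit of x in the sample.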
## 4. Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on sample data. It involves formulating a null hypothesis and an alternative hypothesis, collecting data, and performing statistical tests to determine whether the null hypothesis should be rejected.

- Type I Error: A Type I error occurs when the null hypothesis is incorrectly rejected when it is actually true. It is also known as a false positive.
- Type II Error: A Type II error occurs when the null hypothesis is incorrectly accepted when it is actually false. It is also known as a false negative.
- Significance Level: The significance level is the probability of rejecting the null hypothesis when it is true. It is denoted by alpha (α) and is typically set at 0.05.
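As a sketch of the workflow, here is a one-sample z-test (which assumes the population standard deviation is known); the sample values and hypothesized mean are invented for illustration.

```python
# One-sample, two-sided z-test sketch.
# H0: the population mean is 100; sigma is assumed known.
import math

def z_test(sample, mu0, sigma):
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # two-sided p-value from the standard normal CDF (via erf)
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - phi)

sample = [104, 99, 108, 102, 105, 101, 107, 103, 106]
z, p = z_test(sample, mu0=100, sigma=5)
alpha = 0.05
print(p < alpha)  # reject H0 at the 5% significance level?
```

If the p-value falls below alpha, the null hypothesis is rejected; otherwise the data are consistent with it. For small samples with unknown sigma, a t-test would be the appropriate choice.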
## 5. Cluster Analysis

Cluster analysis is a data mining technique used to group a set of objects so that objects in the same group are more similar to each other than to objects in other groups. It helps in identifying patterns and relationships in data.

- K-means Clustering: K-means clustering is a popular algorithm that partitions data into k clusters based on their centroids.
- Hierarchical Clustering: Hierarchical clustering creates a tree of clusters, known as a dendrogram, to represent the relationships between data points.
- Density-Based Clustering: Density-based clustering groups together points that are closely packed in a high-density region.
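The k-means assign-then-update loop can be sketched in a few lines; this toy version uses one-dimensional, invented data for clarity (real work would use a library such as scikit-learn).

```python
# Minimal k-means sketch: 1-D points, k = 2, fixed iteration count.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 10.0])
print(sorted(round(c, 2) for c in centroids))
```

The two centroids settle near the two dense regions of the data, illustrating how the alternating assign/update steps converge.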
## 6. Time Series Analysis

Time series analysis is a statistical technique used to analyze and interpret data collected sequentially over time. It involves studying the patterns, trends, and cycles in the data to make forecasts and predictions.

- Autocorrelation: Autocorrelation is a measure of the correlation between values of a time series at different points in time.
- Moving Average: A moving average smooths out fluctuations in time series data by calculating the average of a subset of data points.
- Seasonal Decomposition: Seasonal decomposition separates a time series into its trend, seasonal, and residual components.
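A moving average is the simplest of these tools to implement; a sketch over an invented weekly sales series:

```python
# Simple moving average: each output is the mean of the last `window` values.

def moving_average(series, window):
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

sales = [10, 12, 14, 13, 15, 17, 16]
print(moving_average(sales, window=3))
```

Note that the smoothed series is shorter than the original: the first full window only closes at the third observation.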
## 7. Data Mining

Data mining is the process of discovering patterns, trends, and insights in large datasets. It helps in extracting valuable information from data for decision-making purposes.

- Association Rules: Association rules are if-then statements that identify relationships between variables in a dataset.
- Classification: Classification predicts the class or category of a new observation based on training data.
- Clustering: Clustering groups similar objects together based on their characteristics.
## 8. Forecasting

Forecasting is the process of making predictions about future events based on historical data. It involves analyzing past patterns and using them to estimate future outcomes.

- Time Series Forecasting: Time series forecasting predicts future values of a time series based on historical data.
- Exponential Smoothing: Exponential smoothing gives more weight to recent observations when forecasting.
- ARIMA Models: ARIMA (AutoRegressive Integrated Moving Average) models are a class of models used for time series analysis and forecasting.
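Simple exponential smoothing shows the "more weight to recent observations" idea concretely: each smoothed value blends the newest observation with the previous smoothed value. The alpha and the demand figures below are invented.

```python
# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

def exponential_smoothing(series, alpha):
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [100, 110, 108, 115, 120]
print([round(s, 2) for s in exponential_smoothing(demand, alpha=0.5)])
```

A larger alpha makes the forecast react faster to recent changes; a smaller alpha produces a smoother, slower-moving series.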
## 9. Data Quality

Data quality refers to the accuracy, completeness, consistency, and reliability of data. It is essential for ensuring that data analysis and decision-making processes are based on high-quality, trustworthy data.

- Data Cleansing: Data cleansing is the process of detecting and correcting errors and inconsistencies in data to improve its quality.
- Data Governance: Data governance is the overall management of the availability, usability, integrity, and security of data within an organization.
- Data Profiling: Data profiling is the process of analyzing data to gain an understanding of its structure, content, and quality.
## 10. Data Visualization

Data visualization is the graphical representation of data to help users understand complex data patterns and trends. It involves creating visualizations such as charts, graphs, and maps to communicate insights effectively.

- Bar Chart: A bar chart uses bars of varying lengths to show the values of different categories.
- Scatter Plot: A scatter plot shows data points on a two-dimensional plane to reveal the relationship between two variables.
- Heat Map: A heat map represents values by colors to show patterns and trends.
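The bar-chart idea can be demonstrated without any plotting library at all, by scaling each value to a row of characters; the enrolment figures are invented.

```python
# A text-mode bar chart: one row of '#' per category, length = value.
# Real dashboards would use a plotting library such as matplotlib.

def ascii_bar_chart(data):
    width = max(len(label) for label in data)
    return [f"{label.ljust(width)} | {'#' * value}"
            for label, value in data.items()]

enrolments = {"2021": 4, "2022": 6, "2023": 9}
for row in ascii_bar_chart(enrolments):
    print(row)
```

Even this crude chart makes the year-on-year growth visible at a glance, which is the core purpose of visualization.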
## 11. Statistical Analysis

Statistical analysis is the process of collecting, exploring, analyzing, and interpreting data. It involves applying statistical techniques to draw meaningful insights from data.

- Central Limit Theorem: The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
- Confidence Interval: A confidence interval is a range of values within which the true population parameter is likely to lie with a certain level of confidence.
- Hypothesis Testing: Hypothesis testing is a statistical method used to make inferences about a population based on sample data.
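A confidence interval for a mean can be sketched with the normal approximation (for small samples a t-distribution would be more appropriate); the sample values are invented.

```python
# Approximate 95% confidence interval for a sample mean.
import math
import statistics

def confidence_interval(sample, z=1.96):  # z = 1.96 for ~95% confidence
    mean = statistics.mean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin

sample = [52, 48, 55, 50, 49, 53, 51, 47, 54, 50]
low, high = confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

The interval narrows as the sample grows (the margin shrinks with the square root of n), which is the central limit theorem at work in practice.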
## 12. Machine Learning

Machine learning is a branch of artificial intelligence that focuses on the development of algorithms that learn from data. It involves training models on data to make decisions without being explicitly programmed.

- Supervised Learning: Supervised learning trains a model on labeled data to make predictions.
- Unsupervised Learning: Unsupervised learning trains a model on unlabeled data to find patterns and relationships.
- Deep Learning: Deep learning is a subset of machine learning that uses neural networks with multiple layers to learn complex patterns in data.
## 13. Exploratory Data Analysis

Exploratory data analysis is an approach to analyzing datasets to summarize their main characteristics, often with visual methods. It helps in understanding the underlying structure, patterns, and relationships in data.

- Box Plot: A box plot shows the distribution of a dataset, including the median, quartiles, and outliers.
- Histogram: A histogram shows the frequency distribution of a dataset, with bars of varying heights representing the frequency of data points.
- Correlation Analysis: Correlation analysis measures the strength and direction of the relationship between two variables.
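The Pearson correlation coefficient can be computed straight from its definition; the study-hours and marks data below are invented.

```python
# Pearson correlation: covariance divided by the product of the
# standard deviations, giving a value in [-1, 1].
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]
marks = [52, 57, 61, 68, 72]
print(round(pearson(hours, marks), 3))
```

A value near +1 indicates a strong positive linear relationship, near -1 a strong negative one, and near 0 little linear relationship at all.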
## 14. Data Preprocessing

Data preprocessing is the process of cleaning, transforming, and preparing raw data for analysis. It involves handling missing values, removing outliers, and standardizing data to ensure its quality and suitability for analysis.

- Feature Scaling: Feature scaling standardizes the range of independent variables in a dataset so that no single variable dominates the analysis.
- Dimensionality Reduction: Dimensionality reduction reduces the number of input variables in a dataset while preserving as much information as possible.
- Data Imputation: Data imputation fills in missing values in a dataset using statistical methods or machine learning algorithms.
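Two of these steps sketched together on an invented column of ages: mean imputation of missing values, then min-max scaling to the [0, 1] range.

```python
# Mean imputation followed by min-max feature scaling.

def impute_mean(values):
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 60, None]
filled = impute_mean(ages)      # None -> mean of the known values
scaled = min_max_scale(filled)  # map everything into [0, 1]
print(filled, scaled)
```

The order matters: imputation must come first, or the missing values would break the min/max computation.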
## 15. Data Warehousing

Data warehousing is the process of collecting, storing, and managing large volumes of data from multiple sources. It involves integrating data from those sources into a centralized repository for analysis.

- Extract, Transform, Load (ETL): ETL extracts data from source systems, transforms it into a suitable format, and loads it into a data warehouse.
- Data Mart: A data mart is a subset of a data warehouse designed for a specific department or business unit.
- Data Warehouse Architecture: Data warehouse architecture refers to the design and structure of a data warehouse, including data storage, processing, and access layers.
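A toy ETL run makes the three phases concrete: extract rows from a "source", transform them (trim, normalize, validate), and load them into a database. The table, column names, and rows below are invented, and SQLite stands in for a real warehouse.

```python
# Minimal ETL sketch using an in-memory SQLite database as the target.
import sqlite3

# Extract: rows as they arrive from a messy source system
source_rows = [
    {"name": " Alice ", "score": "82"},
    {"name": "BOB", "score": "74"},
    {"name": "", "score": "91"},  # invalid: missing name, will be dropped
]

# Transform: clean, type-convert, and filter out invalid rows
clean = [
    (row["name"].strip().title(), int(row["score"]))
    for row in source_rows
    if row["name"].strip()
]

# Load: insert the cleaned rows into the target table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)", clean)
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)
```

Keeping the transform step separate from extraction and loading is what lets real pipelines validate and standardize data before it ever reaches the warehouse.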
## 16. Big Data Analytics

Big data analytics is the process of examining large and complex datasets to uncover hidden patterns, correlations, and insights. It involves using advanced analytics techniques to extract value from massive volumes of data.

- Hadoop: Hadoop is an open-source framework for distributed storage and processing of big data across clusters of computers.
- MapReduce: MapReduce is a programming model for processing and generating large datasets in parallel across distributed systems.
- Data Lake: A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analysis.
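The MapReduce model can be shown in miniature with the classic word-count example: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The "documents" are invented.

```python
# Word count in the MapReduce style (single-process sketch; real
# MapReduce distributes these phases across a cluster).
from collections import defaultdict

documents = ["big data big insights", "data lake", "big data"]

# Map: emit (word, 1) for every word in every document
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group into a final count
counts = {word: sum(values) for word, values in groups.items()}
print(counts["big"], counts["data"])
```

Because the map calls and the per-key reduce calls are independent, each phase can run in parallel across many machines, which is the point of the model.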
## 17. Text Mining

Text mining is a data mining technique used to extract valuable information, patterns, and insights from unstructured text. It involves analyzing text documents to discover trends, sentiment, and relationships.

- Natural Language Processing (NLP): Natural Language Processing is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language.
- Sentiment Analysis: Sentiment analysis determines the sentiment expressed in a piece of text, such as positive, negative, or neutral.
- Topic Modeling: Topic modeling identifies topics or themes in a collection of text documents.
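A naive lexicon-based sketch of sentiment analysis: score a text by counting hits against hand-made positive and negative word lists. Real sentiment analysis uses trained models; the tiny lexicon here is invented.

```python
# Lexicon-based sentiment: count positive minus negative word matches.

POSITIVE = {"good", "excellent", "helpful", "clear"}
NEGATIVE = {"bad", "confusing", "poor", "unclear"}

def sentiment(text):
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("The course was excellent and the materials were clear"))
print(sentiment("The feedback was confusing and poor"))
```

The obvious weaknesses of this sketch (no handling of negation, punctuation, or context) are exactly what NLP techniques exist to address.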
## 18. Data Governance

Data governance is the overall management of the availability, usability, integrity, and security of data within an organization. It involves establishing policies, processes, and controls to ensure that data is managed effectively and in compliance with regulations.

- Data Stewardship: Data stewardship is the role responsible for managing and ensuring the quality and security of data within an organization.
- Data Quality Management: Data quality management is the process of defining, monitoring, and improving the quality of data to ensure its accuracy and reliability.
- Data Privacy: Data privacy refers to the protection of personal information and sensitive data from unauthorized access or disclosure.
## 19. Data Security

Data security is the practice of protecting data from unauthorized access, use, disclosure, or destruction. It involves implementing security measures to ensure the confidentiality, integrity, and availability of data.

- Encryption: Encryption converts data into a coded format that can only be decoded with a key or password.
- Access Control: Access control restricts access to data based on user authentication, authorization, and permissions.
- Data Breach: A data breach is an incident in which sensitive or confidential data is accessed, stolen, or exposed without authorization.
## 20. Data Ethics

Data ethics refers to the moral principles and guidelines governing the collection, use, and sharing of data. It involves ensuring that data is handled responsibly, ethically, and in compliance with legal and regulatory requirements.

- Privacy by Design: Privacy by Design is a framework that embeds privacy and data protection considerations into the design and operation of systems, products, and services.
- Fairness in Machine Learning: Fairness in machine learning involves ensuring that algorithms are unbiased and do not discriminate against individuals based on protected characteristics.
- Data Anonymization: Data anonymization removes or encrypts personally identifiable information from datasets to protect the privacy of individuals.
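A related technique, pseudonymization, can be sketched by replacing direct identifiers with salted hashes so that records remain linkable without exposing names. Note that hashing alone is pseudonymization, not full anonymization; the names and salt below are invented, and a real salt would be secret and randomly generated.

```python
# Pseudonymization sketch: drop the name, keep a salted hash as a stable ID.
import hashlib

SALT = b"course-demo-salt"  # illustrative only; keep real salts secret

def pseudonymize(record):
    digest = hashlib.sha256(SALT + record["name"].encode()).hexdigest()[:12]
    return {"id": digest, "grade": record["grade"]}  # name is removed

records = [{"name": "Alice Smith", "grade": "A"},
           {"name": "Bob Jones", "grade": "B"}]
safe = [pseudonymize(r) for r in records]
print(all("name" not in r for r in safe))
```

Because the same name always hashes to the same ID, analyses that need to link records about one person still work on the pseudonymized data.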
## 21. Data Governance Framework

A data governance framework is a structured approach to managing and controlling data assets within an organization. It includes policies, procedures, roles, and responsibilities to ensure that data is managed effectively and securely.

- Data Governance Council: A data governance council is a group of stakeholders responsible for setting data governance policies, guidelines, and priorities within an organization.
- Data Governance Maturity Model: A data governance maturity model is a framework used to assess the effectiveness and maturity of an organization's data governance practices.
- Data Governance Tools: Data governance tools are software applications that support and automate data governance processes, such as data quality, metadata management, and compliance.
## 22. Data Visualization Tools

Data visualization tools are software applications used to create graphical representations of data. They include a variety of charts, graphs, and dashboards for visualizing data.

- Tableau: Tableau is a popular data visualization tool that allows users to create interactive and shareable dashboards, reports, and visualizations.
- Power BI: Power BI is a business analytics tool by Microsoft that enables users to visualize and share insights from their data through interactive dashboards and reports.
- Data Studio: Data Studio is a free data visualization tool by Google that allows users to create custom reports and dashboards using data from various sources.
## 23. Data-driven Decision Making

Data-driven decision making is an approach to making decisions based on data analysis and evidence rather than intuition or personal judgment. It involves using data to inform and support decision-making processes.

- Business Intelligence: Business intelligence is the use of data analysis tools and techniques to transform data into actionable insights for making informed business decisions.
- Key Performance Indicators (KPIs): Key Performance Indicators are quantifiable metrics used to evaluate the success of an organization, project, or process.
- Data-driven Culture: A data-driven culture is an organizational mindset that values and prioritizes data-driven decision making across all levels of the organization.
## 24. Data Warehouse

A data warehouse is a centralized repository that stores large volumes of structured data from multiple sources. It is designed to support decision-making processes by providing a single source of truth for data.

- Data Mart: A data mart is a subset of a data warehouse designed for a specific department or business unit within an organization.
- Data Warehouse Architecture: Data warehouse architecture refers to the design and structure of a data warehouse, including data storage, processing, and access layers.
- Data Warehouse Schema: A data warehouse schema is the logical structure that defines how data is organized and stored in a data warehouse.
## 25. Data Mining Techniques

Data mining techniques are methods and algorithms used to extract patterns, trends, and insights from data. They include a variety of statistical and machine learning techniques for analyzing and interpreting data.

- Association Rule Mining: Association rule mining discovers relationships between variables in a dataset.
- Clustering: Clustering groups similar objects together based on their characteristics.
- Classification: Classification predicts the class or category of a new observation based on training data.
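The two basic association-rule metrics, support and confidence, can be computed directly over a handful of invented market-basket transactions for the rule {bread} → {butter}.

```python
# Support = fraction of transactions containing an itemset.
# Confidence = support(antecedent AND consequent) / support(antecedent).

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # how common is the pair?
print(confidence({"bread"}, {"butter"}))  # given bread, how often butter?
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only scores one rule.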
## 26. Data Integration

Data integration is the process of combining data from different sources and formats into a unified view. It involves transforming and harmonizing data to ensure consistency and accuracy across the organization.

- Extract, Transform, Load (ETL): ETL extracts data from source systems, transforms it into a suitable format, and loads it into a target database.
- Data Migration: Data migration moves data from one system or platform to another while maintaining its integrity and consistency.
- Master Data Management: Master data management ensures the uniformity, accuracy, and consistency of an organization's critical data assets.
## 27. Data Architecture

Data architecture is the design and structure of data assets within an organization. It involves defining data models, standards, and policies to ensure that data is managed effectively.

- Data Model: A data model is a visual representation of how data is organized and stored within a database or data warehouse.
- Data Dictionary: A data dictionary is a central repository that defines and describes the data elements, attributes, and relationships in a database.
- Data Governance Framework: A data governance framework is a structured approach to managing and controlling data assets within an organization.
## 28. Data Mining Software

Data mining software enables users to extract patterns and insights from data. It includes tools for data preparation, modeling, and visualization to support data mining activities.

- RapidMiner: RapidMiner is a data science platform that offers a wide range of tools for data preparation, machine learning, and predictive analytics.
- KNIME: KNIME is an open-source data analytics platform that allows users to create visual workflows for data mining, analysis, and reporting.
- Weka: Weka is a popular machine learning toolkit that provides tools for data preprocessing, classification, clustering, and regression analysis.
## 29. Data Modeling

Data modeling is the process of creating a visual representation of data structures and relationships. It helps in defining and organizing data to support business requirements.

- Entity-Relationship Diagram (ERD): An entity-relationship diagram is a visual representation of the entities in a database and the relationships between them.