Data Analytics: A Comprehensive Overview, Slides of Information Technology

LITE MATTERS
2023
ANGELES, GALVEZ, GAN, REYES, ROXAS, SUMALA, VILLAFUERTE
LESSON 4.1
BIG DATA ANALYTICS
Data is the fingerprint of creation, and analytics is the new "Queen of Sciences." Hardly
any human activity, business decision, strategy, or physical entity exists that does not produce
data or involve data analytics to inform it. As a result, data analytics has become core to our
endeavors, from business to medicine, research, management, product development, and all
facets of life.
From a business perspective, data is now viewed as the new gold, and data analytics as
the machinery that mines, molds, and mints it. Data analytics is a set of computer-enabled
analytics methods, processes, and disciplines for extracting and transforming raw data into
meaningful insight, discovery, and knowledge that helps make more effective decisions.
Another definition describes it as the discipline of extracting and analyzing data to deliver new
insights about past performance and current operations and to predict future events.
Data analytics is gaining significant prominence not just for improving business
outcomes or operational processes; it is also a new tool for improving quality, reducing costs,
and improving customer satisfaction. Beyond that, it is fast becoming necessary for operational,
administrative, and even legal reasons.
Data analytics has come a long way and is gaining popularity thanks to the rise of the four
SMAC technologies: social media, mobility, analytics, and cloud computing. A fifth could be
added for sensors and the Internet of Things (IoT). Each of these technologies plays a significant
role in transforming businesses and the data they generate.
What is Data?
Data (singular: datum), as defined by the Merriam-Webster Dictionary, refers to the
following: factual information (such as measurements or statistics) used as a basis for
reasoning, discussion, or calculation; information in digital form that can be transmitted or
processed; or information output by a sensing device or organ that includes both useful and
irrelevant or redundant information and must be processed to be meaningful.
In computing, data is information translated into a form efficient for movement or
processing. Relative to today's computers and transmission media, data is information
converted into binary digital form. It is acceptable for data to be used as a singular subject or
a plural subject. Raw data is a term used to describe data in its most basic digital format.
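To make "information converted into binary digital form" concrete, the short Python sketch below encodes a piece of text as bytes and prints each byte as eight binary digits. UTF-8 is an assumed choice of encoding here; the same text stored with a different encoding would produce different bits.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 bytes and show each byte as 8 binary digits."""
    # Assumption: UTF-8 encoding; each byte becomes a group of eight 0s and 1s.
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(to_bits("Hi"))  # 01001000 01101001
```

Each group is one byte (8 bits), the unit that the storage measurements in this lesson build on.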
The concept of data in the context of computing has its roots in the work of Claude
Shannon, an American mathematician known as the father of information theory. He ushered
in binary digital concepts by applying two-value Boolean logic to electronic circuits. Binary
digit formats underlie the CPUs, semiconductor memories, disk drives, and many peripheral
devices standard in computing today. Early computers took input for control and data from punch cards, magnetic tape, and hard disks. The importance of data in business computing became apparent early on through the popularity of the terms "data processing" and "electronic data processing," which for a time came to encompass the whole gamut of what is now known as information technology. Over the history of corporate computing, specialization occurred, and a distinct data profession emerged along with the growth of corporate data processing.
Recall: How is data stored?
Computers represent data, including video, images, sounds, and text, as binary values using patterns of just two digits: 1 and 0. A bit is the smallest unit of data and represents a single binary value (0 or 1). Storage and memory are usually measured in megabytes and gigabytes. The units of data measurement continue to grow as the amount of data collected and stored grows. For example, the relatively new term "brontobyte" refers to a storage quantity of roughly 10^27 bytes.
Data can be stored in files, as in mainframe systems using ISAM (Indexed Sequential Access Method) and VSAM (Virtual Storage Access Method). Other file formats for data storage, conversion, and processing include comma-separated values (CSV). These formats continued to find uses across various machine types, even as more structured-data-oriented approaches gained footing in corporate computing. Greater specialization developed as databases, database management systems, and relational database technology arose to organize information.

UNIT           VALUE
1 byte         8 bits (binary digits)
1 kilobyte     1024 bytes
1 megabyte     1024 kilobytes
1 gigabyte     1024 megabytes
1 terabyte     1024 gigabytes
1 petabyte     1024 terabytes
1 exabyte      1024 petabytes
1 zettabyte    1024 exabytes
1 yottabyte    1024 zettabytes
1 brontobyte   1024 yottabytes
Table 1 – Common Data Storage Measurements

What is Analytics?
Analytics is a broad term that encompasses the processes, technologies, frameworks, and algorithms used to extract meaningful insights from data.
Raw data does not have a meaning until it is contextualized and processed into useful information. Analytics is the process of extracting and creating information from raw data by filtering, processing, categorizing,

percentage. These statistics help describe patterns in the data and present the data in a summarized form. Thus, descriptive analytics helps summarize the data. Examples are the following:
  ▪ computing the total number of likes for a particular post
  ▪ calculating the average monthly rainfall
  ▪ finding the average number of visitors per month to a website
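The three examples above can be sketched in a few lines of Python with the standard `statistics` module. The numbers are hypothetical sample values, not data from this lesson; the point is only that descriptive analytics reduces raw values to totals, averages, and percentages.

```python
from statistics import mean

# Hypothetical sample data, used only for illustration.
likes_per_day = [120, 85, 240, 60]            # likes on one post, day by day
monthly_rainfall_mm = [10.0, 30.0, 80.0, 120.0]
monthly_visitors = [4200, 3900, 5100, 4800]

total_likes = sum(likes_per_day)              # total number of likes for the post
average_rainfall = mean(monthly_rainfall_mm)  # average monthly rainfall
average_visitors = mean(monthly_visitors)     # average visitors per month

# A percentage: share of all visitors who came in the busiest month.
busiest_share = max(monthly_visitors) / sum(monthly_visitors) * 100

print(total_likes, average_rainfall, average_visitors, round(busiest_share, 1))
```

None of these statistics explains why a value is high or low; that question belongs to diagnostic analytics.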

  2. Diagnostic Analytics. Diagnostic analytics comprises the analysis of past data to diagnose why certain events happened. Thus, diagnostic analytics aims to answer the question: Why did it happen? Consider a system that collects and analyzes sensor data from machines to monitor their health and predict failures. While descriptive analytics can summarize the data by computing various statistics, diagnostic analytics can provide more insight into why certain faults occurred, based on the patterns in the sensor data for previous faults.
  3. Predictive Analytics. Predictive analytics comprises predicting the occurrence or likely outcome of an event, or forecasting future values, using prediction models. Predictive analytics aims to answer the question: What is likely to happen? Examples where predictive analytics can be used include the following:
     ▪ predicting when a fault will occur in a machine
     ▪ predicting whether a tumor is benign or malignant
     ▪ predicting the occurrence of natural emergencies (events such as forest fires or river floods)
     ▪ forecasting pollution levels
     Predictive analytics is done using predictive models trained on existing data. These models learn patterns and trends from the existing data and either predict the occurrence or likely outcome of an event (classification models) or forecast numeric values (regression models). The accuracy of prediction models depends on the quality and volume of the data available for training, which must be sufficient for the models to learn the patterns and trends in it accurately. Before a model is used for prediction, it must be validated with existing data. The typical approach is to divide the existing data into training and test data sets (for example, 75% of the data is used for training and 25% for testing the prediction model).
  4. Prescriptive Analytics. While predictive analytics uses prediction models to predict the likely outcome of an event, prescriptive analytics uses multiple prediction models to predict various outcomes
and the best course of action for each outcome. Prescriptive analytics aims to answer the question: What can we do to make it happen? It can predict possible outcomes based on the current choice of actions, so it can be seen as a type of analytics that uses different prediction models for different inputs. Prescriptive analytics prescribes actions, or the best option among those available. Examples of prescriptive analytics include the following:
     ▪ prescribing the best medicine for a patient based on the outcomes of various medications for similar patients
     ▪ suggesting the best mobile data plan for a customer based on the customer's browsing patterns
What is Big Data?
Big data is a collection of datasets whose volume, velocity, or variety is so large that storing, managing, processing, and analyzing the data with traditional databases and data processing tools is difficult. In recent years, the structured and unstructured data generated by information technology, industrial, healthcare, Internet of Things, and other systems has grown exponentially. According to an IBM estimate, 2.5 quintillion bytes of data are created every day. According to Statista, the volume of data created worldwide was an estimated 97 zettabytes in 2022, compared to 79 zettabytes in 2021, and by 2025 the yearly volume is expected to be double the 2021 amount. Of all the data in the world at the moment, approximately 90% is replicated, with only 10% being genuinely new data. To show the volume and speed at which we create data online, a report by DOMO lists the following as happening on the web in just 60 seconds:

  • 5.9 million Google searches happen.
  • Instagram users share 66,000 photos.
  • Facebook users post 1.7 million pieces of content.
  • People send 231.4 million emails.
  • YouTubers upload 500 hours of videos.
  • Snapchat users send 4.3 million snaps.
  • Twitter users write 347,200 tweets.
  • People send 16 million texts.
  • Venmo users transfer $437,600.
  • Amazon shoppers spend $443,000.
Big data can power the next generation of smart applications, which leverage the data to become intelligent. Big data applications span a wide range of areas: web, retail and marketing, banking and finance, industry, healthcare, the environment, the Internet of Things, and cyber-physical systems.
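The figures quoted above can be put into perspective with simple arithmetic. The Python sketch below scales a few of the DOMO per-minute counts to a full day and computes growth rates from the Statista volumes. It assumes a constant per-minute rate throughout the day (real traffic varies) and reads "double by 2025" as doubling over the four years from 2021; both are simplifying assumptions.

```python
MINUTES_PER_DAY = 24 * 60

# Figures quoted in the text above.
DOMO_PER_MINUTE = {
    "google_searches": 5_900_000,
    "emails_sent": 231_400_000,
    "youtube_video_hours": 500,
}
ZETTABYTES = {2021: 79, 2022: 97}


def per_day(count_per_minute: int) -> int:
    """Scale a per-minute count to 24 hours, assuming a constant rate."""
    return count_per_minute * MINUTES_PER_DAY


def growth_rate(old: float, new: float) -> float:
    """Year-over-year growth as a fraction (0.10 means +10%)."""
    return (new - old) / old


def implied_annual_growth(factor: float, years: int) -> float:
    """Constant yearly growth rate that multiplies a quantity by `factor`."""
    return factor ** (1 / years) - 1


daily = {name: per_day(n) for name, n in DOMO_PER_MINUTE.items()}
yoy_2022 = growth_rate(ZETTABYTES[2021], ZETTABYTES[2022])
doubling_rate = implied_annual_growth(2.0, 2025 - 2021)

print(daily["google_searches"])        # 8496000000
print(round(yoy_2022 * 100, 1))        # 22.8
print(round(doubling_rate * 100, 1))   # 18.9
```

Under these assumptions, roughly 8.5 billion searches happen per day, the 2021 to 2022 growth was about 22.8%, and doubling by 2025 requires about 18.9% growth per year.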
  1. Structured Data. Any data stored, accessed, and processed in a fixed format is termed "structured" data. Over time, computer science has achieved tremendous success in developing techniques for working with such data (where the format is known in advance) and deriving value from it. Nowadays, however, issues arise when such data grows to a vast extent, with typical sizes in the range of multiple zettabytes. Examples of structured data, which has a defined data type, format, and structure, include transaction data, online analytical processing (OLAP) data cubes, data in traditional RDBMSs, CSV files, and even simple spreadsheets.
  2. Semi-structured Data. Semi-structured data can contain elements of both structured and unstructured data. It has some organizing structure, such as tags, but it is not defined by a fixed schema such as a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
  3. Quasi-structured Data. Quasi-structured data refers to textual data with erratic formats that can be formatted with effort, tools, and time (for instance, web clickstream data that may contain inconsistencies in data values).
  4. Unstructured Data. Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, and so on. Nowadays, organizations have a wealth of data available to them, but unfortunately they often don't know how to derive value from it because the data is in raw or unstructured form.
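The difference between the first three types can be illustrated with a small Python sketch. The CSV text, XML text, and clickstream line layout below are hypothetical samples. The CSV rows share fixed columns, the XML records carry tags but may omit fields, and the clickstream lines need a hand-written regular expression before they become usable records. Unstructured data such as images or video is not shown, since it typically needs specialized processing.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: every row follows the same fixed columns (hypothetical sample).
CSV_TEXT = "id,name,amount\n1,Ana,250\n2,Ben,120\n"

# Semi-structured: tagged, but fields may be present or absent per record.
XML_TEXT = """<orders>
  <order id="1"><name>Ana</name><amount>250</amount></order>
  <order id="2"><name>Ben</name><note>gift</note></order>
</orders>"""

# Quasi-structured: clickstream lines with inconsistent formatting.
CLICKS = [
    "2023-09-13 10:01:22 | user=17 | page=/Home",
    "2023/09/13 10:02:05|USER=17|PAGE=/products ",
    "??? corrupted entry ???",
]

CLICK_PATTERN = re.compile(
    r"(\d{4})[-/](\d{2})[-/](\d{2})\s+(\d{2}:\d{2}:\d{2})"
    r"\s*\|\s*user=(\d+)\s*\|\s*page=(\S+)",
    re.IGNORECASE,
)


def read_structured(text):
    """The schema (column names) is known in advance from the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Fields are discovered from the tags; missing ones simply don't appear."""
    records = []
    for order in ET.fromstring(text).findall("order"):
        record = {"id": order.get("id")}
        record.update({child.tag: child.text for child in order})
        records.append(record)
    return records


def read_quasi_structured(lines):
    """Normalize inconsistent lines into uniform records, skipping unparseable ones."""
    events = []
    for line in lines:
        match = CLICK_PATTERN.search(line.strip())
        if match is None:
            continue  # would need more effort, tools, or time to recover
        year, month, day, time, user, page = match.groups()
        events.append({
            "timestamp": f"{year}-{month}-{day} {time}",
            "user": int(user),
            "page": page.lower(),
        })
    return events


print(read_structured(CSV_TEXT))
print(read_semi_structured(XML_TEXT))
print(read_quasi_structured(CLICKS))
```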

Variety refers to the forms of the data. Big data comes in different forms, such as structured, unstructured, or semi-structured, including text data, image, audio, video, and sensor data. Big data systems must be flexible enough to handle such a variety of data.

  1. Veracity. Veracity refers to how accurate the data is. The data needs to be cleaned of noise before value can be extracted from it. Data-driven applications can reap the benefits of big data only when the data is meaningful and accurate. Therefore, data cleansing is important so that incorrect and faulty data can be filtered out.
  2. Value. The value of data refers to its usefulness for its intended purpose. It is related to the veracity, or accuracy, of the data. The end goal of any big data analytics system is to extract value from the data. For some applications, value also depends on how fast we can process the data.
Domain-Specific Examples of Big Data
Big data applications span a wide range of domains, including (but not limited to) homes, cities, environment, energy systems, retail, logistics, industry, agriculture, Internet of Things, and healthcare.
    1. Web a. Web Analytics b. Performance Monitoring c. Ad Targeting and Analytics d. Content Recommendation
    2. Financial a. Credit Risk Modeling b. Fraud Detection
    3. Healthcare a. Epidemiological Surveillance b. Patient Similarity-based Decision Intelligence Application c. Adverse Drug Events Prediction d. Detecting Claim Anomalies e. Evidence-based Medicine f. Real-time health monitoring
    4. Internet of Things a. Intrusion Detection b. Smart Parking c. Smart Roads d. Structural Health Monitoring e. Smart Irrigation
    5. Environment a. Weather Monitoring b. Air Pollution Monitoring c. Noise Pollution Monitoring d. Forest Fire Detection e. River Floods Detection f. Water Quality Monitoring
    6. Logistics and Transportation a. Real-time Fleet Tracking b. Shipment Monitoring c. Remote Vehicle Diagnostics d. Route Generation and Scheduling e. Hyper-local Delivery f. Cab/Taxi Aggregators
    7. Industry a. Machine Diagnosis and Prognosis b. Risk Analysis of Industrial Operations c. Production Planning and Control
    8. Retail a. Inventory Management b. Customer Recommendation c. Store Layout Optimization d. Forecasting Demand
How does big data analytics work?
Big data analytics involves collecting, processing, cleaning, and analyzing large datasets to help organizations operationalize their big data.
  1. Collect data. Data collection looks different for every organization. With today's technology, organizations can gather structured and unstructured data from various sources, from cloud storage to mobile applications to in-store IoT sensors. Some data will be stored in data warehouses, where business intelligence tools and solutions can access it easily. Raw or unstructured data that is too diverse or complex for a warehouse may be assigned metadata and stored in a data lake.
  2. Process data. Once data is collected and stored, it must be organized properly so that analytical queries return accurate results, especially when the data is large and unstructured. In addition, the volume of available data is growing exponentially, which makes data processing challenging for organizations. One processing option is batch processing, which processes large blocks of data collected over time. Batch processing is useful when there is a longer turnaround time between collecting and analyzing data.
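A minimal, single-machine sketch of the batch-processing idea follows: records are grouped into fixed-size blocks, and each block is processed as a unit (here, by summing a hypothetical `amount` field). The batch size and record layout are assumptions for illustration; big data systems apply the same idea over distributed storage and compute.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive blocks of at most `size` records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def batch_totals(records: Iterable[dict], size: int) -> List[int]:
    """Process each block as a unit: here, total the 'amount' field per batch."""
    return [sum(r["amount"] for r in block) for block in batches(records, size)]


sales = [{"amount": a} for a in [10, 20, 30, 40, 50]]
print(batch_totals(sales, 2))  # [30, 70, 50]
```

Because `batches` pulls records lazily from any iterable, the whole dataset never needs to be in memory at once, which is the property that makes batch processing practical for large inputs.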

Data analytics will help businesses streamline operations, save resources, and improve the bottom line. When companies have a better idea of what their audience needs, they spend less time producing advertisements that do not meet its desires.
Challenges of Big Data Analytics
Although big data analytics brings several benefits to a business, its implementation is not always straightforward. Companies must first adopt a data-driven culture and have the necessary tools to collect, process, and analyze data. Here are some challenges organizations might face while adopting big data analytics.

  1. Quality of data. In big data analytics, quality data is everything. Unfortunately, low-quality, duplicate, or inconsistent data sets can lead to many problems, including misinterpretation, poor decision-making, and, ultimately, loss of revenue. Low-quality data can also create unintended bias in a system. Of course, big data can't be 100% accurate, and it doesn't have to be entirely accurate to be useful. But significantly low-quality data sets will do more harm than good and won't bring valuable insight. Duplicate data can also cause contradictions and spoil decisions that require utmost accuracy.
  2. Synchronization of data sources. Data is collected from various sources, including social media platforms and company websites. Businesses can also collect customer data using in-store facilities such as Wi-Fi. In addition, retailers like Walmart are known to couple in-store surveillance with computer vision technology to identify the aisles customers visit the most and the least. As most businesses grow rapidly, the amount of data they generate also increases. Although data storage has largely been solved for a decade or more through data lakes and data warehouses, synchronizing data across different data sources can still be challenging. Combining data from different sources into a unified view is called data integration and is crucial for deriving valuable insights. Unfortunately, many companies overlook this aspect of big data analytics, which leads to logic conflicts and incomplete or inaccurate results.
  3. Organizational resistance. Apart from some technological aspects of big data analytics, adopting a data-driven culture in an organization can be challenging. For example, a 2021 NewVantage Partners Big Data and AI Executive Survey revealed that only 24.4% of the participating companies had forged a data culture within their firms. Lack of understanding, lack of middle management adoption, business resistance, and poor organizational alignment are reasons why companies have yet to adopt a data-driven culture.
  4. Making big data accessible. Collecting and processing data becomes more difficult as the amount of data grows. Therefore, organizations must make data accessible and convenient for data owners of all skill levels.
  5. Maintaining quality data. With so much data to maintain, organizations spend more time than ever scrubbing for duplicates, errors, absences, conflicts, and inconsistencies.
  6. Keeping data secure. As the amount of data grows, so do privacy and security concerns. As a result, organizations need to strive for compliance and put tight data processes in place before taking advantage of big data.
  7. Finding the right tools and platforms. New technologies for processing and analyzing big data are developed all the time. Organizations must find the right technology to work within their established ecosystems and address their particular needs. Often, the right solution is flexible and can accommodate future infrastructure changes.
  8. Other challenges
  • Lack of talent is a significant challenge companies face when integrating big data. Although the number of individuals opting for a career in data science and analytics is steadily increasing, there is still a skill shortage.
  • Data quality maintenance is another issue. Since data comes from multiple sources at high velocity, the time and resources required to manage data quality properly can be significant.
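Two of the challenges above, duplicate data and combining data from different sources into a unified view, can be illustrated with a small Python sketch. The source names, fields, and the rule that the first source wins when values disagree are all hypothetical choices. The sketch normalizes a shared key (email), merges records into one view, collapses duplicates, and reports conflicts instead of silently overwriting them.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def normalize_email(email: str) -> str:
    """Use a cleaned-up email as the shared key across sources."""
    return email.strip().lower()


def integrate(*sources: Iterable[Record]) -> Tuple[Dict[str, Record], List[tuple]]:
    """Merge records from several sources into one view keyed by email.

    Returns (unified, conflicts): conflicts lists (key, field, kept, rejected)
    wherever sources disagree. Assumed policy: the first value seen is kept.
    """
    unified: Dict[str, Record] = {}
    conflicts: List[tuple] = []
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            merged = unified.setdefault(key, {"email": key})
            for field, value in record.items():
                if field == "email" or value in (None, ""):
                    continue  # skip the key itself and missing values
                if field in merged and merged[field] != value:
                    conflicts.append((key, field, merged[field], value))
                else:
                    merged[field] = value
    return unified, conflicts


# Hypothetical sources: a company website and in-store Wi-Fi sign-ups.
website = [{"email": "Ana@Example.com", "name": "Ana", "city": "Quezon City"}]
store_wifi = [
    {"email": "ana@example.com ", "name": "Ana", "city": "Manila"},
    {"email": "ben@example.com", "name": "Ben", "city": ""},
]

unified, conflicts = integrate(website, store_wifi)
print(unified)
print(conflicts)
```

Reporting the conflict, rather than letting the last source silently win, is one way to avoid the "logic conflicts and incomplete or inaccurate results" that overlooked data integration can cause.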