Applied Statistics Projects and Data Analysis, Assignments of Sports Law

International University - VNU-HCM Sports Law

Detailed project descriptions and tasks for students in an applied statistics course. The projects involve data preprocessing, visualization, descriptive statistics, and modeling using various methods. The datasets include house prices, student performance, diets, flights, salaries, insurance, and supermarket sales. The goal is to gain insights and draw conclusions from the data.

Typology: Assignments

2021/2022

Uploaded on 04/13/2024

phuc-phan-3 🇻🇳


Course: Applied Statistics

Projects

Bui Anh Tuan

June 7, 2023

Overview

Due: Session 12.

Details

The class will be divided into groups. Each group of 5 to 6 students will be assigned a topic to study and present in class. The objective of this assessment is to encourage students to do research in groups and to communicate their results in an oral presentation.

Presentations should be created using PowerPoint and should address:

1. Overview of the dataset and why we would investigate this topic.

2. Basic insights from the data using plots and descriptive statistics.

3. Models and results.

4. Conclusion.

Presentations should generally not exceed 15 minutes, to allow time for questions and discussion.

Marking criteria and standards

The presenters will be evaluated by the lecturer (50%) and by the rest of the class (50%) based on the following criteria:

i. Content: Is the presentation clear and focused? Does it cover all the important content of the assigned topic?

ii. Preparation: How well prepared is the group? How good are the slides and supporting materials? How well does the group know its material?

iii. Presentation and Communication: How well organized is the presentation? How effectively does the group present, interact with, and involve the rest of the class? Does the group use time effectively?

iv. Addressing questions: How effectively does the group deal with questions and comments?

v. Interest and Creativity: How interesting and creative is the group's presentation?


Dataset:

The file “houseprice.csv” contains house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015. Besides the sale prices, the dataset provides details of each house that help determine its price. Use this dataset to build a regression model to predict the house price.

Main variables are:

price: price of the houses

floors: number of floors

condition: rating from 1 to 5 (from worst to best)

view: rating from 0 to 4 (from worst to best)

sqft_above: area of the house

sqft_living: living area (includes land around the house)

sqft_basement: area of the basement

bedrooms: number of bedrooms

Tasks:

Part 1. Data Preprocessing

Import data: houseprice.csv
Data cleaning: remove all observations containing “NA” (missing data).
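
The cleaning step can be sketched in Python with only the standard library. This is a minimal illustration, not a prescribed method: the helper name `load_complete_rows` is hypothetical, the sample rows are invented, and the column names are assumed to use underscores.

```python
import csv
import io

# Toy stand-in for houseprice.csv: invented rows, not real data.
SAMPLE_CSV = """price,floors,condition,view,sqft_above,sqft_living,sqft_basement,bedrooms
300000,1,3,0,1200,1200,0,3
NA,2,4,0,1800,2200,400,4
450000,2,3,1,1500,NA,500,3
520000,1.5,5,2,1600,2100,500,4
"""

def load_complete_rows(handle):
    """Read CSV rows as dicts, keeping only rows with no missing field."""
    missing = {"", "NA"}
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if all(isinstance(value, str) and value.strip() not in missing
               for value in row.values())
    ]

rows = load_complete_rows(io.StringIO(SAMPLE_CSV))
prices = [float(row["price"]) for row in rows]
print(len(rows), prices)
```

For the real file, the same helper could be called as `with open("houseprice.csv", newline="") as f: rows = load_complete_rows(f)`.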

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of the price: mean, median, ... Any insights?
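
As a rough sketch of the descriptive statistics step, Python's `statistics` module covers the usual summaries. The prices are invented for illustration, and quartile conventions differ slightly between tools, so the values may not match other software exactly.

```python
import statistics

# Invented prices for illustration only (not taken from houseprice.csv).
prices = [300000, 320000, 350000, 410000, 450000, 520000, 1500000]

mean_price = statistics.fmean(prices)
median_price = statistics.median(prices)
stdev_price = statistics.stdev(prices)          # sample standard deviation
q1, q2, q3 = statistics.quantiles(prices, n=4)  # quartile cut points

print(f"mean={mean_price:.0f} median={median_price:.0f} sd={stdev_price:.0f}")
print(f"Q1={q1:.0f} Q3={q3:.0f} IQR={q3 - q1:.0f}")
# A mean well above the median hints at a right-skewed distribution,
# which is one possible "insight" to report for prices.
print("mean above median:", mean_price > median_price)
```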

Part 3. Models and Analyzing data

Build a linear regression model to evaluate which factors affect the price of the house.
Given the details of a chosen house, predict its price.
Any interesting insights based on the data? (choose your own methods)
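
One way to see what a linear regression fit computes is to solve the least-squares normal equations directly. The sketch below does this in plain Python. The function names `fit_linear_regression` and `predict` are hypothetical, and the data are invented from an exact linear rule so the recovered coefficients are easy to check. A statistics package would normally be used instead and would also report standard errors and p-values.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares: return [intercept, b1, ..., bp].

    Solves the normal equations (A^T A) beta = A^T y, where A is X with a
    leading column of ones, by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict(beta, features):
    """Predicted response for one observation."""
    return beta[0] + sum(b * f for b, f in zip(beta[1:], features))

# Invented data: (living area, bedrooms) -> price from an exact linear rule.
X = [(1000, 2), (1500, 3), (2000, 3), (2500, 4), (3000, 5)]
y = [50000 + 200 * area + 10000 * beds for area, beds in X]

beta = fit_linear_regression(X, y)
print([round(b, 2) for b in beta])
print(round(predict(beta, (1800, 3))))
```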

Dataset:

The dataset “Diet.csv” contains information on 78 people who undertook one of three diets. There is background information such as age, gender (Female=0, Male=1) and height. The aim of the study was to see which diet was best for losing weight, but it was also thought that the best diets for males and females may be different, so the independent variables are diet and gender.

Main variables are:

Person: index of the participant

gender: gender of the participant (Female=0, Male=1)

Age:

Height:

pre.weight: weight before the diet

Diet: type of diet (1, 2 or 3)

weight6weeks: weight after 6 weeks on the chosen diet

Tasks:

Part 1. Data Preprocessing

Import data: Diet.csv
Data cleaning: remove all observations containing “NA” (missing data).

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of the variables: mean, median, ... Any insights?

Part 3. Models and Analyzing data

Use one-factor ANOVA to see which diet was best for losing weight.
You may split the dataset into two subsets, one for males and one for females, to see whether the best diet differs between them.
Any interesting insights based on the data? (choose your own methods)
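
The F statistic behind a one-factor ANOVA can be sketched by hand as the ratio of the between-group to the within-group mean square. The function name `one_way_anova` is hypothetical and the weight losses are invented. The response is assumed to be the weight lost, that is, the weight before the diet minus the weight after 6 weeks.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA.

    `groups` maps each factor level to the list of its observations.
    """
    all_values = [v for g in groups.values() for v in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented weight losses in kg, keyed by diet type.
loss_by_diet = {
    1: [2.0, 3.0, 4.0],
    2: [3.0, 4.0, 5.0],
    3: [5.0, 6.0, 7.0],
}
f_stat, df_b, df_w = one_way_anova(loss_by_diet)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The p-value comes from comparing F with the F distribution on these degrees of freedom. A statistics package or an F table is needed for that step. A significant result only says that the group means differ somewhere, so identifying the best diet still requires comparing the group means themselves.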

Dataset:

The dataset “flights.csv” contains information about all flights that departed from the two major airports of the Pacific Northwest (PNW), SEA in Seattle and PDX in Portland, in 2014: 162,049 flights in total. The main goal of the project is to use this dataset to find the major factors that cause flights to be delayed or postponed.

Main variables are:

year, month, day: Date of departure

carrier: Two-letter carrier abbreviation. See airlines to get the name.

origin, dest: Origin and destination. See airports for additional metadata.

dep_delay, arr_delay: Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.

dep_time, arr_time: Actual departure and arrival times (format HHMM or HMM), in the local time zone.

distance: Distance between airports, in miles.
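
Because the departure and arrival times are stored as HHMM or HMM numbers, subtracting them directly does not give a duration: 1000 minus 955 is 45, yet the two times are only 5 minutes apart. A small helper can convert them to minutes after midnight first. The function name `hhmm_to_minutes` is hypothetical, and accepting 2400 as midnight is an assumption.

```python
def hhmm_to_minutes(value):
    """Convert a clock time stored as an HHMM or HMM number to minutes after midnight."""
    hours, minutes = divmod(int(value), 100)
    if not (0 <= hours <= 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid HHMM time: {value!r}")
    return hours * 60 + minutes

print(hhmm_to_minutes(517))     # 5:17  -> 317
print(hhmm_to_minutes("1432"))  # 14:32 -> 872
print(hhmm_to_minutes(1000) - hhmm_to_minutes(955))  # 5 minutes apart
```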

Tasks:

Part 1. Data Preprocessing

Import data: flights.csv
Data cleaning: remove all observations containing “NA” (missing data).

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of arr_delay: mean, median, ... Any insights?

Part 3. Models and Analyzing data

Use one-factor ANOVA to evaluate the differences in delay time between airlines.
Based on your analysis, which carrier(s) tend to be delayed more than the others?
Any interesting insights based on the data? (choose your own methods)
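
Once an ANOVA suggests that carriers differ, a simple follow-up is to rank them by mean arrival delay. The sketch below groups delays by carrier in plain Python. The records are invented and are not values from flights.csv.

```python
from collections import defaultdict
from statistics import fmean

# Invented (carrier, arrival delay in minutes) pairs; negative means early.
records = [("AS", -5), ("AS", 10), ("UA", 30), ("UA", 12), ("DL", 0), ("DL", 4)]

delays = defaultdict(list)
for carrier, arr_delay in records:
    delays[carrier].append(arr_delay)

# Sort carriers from the largest mean delay to the smallest.
ranking = sorted(((fmean(v), c) for c, v in delays.items()), reverse=True)
for mean_delay, carrier in ranking:
    print(f"{carrier}: mean arrival delay {mean_delay:.1f} min "
          f"(n={len(delays[carrier])})")
```

Reporting the group size next to each mean helps, since a carrier with very few flights can top the ranking by chance.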

Dataset:

The dataset “insurance.csv” consists of 1338 records of insurance contracts. The aim of this project is to build a model to predict insurance costs.

Main variables are:

age: age of primary beneficiary

sex: gender of the insurance contractor (female, male)

bmi: body mass index (kg/m^2), an objective index of body weight computed from weight and height that shows whether a person's weight is relatively high or low for their height; ideally 18.5 to 24.

children: Number of children covered by health insurance / Number of dependents

smoker: smoking or non-smoking

region: the beneficiary’s residential area in the US: northeast, southeast, southwest or northwest.

charges: Individual medical costs billed by health insurance

Tasks:

Part 1. Data Preprocessing

Import data: insurance.csv
Data cleaning: remove all observations containing “NA” (missing data), if any.

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of the variables: mean, median, ... Any insights?

Part 3. Models and Analyzing data

Build a linear regression model to evaluate which factors affect the insurance charges.
Give an example of a contractor and then predict the insurance charge.
Any interesting insights based on the data? (choose your own methods)
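
A regression on this dataset has to turn the categorical variables (sex, smoker, region) into numbers first. A common approach, sketched here, is 0/1 indicator columns with one reference level left out. The function name `encode_contractor` is hypothetical, the example contractor is invented, and the coding of smoker as "yes"/"no" is an assumption about the file.

```python
def encode_contractor(row, regions=("northeast", "northwest",
                                    "southeast", "southwest")):
    """Turn one insurance record into a numeric feature list.

    Binary columns become 0/1. Region becomes indicator columns, with the
    first region as the reference level (it gets no column of its own).
    """
    features = [
        float(row["age"]),
        float(row["bmi"]),
        float(row["children"]),
        1.0 if row["sex"] == "male" else 0.0,
        1.0 if row["smoker"] == "yes" else 0.0,  # assumed coding
    ]
    features += [1.0 if row["region"] == r else 0.0 for r in regions[1:]]
    return features

# A hypothetical contractor, invented for illustration.
example = {"age": "40", "sex": "female", "bmi": "27.5", "children": "2",
           "smoker": "no", "region": "southeast"}
print(encode_contractor(example))
```

Leaving out one region avoids perfect collinearity with the intercept. Each remaining region coefficient is then read as a difference from the reference region.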

Dataset:

The dataset “supermarket sales.csv” contains the historical sales of a supermarket company, recorded in 3 different branches over 3 months. The aim of this project is to investigate customer satisfaction, based on the ratings, in the different branches.

Main variables are:

Invoice id: Computer generated sales slip invoice identification number

Branch: Branch of supercenter (3 branches are available identified by A, B and C).

Customer type: Type of customer, recorded as Member for customers using a member card and Normal for customers without one.

Product line: General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel

Unit price: Price of each product in US dollars

Quantity: Number of products purchased by customer

Total: Total price including tax

Rating: Customer satisfaction rating of their overall shopping experience (on a scale of 1 to 10)

Tasks:

Part 1. Data Preprocessing

Import data: supermarket sales.csv
Data cleaning: remove all observations containing “NA” (missing data), if any.

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of the variables: mean, median, ... Any insights?

Part 3. Models and Analyzing data

Use one-factor ANOVA to evaluate the differences in customer satisfaction (based on ratings) between the 3 branches.
Based on your analysis, which branch tends to have higher customer satisfaction?
Any interesting insights based on the data? (choose your own methods)

Dataset:

The dataset “OnlineNewsPopularity.xlsx” summarizes a heterogeneous set of features about articles published by Mashable over a period of two years. The goal is to predict the number of shares on social networks (popularity).

Main variables are:

n_tokens_title: Number of words in the title

n_tokens_content: Number of words in the content

num_hrefs: Number of links

num_imgs: Number of images

num_videos: Number of videos

data channel

weekday

global_subjectivity: Text subjectivity

global_rate_positive_words: Rate of positive words in the content

global_rate_negative_words: Rate of negative words in the content

shares: Number of shares (target)

Tasks:

Part 1. Data Preprocessing

Import data: OnlineNewsPopularity.xlsx
Data cleaning: remove all observations containing “NA” (missing data), if any.

Part 2. Visualization and Descriptive Statistics

Data visualization: choose some suitable plots (boxplot, scatter plot, ...) and try to get some basic insights into the data.
Descriptive statistics of the number of shares: mean, median, ... Any insights?

Part 3. Models and Analyzing data

Build a linear regression model to evaluate which factors affect the number of shares.
Give an example of an article and then predict its number of shares.
Any interesting insights based on the data? (choose your own methods)

