- Lab
- A Cloud Guru
Generate a Complete Report
In this lab, you slice data from a Titanic survivability CSV file and graph the results. The PDF of the notebook for this lab is [here](https://github.com/linuxacademy/content-python-for-database-and-reporting/blob/master/pdf/hol_5_1_l_solution.pdf).
-
Challenge
Start Jupyter Notebook Server and Access on Your Local Machine
Connecting to the Jupyter Notebook Server
Make sure that you have activated the virtual environment!
- To activate the virtual environment:
conda activate base
- To start the server run the following:
python get_notebook_token.py
This is a simple script that starts the Jupyter notebook server and sets it to continue to run outside of the terminal.
Note: The script prints a token in the terminal. Copy it and save it to a text file on your local machine.
On Your Local Machine
- In a terminal window, enter the following:
ssh -N -L localhost:8087:localhost:8086 cloud_user@<the public IP address of the Playground server>
You will be prompted for your password; this is the password you used to log in to the Playground remote server. Leave this terminal open. It will appear that nothing has happened, but the terminal must remain open while you use the Jupyter Notebook server in this session.
- In the browser of your choice, enter the following address:
http://localhost:8087
This will open a Jupyter Notebook site that asks for the token you copied from the remote server.
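The port forward above can be sketched as a small helper that assembles the same `ssh` arguments. This is only an illustration: the function names `build_tunnel_command` and `notebook_url` are hypothetical, the IP address is a placeholder, and the reading that the notebook server listens on remote port 8086 is inferred from the command rather than stated by the lab.

```python
def build_tunnel_command(public_ip, user="cloud_user", local_port=8087, remote_port=8086):
    """Return the ssh argument list for the port forward used in this lab."""
    # Local port -> port on the remote server's own localhost.
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    # -N: do not run a remote command; -L: forward a local port to the remote side.
    return ["ssh", "-N", "-L", forward, f"{user}@{public_ip}"]


def notebook_url(local_port=8087):
    """URL to open in the local browser once the tunnel is up."""
    return f"http://localhost:{local_port}"


# 203.0.113.10 is a placeholder documentation address, not a real Playground server.
command = build_tunnel_command("203.0.113.10")
print(" ".join(command))
print(notebook_url())
```

The browser talks to port 8087 on your machine, and `ssh` relays that traffic to port 8086 on the server, which is why the terminal running `ssh` has to stay open.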
-
Challenge
Import Required Packages and Create Dataframe From File
Titanic Data: Factors Affecting Survivability
This data was collected from a web search and is available from many different organizations. It describes individual passengers on the Titanic and whether or not they survived the disaster.
The columns are defined as follows:
- PassengerId - Indexed starting at 1
- Survived - Survival (0 = No; 1 = Yes)
- Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
- Name - Name
- Sex - Sex
- Age - Age
- SibSp - Number of Siblings/Spouses Aboard
- Parch - Number of Parents/Children Aboard
- Ticket - Ticket Number
- Fare - Passenger Fare
- Cabin - Cabin
- Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
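As a rough sketch of how the coded columns in the list above could be turned into readable labels, the example below uses `Series.map` with dictionaries built from those definitions. The three rows are made up for illustration and the names `sample_df` and `decoded` are hypothetical; the lab itself does not include this step.

```python
import pandas as pd

# Tiny made-up sample using the column names defined above (not real passenger data).
sample_df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 2],
    "Embarked": ["S", "C", "Q"],
})

# Lookup tables taken from the column definitions.
survived_labels = {0: "No", 1: "Yes"}
class_labels = {1: "1st", 2: "2nd", 3: "3rd"}
port_labels = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# assign() returns a new dataframe and leaves sample_df unchanged.
decoded = sample_df.assign(
    Survived=sample_df["Survived"].map(survived_labels),
    Pclass=sample_df["Pclass"].map(class_labels),
    Embarked=sample_df["Embarked"].map(port_labels),
)
print(decoded)
```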
The questions we are asking:
- What part did age play?
- What part did gender play?
- Did the passenger class make a difference?
Load the CSV Data Into a Dataframe
import matplotlib.pyplot as plt
import pandas as pd

%matplotlib inline

titanic_df = pd.read_csv('titanic.csv')
titanic_df.head()
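Before slicing, it can help to check how many rows carry a value in the column you plan to filter on, because `Series.count()` skips missing values while `len()` counts every row. The sketch below uses made-up rows; whether `titanic.csv` has missing ages is not stated in this lab, so treat that as an assumption to verify on the real dataframe.

```python
import pandas as pd

# Made-up rows for illustration; the second Age is deliberately missing.
sample_df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Age": [8.0, float("nan"), 30.0, 62.0],
})

total_rows = len(sample_df)                   # counts every row
rows_with_age = sample_df["Age"].count()      # skips missing (NaN) values
missing_ages = sample_df["Age"].isna().sum()  # number of rows without an Age

print(f"rows: {total_rows}, with Age: {rows_with_age}, missing Age: {missing_ages}")
```

Comparisons such as `Age < 13` evaluate to False for a missing value, so rows without an age would not land in any age group.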
-
Challenge
Examine the Effect Age Had on Survivability
Examine the Effect of Age on Survivability
- Under 13
- 13 - 24
- 25 - 49
- 50 - 74
- 75 and Older
# Under 13 (Age < 13, so 12-year-olds are not dropped between groups)
passengers_under_13 = titanic_df[titanic_df.Age < 13]
passengers_under_13_survived = passengers_under_13[passengers_under_13.Survived == 1]
passengers_under_13_percent_survived = passengers_under_13_survived.Age.count() / passengers_under_13.Age.count()

# 13 to 24
passengers_13_to_24 = titanic_df[(titanic_df.Age >= 13) & (titanic_df.Age < 25)]
passengers_13_to_24_survived = passengers_13_to_24[passengers_13_to_24.Survived == 1]
passengers_13_to_24_percent_survived = passengers_13_to_24_survived.Age.count() / passengers_13_to_24.Age.count()

# 25 to 49
passengers_25_to_49 = titanic_df[(titanic_df.Age >= 25) & (titanic_df.Age < 50)]
passengers_25_to_49_survived = passengers_25_to_49[passengers_25_to_49.Survived == 1]
passengers_25_to_49_percent_survived = passengers_25_to_49_survived.Age.count() / passengers_25_to_49.Age.count()

# 50 to 74 (Age < 75, so 74-year-olds are included)
passengers_50_to_74 = titanic_df[(titanic_df.Age >= 50) & (titanic_df.Age < 75)]
passengers_50_to_74_survived = passengers_50_to_74[passengers_50_to_74.Survived == 1]
passengers_50_to_74_percent_survived = passengers_50_to_74_survived.Age.count() / passengers_50_to_74.Age.count()

# 75 and over
passengers_75_over = titanic_df[titanic_df.Age >= 75]
passengers_75_over_survived = passengers_75_over[passengers_75_over.Survived == 1]
passengers_75_over_percent_survived = passengers_75_over_survived.Age.count() / passengers_75_over.Age.count()

# Print the passenger count and the fraction that survived for each group
print(f'Under 13:\t{passengers_under_13.Age.count()} - {passengers_under_13_percent_survived}')
print(f'13 - 24:\t{passengers_13_to_24.Age.count()} - {passengers_13_to_24_percent_survived}')
print(f'25 - 49:\t{passengers_25_to_49.Age.count()} - {passengers_25_to_49_percent_survived}')
print(f'50 - 74:\t{passengers_50_to_74.Age.count()} - {passengers_50_to_74_percent_survived}')
print(f'75 & Over:\t{passengers_75_over.Age.count()} - {passengers_75_over_percent_survived}')
# Show data as a bar chart
# The fractions are copied by hand from the output of the previous cell; update them if that output changes.
groups = ('Under 13', '13 - 24', '25 - 49', '50 - 74', '75 & Over')
percentages = [0.57, 0.37, 0.41, 0.36, 1]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Age Group")
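The bar heights in the chart cell are typed in by hand. A possible alternative, sketched here on made-up data with only two hypothetical groups, is to keep the computed rates in a dictionary and pass them straight to the chart so the two cannot drift apart. Because `Survived` is 0 or 1, its mean is the fraction that survived. The `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample, not real passenger data.
sample_df = pd.DataFrame({
    "Age": [5, 9, 20, 22, 40, 45],
    "Survived": [1, 0, 0, 1, 1, 0],
})

# Compute each rate with the same kind of boolean slicing used in the lab.
children = sample_df[sample_df.Age < 13]
adults = sample_df[sample_df.Age >= 13]
rates = {
    "Under 13": round(children.Survived.mean(), 2),
    "13 & Over": round(adults.Survived.mean(), 2),
}

# Feed the computed values to the chart instead of retyping them.
fig, ax = plt.subplots()
bars = ax.bar(list(rates.keys()), list(rates.values()), align="center", alpha=0.5)
ax.set_ylabel("Fraction Survived")
ax.set_title("Survivability by Age Group (sample data)")
```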
This suggests that children under 13 may have been given some preferential treatment for lifeboats. However, it is not clear whether non-survivors include only those who died during the sinking itself. Some children may have been more susceptible to environmental factors, such as temperature, and died in a lifeboat.
Since there was only one passenger in the 75 & Over group, the survivability of that group is not useful and should not be considered.
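The age grouping in this section can also be sketched with `pd.cut`, which assigns every age to exactly one bin and makes small groups easy to spot because the passenger count sits next to each rate. The bin edges, labels, and sample rows below are assumptions chosen for illustration, not the lab's own solution.

```python
import pandas as pd

# Made-up sample, not real passenger data; the last Age is deliberately missing.
sample_df = pd.DataFrame({
    "Age": [4, 12, 13, 24, 25, 49, 50, 74, 80, float("nan")],
    "Survived": [1, 1, 0, 1, 0, 1, 0, 0, 1, 0],
})

# right=False makes each bin include its left edge and exclude its right edge:
# [0, 13), [13, 25), [25, 50), [50, 75), [75, inf).
edges = [0, 13, 25, 50, 75, float("inf")]
labels = ["Under 13", "13 - 24", "25 - 49", "50 - 74", "75 & Over"]
age_group = pd.cut(sample_df["Age"], bins=edges, labels=labels, right=False)

# Rows with a missing Age get no group and are left out of the summary.
summary = (
    sample_df.groupby(age_group, observed=False)["Survived"]
    .agg(passengers="count", fraction_survived="mean")
)
print(summary)
```

A group with a single passenger, like the last bin in this sample, can only show a rate of 0 or 1, which is why such a group carries little information.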
-
Challenge
Examine the Effect Gender Had on Survivability
Examine the Effect of Gender on Survivability
# Male
passengers_male = titanic_df[titanic_df.Sex == "male"]
passengers_male_survived = passengers_male[passengers_male.Survived == 1]
passengers_male_percent_survived = passengers_male_survived.Sex.count() / passengers_male.Sex.count()

# Female
passengers_female = titanic_df[titanic_df.Sex == "female"]
passengers_female_survived = passengers_female[passengers_female.Survived == 1]
passengers_female_percent_survived = passengers_female_survived.Sex.count() / passengers_female.Sex.count()

print(f'Male:\t{passengers_male.Sex.count()} - {passengers_male_percent_survived}')
print(f'Female:\t{passengers_female.Sex.count()} - {passengers_female_percent_survived}')
# Show data as a bar chart
groups = ('Male', 'Female')
percentages = [0.18, 0.74]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Gender")
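The slice, count, and divide pattern used for each group could be folded into one helper built on `groupby`. This is a minimal sketch on made-up data; `survival_by` is a hypothetical name, not something the lab defines.

```python
import pandas as pd


def survival_by(df, column):
    """Return passenger count and fraction survived for each value of `column`."""
    # Survived is 0 or 1, so its mean is the fraction that survived.
    return df.groupby(column)["Survived"].agg(passengers="count", fraction_survived="mean")


# Made-up sample, not real passenger data.
sample_df = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Pclass": [1, 3, 3, 2, 1, 3],
    "Survived": [0, 0, 1, 0, 1, 1],
})

by_sex = survival_by(sample_df, "Sex")
print(by_sex)
```

The same call with `"Pclass"` in place of `"Sex"` would give the per-class breakdown without repeating the slicing code.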
Female passengers survived at a much higher rate than male passengers (about 0.74 versus 0.18), which strongly suggests they were given preference for lifeboats. It would be interesting to break down the male survivors by age group. Hypothesis: Younger males survived at a higher rate.
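One way the hypothesis could be checked is to restrict the dataframe to males and compare survival rates on either side of an age cut-off. The sketch below uses made-up rows, so its numbers say nothing about the real result, and the cut-off of 13 for "younger" is an assumption.

```python
import pandas as pd

# Made-up sample, not real passenger data.
sample_df = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "male"],
    "Age": [6, 10, 30, 45, 8, 60],
    "Survived": [1, 0, 0, 0, 1, 0],
})

# Keep only males with a known age, so missing ages are not counted as "older".
males = sample_df[(sample_df.Sex == "male") & sample_df.Age.notna()]
is_child = males.Age < 13  # assumed cut-off for "younger"

young_male_rate = males[is_child].Survived.mean()
older_male_rate = males[~is_child].Survived.mean()

print(f"Males under 13: {young_male_rate}")
print(f"Males 13 and over: {older_male_rate}")
```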
-
Challenge
Examine the Effect Passenger Class Had on Survivability
Examine the Effect of Passenger Class on Survivability
# Passenger Class 1
passengers_class_1 = titanic_df[titanic_df.Pclass == 1]
passengers_class_1_survived = passengers_class_1[passengers_class_1.Survived == 1]
passengers_class_1_percent_survived = passengers_class_1_survived.Pclass.count() / passengers_class_1.Pclass.count()

# Passenger Class 2
passengers_class_2 = titanic_df[titanic_df.Pclass == 2]
passengers_class_2_survived = passengers_class_2[passengers_class_2.Survived == 1]
passengers_class_2_percent_survived = passengers_class_2_survived.Pclass.count() / passengers_class_2.Pclass.count()

# Passenger Class 3
passengers_class_3 = titanic_df[titanic_df.Pclass == 3]
passengers_class_3_survived = passengers_class_3[passengers_class_3.Survived == 1]
passengers_class_3_percent_survived = passengers_class_3_survived.Pclass.count() / passengers_class_3.Pclass.count()

print(f'Class 1:\t{passengers_class_1.Pclass.count()} - {passengers_class_1_percent_survived}')
print(f'Class 2:\t{passengers_class_2.Pclass.count()} - {passengers_class_2_percent_survived}')
print(f'Class 3:\t{passengers_class_3.Pclass.count()} - {passengers_class_3_percent_survived}')
# Show data as a bar chart
groups = ('Class 1', 'Class 2', 'Class 3')
percentages = [0.63, 0.47, 0.24]
plt.bar(groups, percentages, align='center', alpha=0.5)
plt.ylabel("Percent Survived")
plt.title("Titanic Survivability by Passenger Class")
Class 1 passengers were clearly more likely to be saved. Whether that was because they were closer to the lifeboats or because of a genuine preference cannot be determined from this data. Once again, looking at this data by age and gender would be interesting for further study.
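For the further study mentioned above, a cross-tabulation is one possible starting point. The sketch below uses `DataFrame.pivot_table` on made-up rows to show the fraction survived for each combination of passenger class and sex, with a second table of counts so that thin cells are visible. The sample values are illustrative only.

```python
import pandas as pd

# Made-up sample, not real passenger data.
sample_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3, 3, 2],
    "Sex": ["female", "male", "male", "female", "male", "male", "female", "male"],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Rows are passenger classes, columns are sexes, cells are the fraction survived.
rates = sample_df.pivot_table(index="Pclass", columns="Sex", values="Survived", aggfunc="mean")

# Matching table of passenger counts per cell.
counts = sample_df.pivot_table(index="Pclass", columns="Sex", values="Survived", aggfunc="count")

print(rates)
print(counts)
```

A combination with no passengers, such as Class 2 females in this sample, shows up as a missing value rather than a rate of zero.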
This is not an exhaustive review of the data available, but a simple review based on three independent attributes. Much more data could be analyzed for deeper, more specific ideas of how the surviving passengers were selected.