Skip to content

Homework 4-F22

Background

Homework 4.

Please turn into Github your Python notebook by September 30th, end of day to a new Github repository called Homework4.  The page should have a full notebook as a Readme. Hint, export as a readme.md, unzip, and put the images & the readme.md into Github.

Practice Exercise To Help With Homework below

First, we will go through a practice exercise is to get us used to using APIs and joins.  In this exercise we explore whether there is obvious evidence for a correlation b/t number of hospital beds and COVID mortality. Intuitively, one might think that more hospital beds improve mortality within a region. Here, we want to complete a very quick cursory view of this premise.

For this exercise we will start by using the following API. The API Docs are here: https://covid-19-apis.postman.com/).  Here is an example unit test where you can see a JSON result. https://api.covid19api.com/

How do we determine countries? Well we read the API manual and see: https://api.covid19api.com/countries

Homework Instructions
Your homework will look something like this with some deviations.  

Documentation: https://documenter.getpostman.com/view/8854915/SzS8rjHv?version=latest 

Homework: Dataset COVID

Our goal is to correlate beds per 10,000 populations to mortality for COVID. Lets bring in another data set that tells us about the number of beds per sample. This table has one-line per hospital. We will need to aggregate or sum by county in order to get the total number of beds per county. First the API.

Loading from API

Call API: https://www.communitybenefitinsight.org/api/get_hospitals.php

API Documents: https://www.communitybenefitinsight.org/?page=info.data_api

hospitals=pd.DataFrame(req)
hospitals.head()

CREATE HospitalInfoPerCounty Table

The field we want is hospital_bed_count summed for each county. One thing to realize is that county is not unique in the US. There are more than one Jefferson Counties. So lets create a new column called `state_county`

hospitals['state_county']=hospitals['state']+'_'+hospitals['county']
hospitals.head()

HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()

That’s not right!

First, lets see the types.

Ok – there clearly aren’t 49622 in AK_Kenai Peninsula Borough. Lets inspect further.

hospitals[hospitals['state_county']=="AK_Kenai Peninsula Borough"]['hospital_bed_size']

*added 9/28*

We can see the issue

40    <100 beds
42    <100 beds
47    <100 beds
Name: hospital_bed_size, dtype: object

Basically <100 bed is not a number but a string.

 

Turning a string into a number in hospitals.hospital_bed_count

**NEED COMMAND HERE TO CONVERT `hospitals.hospital_bed_count`**

CREATE HospitalInfoPerCounty and summing for hospital_bed_count

We need to create a dataframe that is unique by state_county, and we need to sum across entries on hospital_bed_count.  This is because the state and county may not be unique (e.g. multiple entries for different days).  Its always important to know what tables are unique for a table.

HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()

Substitution and curating

Sometimes data is not fully structured, as is case here.. This is really sub-optimal, but there are times when you want an approximate solution to understand the scope of the problem. In this case, lets just focus on those that end in County and remove that text, noting we won’t get all matches, but we should get a majority.

HospitalInfoPerCounty['state_county']=HospitalInfoPerCounty['state_county'].str.replace(' County','')
HospitalInfoPerCounty.head(5)

CREATE CovidPerCounty DATAFRAME

We need to work on covid, and curate it for use in order to compare with HospitalInfoPerCounty

import requests
import pandas as pd
import json
import numpy as np
covid=pd.DataFrame(req)

On your own examine the resulting table, e.g. covid.head().  What do you see?  (Not shown below).  We see that states is not a two letter code – how do we convert it?

Converting states to a two letter code using a dictionary

We are also going to need to convert states to a two letter code. Lets use a dictionary

us_state_to_abbrev = {
"Alabama": "AL","Alaska": "AK","Arizona": "AZ","Arkansas": "AR", "California": "CA",
"Colorado": "CO","Connecticut": "CT","Delaware": "DE","Florida": "FL","Georgia": "GA",
"Hawaii": "HI","Idaho": "ID","Illinois": "IL","Indiana": "IN","Iowa": "IA","Kansas": "KS",
"Kentucky": "KY","Louisiana": "LA","Maine": "ME","Maryland": "MD","Massachusetts": "MA",
"Michigan": "MI","Minnesota": "MN","Mississippi": "MS","Missouri": "MO","Montana": "MT",
"Nebraska": "NE","Nevada": "NV","New Hampshire": "NH","New Jersey": "NJ","New Mexico": "NM",
"New York": "NY","North Carolina": "NC","North Dakota": "ND","Ohio": "OH",
"Oklahoma": "OK","Oregon": "OR","Pennsylvania": "PA","Rhode Island": "RI",
"South Carolina": "SC","South Dakota": "SD","Tennessee": "TN","Texas": "TX",
"Utah": "UT","Vermont": "VT","Virginia": "VA","Washington": "WA",
"West Virginia": "WV","Wisconsin": "WI","Wyoming": "WY",
"District of Columbia": "DC","American Samoa": "AS","Guam": "GU", "Northern Mariana Islands": "MP",
"Puerto Rico": "PR","United States Minor Outlying Islands": "UM","U.S. Virgin Islands": "VI",
}
covid['state'].replace(us_state_to_abbrev, inplace=True)
covid.head(5)

Creating a key to join our tables

The dataframe covid may not be fully unique per state or county. Lets create a new column that we can merge on which concatenates both.

covid['state_county']=covid['state']+'_'+covid['county']
covid.head()

Creating a dataframe with unique CovidPerCount

We will group and sum for deathspopulation, and confirmed

CovidPerCounty=covid.groupby(['state_county'], as_index=False).agg({'deaths':'sum','confirmed':'sum','population':'sum','confirmed_daily':'mean','deaths_daily':'mean'})
CovidPerCounty.head(15)

Create mortality within CovidPerCounty

CovidPerCounty['mortality']=CovidPerCounty['deaths']/CovidPerCounty['confirmed']
CovidPerCounty.head(5)

JOINING CovidPerCounty and HospitalInfoPerCounty

Next Create An Inner Join HospitalInfoPerCounty and CovidPerCounty by state_county.

 

Hospital_Covid_PerCounty=pd.merge(HospitalInfoPerCounty,CovidPerCounty,on='state_county')
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']>50000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']<300000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['confirmed']>10]
Hospital_Covid_PerCounty.head()

FINAL PLOT:

PLEASE FILL IN THE COMMAND TO GET THE PLOT BELOW

sns.relplot(MISSING PLEASE FILL IN TO MAKE PLOT BELOW)

PEOPLE PER HOSPITAL BED VS. MORTALITY

No Correlation Is Obvious! Surprise – buying more hospital beds is not obviously improving mortality

Turning in:

Please turn into github as Homework4.  Your Github Readme should look like the Notebook for the assignment.  The best way to do this is to “save and export notebook” as a “markdown”.  The image of the graph must be rendered in your markdown.