Importing Data from a JSON Resource with Python
Jan 15, 2019 • 8 Minute Read
Introduction
As mentioned in the first guide of this series, JSON resources (local files or API responses) can be easily manipulated using lists and dictionaries. The Python standard library provides all the utilities that we will need to read files, query API endpoints through HTTP requests, and handle the JSON output.
Analyzing the API Response Format
To illustrate, we will use teams, one of the endpoints available from the NBA Open API, as shown in Fig. 1.
The teams API call returns a dictionary with two keys, _internal and league. The value of league is itself a dictionary whose keys are league names and whose values are lists of the teams in each league. If we open the URL in a web browser and expand standard, as seen in Fig. 2, we will recognize the familiar Python dictionary and list format:
Strictly speaking, JSON is made of objects and arrays that are decoded into Python dictionaries and lists, respectively. Similarly, the JSON values true, false, and null are translated into True, False, and None.
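The mapping described above can be checked with a short sketch. The JSON string below is written by hand for illustration and is not taken from the API; the nickname and wins keys are made up.

```python
import json

# A small hand-written JSON document (illustrative only, not an API response).
sample = (
    '{"isNBAFranchise": true, "isAllStar": false, "nickname": null, '
    '"teams": ["BOS", "NYK"], "wins": 49}'
)

decoded = json.loads(sample)

print(type(decoded))              # <class 'dict'>  (JSON object)
print(type(decoded['teams']))     # <class 'list'>  (JSON array)
print(decoded['isNBAFranchise'])  # True            (JSON true)
print(decoded['isAllStar'])       # False           (JSON false)
print(decoded['nickname'])        # None            (JSON null)
```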
Each of the elements in the list shown in Fig. 2 is a dictionary. Particularly, the sixth item has the following key-value pairs:
{'isNBAFranchise': True, 'isAllStar': False, 'city': 'Boston', 'altCityName': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': '1610612738', 'urlName': 'celtics', 'confName': 'East', 'divName': 'Atlantic'}
Now that we know what to expect from the chosen API, we will explain how to query it using Python.
Retrieving Data from the API
Although third-party tools such as the requests package can perform the same task, in this guide we will use a utility that ships with Python: the urllib.request module.
First, we will import the library and the json module:
import urllib.request as request
import json
Next, we will open the URL with the urlopen() function, read the response into a variable named source, and then decode the JSON it contains into Python objects using json.loads().
with request.urlopen('http://data.nba.net/prod/v2/2018/teams.json') as response:
    source = response.read()
    data = json.loads(source)
Optionally, you may choose to check if the request was successful before proceeding further. To do so, replace the previous block with the following suite:
with request.urlopen('http://data.nba.net/prod/v2/2018/teams.json') as response:
    if response.getcode() == 200:
        source = response.read()
        data = json.loads(source)
    else:
        print('An error occurred while attempting to retrieve data from the API.')
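Note that urlopen() raises urllib.error.HTTPError when the server answers with an error status such as 404 or 500, so a failed request usually surfaces as an exception before the status check is reached. A minimal sketch of handling that case follows; the fetch_json() helper and the 10-second timeout are our own illustrative choices, not part of the API or of the standard library.

```python
import json
import urllib.error
import urllib.request as request


def fetch_json(url, timeout=10):
    """Return the decoded JSON document at url, or None if the request fails.

    fetch_json and the default timeout are hypothetical, for illustration.
    """
    try:
        with request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (4xx or 5xx).
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f'The server returned HTTP status {error.code}.')
    except urllib.error.URLError as error:
        # The resource could not be opened at all (for example, a DNS failure).
        print(f'The request failed: {error.reason}')
    except json.JSONDecodeError as error:
        # The response arrived but its body was not valid JSON.
        print(f'The response was not valid JSON: {error}')
    return None
```

With this helper, data = fetch_json('http://data.nba.net/prod/v2/2018/teams.json') would either hold the decoded dictionary or be None, which the calling code can test before going further.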
Fig. 3 shows the keys in data (_internal and league, as expected). Since the latter is also a dictionary, we can inspect its keys as well:
type(data)
data.keys()
type(data['league'])
data['league'].keys()
Finally, we will dive a bit deeper to examine standard, as shown in Fig. 4. The same image shows the item we highlighted when querying the API through the browser earlier:
type(data['league']['standard'])
len(data['league']['standard'])
data['league']['standard'][5]
To include only teams that are NBA franchises, we can create a new list called nba_teams using list comprehension:
nba_teams = [team for team in data['league']['standard'] if team['isNBAFranchise']]
The json module also provides a function called dump() to write Python objects into files. For better readability, we can pass indent=4 and sort_keys=True as arguments to indent the output by four spaces per level and sort the keys alphabetically. The following two lines will perform this task and save the output into a file called nba_teams.json, whose content is shown, in part, in Fig. 5:
with open('nba_teams.json', 'w') as f:
    json.dump(nba_teams, f, indent=4, sort_keys=True)
So far, we have explored how to query an API, manipulate the response, and save the output to a local file. But what if the JSON objects were provided from a file in the first place? We will now learn how to proceed under that scenario.
Reading JSON from Files
If we need to read a JSON-formatted file and convert its content into Python objects, we will use the load() function from the json module. As opposed to loads(), which takes a string as an argument, load() expects a file object. In short, both functions perform the same task, but they differ in the type of input they handle. Similarly, dumps() converts Python objects into a string, whereas dump() writes them to a file object.
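The difference between the four functions can be seen side by side in the following sketch. It uses io.StringIO as an in-memory stand-in for a file opened with open(), and the team dictionary is a shortened sample written for this example.

```python
import io
import json

# A shortened sample record, written by hand for this example.
team = {'tricode': 'BOS', 'city': 'Boston'}

# dumps() returns a string; loads() parses a string.
as_text = json.dumps(team, indent=4, sort_keys=True)
from_text = json.loads(as_text)
print(as_text)

# dump() writes to a file object; load() reads from one.
buffer = io.StringIO()
json.dump(team, buffer)
buffer.seek(0)  # rewind so that load() starts reading from the beginning
from_file = json.load(buffer)

print(from_text == from_file == team)  # True
```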
In its simplest form, we would use the following suite to read nba_teams.json into a variable of type list called data:
with open('nba_teams.json') as f:
    data = json.load(f)
However, both .loads() and .load() accept, as an optional argument, a function that can be used to implement a custom decoder. Suppose we want to group teams into divisions (Atlantic, Central, Southeast, Northwest, Pacific, and Southwest) and return a list of the teams in each division.
To begin, we will define a function named group_teams_by_division(). The decoder calls this function with each JSON object it parses (here, each team dictionary) and uses the return value, a tuple with the team name and division, in place of the dictionary:
def group_teams_by_division(team):
    return (team['fullName'], team['divName'])
Next, we will open the file and pass object_hook=group_teams_by_division as an argument:
with open('nba_teams.json') as f:
    data = json.load(f, object_hook=group_teams_by_division)
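One detail worth keeping in mind is that the decoder calls object_hook for every JSON object it finds, innermost first, not only for the team entries. That works for nba_teams.json because the file is a flat list of team objects, but a hook that assumes the fullName and divName keys would raise a KeyError on a nested document such as the raw API response. A more defensive variant might look like the sketch below; team_to_tuple() and the sample string are illustrative and not part of the original guide.

```python
import json


def team_to_tuple(obj):
    """Convert team objects into (name, division) tuples; leave others alone."""
    if 'fullName' in obj and 'divName' in obj:
        return (obj['fullName'], obj['divName'])
    # Any other object (such as the enclosing "league" wrapper) is returned as is.
    return obj


# A hand-written document that mimics the nesting of the API response.
nested = (
    '{"league": {"standard": ['
    '{"fullName": "Boston Celtics", "divName": "Atlantic"}, '
    '{"fullName": "Chicago Bulls", "divName": "Central"}]}}'
)

result = json.loads(nested, object_hook=team_to_tuple)
print(result)
# {'league': {'standard': [('Boston Celtics', 'Atlantic'), ('Chicago Bulls', 'Central')]}}
```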
Finally, we will create a dictionary where the keys will be the division names and the corresponding value will be a list of all the teams in that division:
teams_by_division = {}
for team, division in data:
    if division not in teams_by_division:
        teams_by_division[division] = []
    teams_by_division[division].append(team)
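As an alternative, the same grouping can be written with collections.defaultdict, which creates the empty list automatically the first time a division is seen. The sketch below uses three hand-written (team, division) tuples in place of the decoded file so that it runs on its own.

```python
from collections import defaultdict

# Sample tuples in the shape produced by the custom decoder (illustrative data).
data = [
    ('Boston Celtics', 'Atlantic'),
    ('Chicago Bulls', 'Central'),
    ('Brooklyn Nets', 'Atlantic'),
]

grouped = defaultdict(list)
for team, division in data:
    # A missing key is initialized to an empty list on first access.
    grouped[division].append(team)

# Convert back to a plain dictionary before printing or saving.
teams_by_division = dict(grouped)
print(teams_by_division)
# {'Atlantic': ['Boston Celtics', 'Brooklyn Nets'], 'Central': ['Chicago Bulls']}
```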
Keep in mind that you can print the dictionary or use .dump() to save it to a file named teams_by_division.json as seen in Fig. 6:
with open('teams_by_division.json', 'w') as f:
    json.dump(teams_by_division, f, indent=4, sort_keys=True)
At this point you can potentially use what we learned in Importing Data from Microsoft Excel Files with Python to create spreadsheets for easier visualization.
Summary
In this guide we learned how to manipulate JSON resources with Python, including how to implement a custom decoder.
Feel free to download the files used in this guide from GitHub.