A Cloud Guru Hands-On Lab

Forum Export: Exporting Data as JSON and CSV Using Python
Databases are great for storing information and letting us query it, but the default row shape and raw SQL are not always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage the data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy library to query our forum database and export data as both JSON and CSV for other employees in our organization.
Challenge: Create a Virtualenv and Install the `forum` Package
We have the source for our `forum` package downloaded to `~/forum`. We're going to create a new Python 3.7 virtualenv and install the package locally. This project used Pipenv in development, and we can use that now to create the virtualenv and install the dependencies:

```
$ cd ~/forum
$ pipenv install
...
```
With the virtualenv in place, we just need to activate it and then install the `forum` package itself:

```
$ pipenv shell
...
(forum) $ pip install -e .
...
```
Now we can use the `forum` library from within Python scripts and the REPL.

Challenge: Write the Posts Query in the `export_csv.py` Script
The foundation of both of our export scripts will be the same, so let's write that first. We need to do the following:
- Create a SQLAlchemy engine to connect to our database server.
- Create a `Session` class that is bound to our engine.
- Create an instance of our `Session` class to make queries to the database.
This is all done using SQLAlchemy itself instead of relying on code in the `forum` package. We're also going to pass the database connection string in using a `DB_URL` environment variable when we run the script:

export_csv.py

```python
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()
```
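One detail worth noting: `os.environ["DB_URL"]` raises a bare `KeyError` if the variable isn't set when the script runs. A small optional sketch (the helper name and message are our own, not part of the lab) that fails with a friendlier error instead:

```python
import os
import sys


def get_db_url():
    """Read the DB_URL environment variable, exiting with a helpful
    message instead of raising a bare KeyError when it is missing."""
    db_url = os.environ.get("DB_URL")
    if db_url is None:
        sys.exit("DB_URL is not set; export it before running this script")
    return db_url
```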
With the SQLAlchemy boilerplate out of the way, we're now ready to use our models. Because we need some information about the `comments` table but want to receive a row for each post, we'll need to create a few different subqueries to calculate the various comment counts. Let's write these subqueries now:

export_csv.py

```python
# SQLAlchemy configuration omitted
from sqlalchemy.sql import func

from forum.models import Post, Comment

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)
negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)
positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)
```
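If the grouped counts feel abstract, here is a plain-Python equivalent of what each subquery computes, using `collections.Counter` over some made-up `(post_id, sentiment)` comment rows (the sample data is ours, not from the lab database):

```python
from collections import Counter

# Hypothetical sample of (post_id, sentiment) comment rows.
sample_comments = [
    (1, "positive"),
    (1, "negative"),
    (1, "positive"),
    (2, "negative"),
]

# Equivalent of the `comments` subquery: total comments per post_id.
comment_counts = Counter(post_id for post_id, _ in sample_comments)

# Equivalent of the sentiment-filtered subqueries.
positive_counts = Counter(
    post_id for post_id, sentiment in sample_comments if sentiment == "positive"
)
negative_counts = Counter(
    post_id for post_id, sentiment in sample_comments if sentiment == "negative"
)

print(comment_counts[1], positive_counts[1], negative_counts[2])  # 3 2 1
```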
We'll use these three subqueries as part of the main query that we'll be using to get the posts.
export_csv.py
```python
# Previous code omitted

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)
```
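The `outerjoin` calls matter here: a post with no comments has no row in the subquery results, and a LEFT OUTER JOIN keeps that post in the output with `None` for the missing columns instead of dropping it. A rough dictionary-based sketch of that behavior (the sample data is ours):

```python
# Hypothetical posts and per-post comment counts; post 2 has no comments,
# so it does not appear in the subquery's result.
posts = [1, 2]
comment_counts = {1: 3}

# A LEFT OUTER JOIN keeps every post; missing matches come back as None.
joined = [(post_id, comment_counts.get(post_id)) for post_id in posts]
print(joined)  # [(1, 3), (2, None)]
```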
This query is a bit complicated, but it gets us the information we want in a single round trip: the three subqueries are embedded in the main SELECT, so the database does all of the counting regardless of how many posts and comments there are.
This code will act as the foundation for both of our scripts.
Challenge: Add the CSV Export to `export_csv.py`
Now that we have our final query, we need to build a `csv.DictWriter` and iterate through the rows that are returned from our query. Let's set up the writer first:

export_csv.py

```python
# Other imports omitted
import csv

# Query code omitted

csv_file = open("forum_export.csv", mode="w")
fields = [
    "id",
    "body",
    "author_name",
    "created_on",
    "comments",
    "positive_comments",
    "negative_comments",
]
csv_writer = csv.DictWriter(csv_file, fieldnames=fields)
csv_writer.writeheader()
```
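If `csv.DictWriter` is new to you, here's a small standalone demonstration of the header row and `writerow`, writing to an in-memory buffer rather than a file (the two-column schema is just for illustration):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "body"])

# Writes the header row, then one data row keyed by field name.
writer.writeheader()
writer.writerow({"id": 1, "body": "Hello"})

print(buffer.getvalue())
# id,body
# 1,Hello
```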
Now we have an open file, a CSV writer, and a header row containing the field names. The last thing that we need to do is iterate through `final_query` and write a CSV row for each result using `csv_writer.writerow`.

export_csv.py

```python
# Previous code omitted

for post, comments, negative_comments, positive_comments in final_query:
    csv_writer.writerow(
        {
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": post.created_on.date(),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0,
        }
    )

csv_file.close()
```
There are a few modifications that we need to make to the data returned from our query so that it can be written to CSV:

- The timestamp for `created_on` needs to be turned into a date.
- Each comment count needs a default of `0` if no comments were returned for the particular post.
The last thing that we need to do is run this:

```
(forum) $ export DB_URL=postgres://admin:password@PUBLIC_IP:80/forum
(forum) $ python export_csv.py
```
Now we should have properly formatted rows within `forum_export.csv`.
Challenge: Create the `export_json.py` Script
For the JSON export, we'll use much of the same code as `export_csv.py`, except that we'll rely on the `json` module instead of the `csv` module to write out our results.

Let's create `export_json.py`. We can do this by making a copy of `export_csv.py` and removing the CSV-related logic:

```
(forum) $ cp export_csv.py export_json.py
```
export_json.py
```python
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import func

from forum.models import Post, Comment

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)
negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)
positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)

for post, comments, negative_comments, positive_comments in final_query:
    {
        "id": post.id,
        "body": post.body,
        "author_name": post.author_name,
        "created_on": post.created_on.date(),
        "comments": comments or 0,
        "positive_comments": positive_comments or 0,
        "negative_comments": negative_comments or 0,
    }
```
Since we want the JSON objects to have the same information as the rows in the CSV, we've kept the dictionary within our loop.
To finish this script we need to:

- Create an empty list to hold the items.
- Append each dictionary to the `items` list.
- Dump the `items` list as JSON to `forum_export.json` using the `json.dump` function.
Let's add this now:
export_json.py
```python
# Previous code omitted
import json

items = []

for post, comments, negative_comments, positive_comments in final_query:
    items.append(
        {
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": str(post.created_on.date()),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0,
        }
    )

with open("forum_export.json", mode="w") as f:
    json.dump(items, f)
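Notice that `created_on` is wrapped in `str(...)` here, where the CSV version just used `.date()`: unlike the CSV writer (which stringifies values for us), `json.dump` cannot serialize `date` objects and raises a `TypeError`. A quick illustration with a sample date:

```python
import json
from datetime import date

d = date(2020, 5, 4)

# date objects are not JSON serializable on their own.
try:
    json.dumps({"created_on": d})
except TypeError:
    print("date objects are not JSON serializable")

# Converting to a string first works.
print(json.dumps({"created_on": str(d)}))  # {"created_on": "2020-05-04"}
```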
Now we can export the JSON using a command similar to the one we used for the CSV export:
```
(forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_json.py
```
We've successfully written scripts for all of the exporting that was requested of us.