How to track, analyze, and visualize user group data using AWS
See how to use AWS Location Service, Amazon EventBridge, Amazon QuickSight, and AWS SAM to track, analyze, and visualize data in this project. Read on!
Jun 08, 2023 • 7 Minute Read
In this post, we discuss how to geocode addresses using Amazon Location Service, how to use Amazon EventBridge and AWS Lambda to transform and load data daily to Amazon S3, how to visualize and share the stored data using Amazon QuickSight, and how to automate and orchestrate the entire solution using AWS SAM.
Using AWS to understand trends in user group data
The COVID-19 pandemic has shifted many in-person community gatherings to virtual events. New research published in the Journal of Applied Psychology indicates that "Zoom fatigue" is real in the age of virtual meetings, meetups, and social activities.
The community often asks about approaches to supporting user groups. To that end, I was looking for a fast, effective, and reliable way to analyze user group data and visualize it in a dashboard. This solution helped me gain insights in multiple ways:
- Tracking active meetups: Visibility into which user groups have been active in the past 12, 6 or 3 months.
- Visualizing where members are: Using a map to see where our groups are located, and how big their footprint is.
- Keeping up with events: Using a table ordered by last event timestamp to see which groups have recently had an event.
Overview of the solution
This post describes how I built the solution, from geocoding address data to the final visualization on Amazon QuickSight.
In this solution, I started by developing Python code to scrape user group data from Meetup.
In order to plot each meetup on a map, geolocation data was needed. I used Amazon Location Service to geocode the address into a longitude and latitude coordinate.
The transformed data is then published to an Amazon Simple Storage Service (Amazon S3) bucket.
I used Amazon EventBridge to set up a daily job that triggers an AWS Lambda function to collect the user group data. The reporting and visualization layer is built using QuickSight. Finally, the entire pipeline is deployed using AWS SAM.
The following diagram illustrates this architecture.
Collecting user group data
AWS user groups are communities that meet regularly to share ideas, answer questions, and learn about new services and best practices. The user groups use meetup.com to organize their events. I was curious about the groups in Canada and the U.S. listed on the User Groups in the Americas page.
I used BeautifulSoup and the requests library to scrape the content from the AWS User Group website.
The script first gets the meetup URL for each user group through the get_user_group_data function. Based on the presence of certain div attributes, it stores the relevant meetup URL and name in a list to be scraped.
Next, the get_meetup_info function iterates through the list and parses the information on each individual meetup page, such as the number of members and the meetup location. The raw data is saved as a CSV for further processing.
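As a rough illustration of the "save as CSV" step, here is a minimal sketch using Python's standard csv module. The column names and the sample row are hypothetical, chosen only to mirror the fields described above, and the original script may well have used pandas instead.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical column names and sample data, for illustration only.
FIELDS = ["name", "url", "location", "members", "past_events"]
sample_rows = [
    {
        "name": "Example AWS User Group",
        "url": "https://www.meetup.com/example-group/",
        "location": "Arlington, VA",
        "members": 1234,
        "past_events": 57,
    },
]

csv_text = rows_to_csv(sample_rows, FIELDS)
print(csv_text)
```

Using a CSV writer rather than joining strings by hand matters here because location values such as "Arlington, VA" contain commas and need to be quoted.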
The solution in this post is for demonstration purposes only. We recommend running similar scripts only on your own websites after consulting with the team who manages them, or be sure to follow the terms of service for the website that you’re trying to scrape.
The following shows a sample of the script.
import requests
from bs4 import BeautifulSoup

# meetup_url is one of the meetup URLs collected in the previous step
meetup_json = {}
page = requests.get(meetup_url)
usergroup_html = page.text
soup = BeautifulSoup(usergroup_html, "html.parser")

# Get Meetup Name
meetup_name = soup.find_all("a", {"class": "groupHomeHeader-groupNameLink"})[0].text

# Meetup location
meetup_location = soup.find_all("a", {"class": "groupHomeHeaderInfo-cityLink"})[0].text

# Number of members
meetup_members = (
    soup.find_all("a", {"class": "groupHomeHeaderInfo-memberLink"})[0]
    .text.split(" ")[0]
    .replace(",", "")
)

# Past events
past_events = (
    soup.find_all("h3", {"class": "text--sectionTitle text--bold padding--bottom"})[0]
    .text.split("Past events ")[1]
    .replace("(", "")
    .replace(")", "")
)
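The sample above pulls the counts out of the page text with chained split and replace calls. A slightly more defensive variant is sketched below using regular expressions. The input formats ("1,234 members" and "Past events (57)") are assumptions inferred from the string handling in the sample, and the helper names are hypothetical.

```python
import re


def parse_member_count(text):
    """Return the first integer in text such as '1,234 members'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no member count found in {text!r}")
    return int(match.group().replace(",", ""))


def parse_past_events(text):
    """Return the parenthesized count in text such as 'Past events (57)'."""
    match = re.search(r"\((\d[\d,]*)\)", text)
    if match is None:
        raise ValueError(f"no event count found in {text!r}")
    return int(match.group(1).replace(",", ""))


print(parse_member_count("1,234 members"))
print(parse_past_events("Past events (57)"))
```

Raising a ValueError with the offending text makes it easier to spot when a page layout changes, compared with an IndexError from a failed split.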
Geocoding user groups
In order to plot each meetup group on a map, we need the longitude and latitude for each city in the meetup group. I was able to use Amazon Location Service to geocode each city name into longitude and latitude coordinates using a place index. For more information about creating a place index, see Amazon Location Service Developer Guide.
Here is example Python code that uses a place index for geocoding.
import boto3


def get_location_data(location: str):
    """
    Purpose:
        get location data from name
    Args:
        location - name of location
    Returns:
        lat, lng - latitude and longitude of location
    """
    client = boto3.client("location")
    response = client.search_place_index_for_text(
        IndexName="my_place_index", Text=location
    )
    print(response)
    geo_data = response["Results"][0]["Place"]["Geometry"]["Point"]
    # Example output for Arlington, VA: 'Results': [{'Place': {'Country': 'USA', 'Geometry': {'Point': [-77.08628999999996, 38.89050000000003]}, 'Label': 'Arlington, VA, USA', 'Municipality': 'Arlington', 'Region': 'Virginia', 'SubRegion': 'Arlington County'}}
    lat = geo_data[1]
    lng = geo_data[0]
    print(f"{lat},{lng}")
    return lat, lng
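Because many user groups share a city, and because a search can come back with no results, it may be worth wrapping the lookup. The following is a minimal sketch under those assumptions, not the original script: the function name geocode_city and the cache dictionary are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def geocode_city(client, index_name, city, cache=None):
    """Return (lat, lng) for a city name, or None when nothing matches.

    client is expected to behave like boto3.client("location").
    cache is an optional dict used to avoid repeated lookups.
    """
    if cache is not None and city in cache:
        return cache[city]

    response = client.search_place_index_for_text(
        IndexName=index_name, Text=city, MaxResults=1
    )
    results = response.get("Results", [])
    if results:
        # Amazon Location Service returns points as [longitude, latitude].
        lng, lat = results[0]["Place"]["Geometry"]["Point"]
        coords = (lat, lng)
    else:
        coords = None

    if cache is not None:
        cache[city] = coords
    return coords


# Usage (requires AWS credentials and an existing place index):
#   import boto3
#   client = boto3.client("location")
#   coords = geocode_city(client, "my_place_index", "Arlington, VA", cache={})
```

Returning None instead of raising lets the caller decide whether to skip a group or log it for manual review.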
Using SAM to orchestrate deployment
After testing the script locally, the next step was to create a mechanism to run the script daily and store the results in S3. I used the AWS Serverless Application Model (SAM) to create a serverless application that does the following.
- Create an S3 bucket
- Create an Amazon EventBridge (formerly CloudWatch Events) schedule rule that triggers every 24 hours
- Deploy a Python Lambda function to run the data scraping code
Here is the outline I followed to deploy the serverless application, along with the sample code I used.
1. From a terminal window, initialize a new application: `sam init`
2. Change directory: `cd ./sam-meetup`
3. Update dependencies
* update my_app/requirements.txt
requests
pandas
bs4
4. Update the code
Add your code to the example `my_app/app.py`
import json
import logging

import get_meetup_data

# The root logger in Lambda defaults to WARNING, so raise it to see INFO messages
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    logger.info("Getting meetup data")
    try:
        get_meetup_data.main()
    except Exception as error:
        logger.error(error)
        raise
    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": "meetup data collected",
            }
        ),
    }
5. Update template.yml
Globals:
  Function:
    Timeout: 600
Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: MY_BUCKET_NAME
  GetMeetupDataFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: my_app/
      Handler: app.lambda_handler
      Policies:
        - S3WritePolicy:
            BucketName: MY_BUCKET_NAME
      Runtime: python3.9
      Architectures:
        - x86_64
      Events:
        GetMeetupData:
          Type: Schedule
          Properties:
            Schedule: 'rate(24 hours)'
            Name: MeetupData
            Description: getMeetupData
            Enabled: True
6. Run `sam build`
7. Deploy the application to AWS: `sam deploy --guided`
For more detailed information on developing SAM applications, check out Getting started with AWS SAM.
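The handler above delegates all of the work to get_meetup_data.main(), which is not shown in this post. As a rough sketch of what its final step might look like, the snippet below uploads CSV text to the bucket with put_object. The function names, the "meetup-data" key prefix, and the date-based key are all assumptions for illustration. The S3 client is passed in as a parameter so the logic can be tested without AWS access.

```python
import datetime


def build_object_key(run_date, prefix="meetup-data"):
    """Build a date-stamped object key, e.g. meetup-data/2023-06-08.csv."""
    return f"{prefix}/{run_date.isoformat()}.csv"


def upload_csv(s3_client, bucket, key, csv_text):
    """Upload CSV text to S3 and return the object's s3:// URI.

    s3_client is expected to behave like boto3.client("s3").
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"


# Usage inside the Lambda function (bucket name is a placeholder):
#   import boto3
#   key = build_object_key(datetime.date.today())
#   upload_csv(boto3.client("s3"), "MY_BUCKET_NAME", key, csv_text)
```

The S3WritePolicy in the template grants the function permission to write objects to the named bucket, which is what put_object needs.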
Visualizing data with QuickSight
To share the user group data, I chose QuickSight, with Amazon S3 as the data source.
QuickSight is a native AWS service that seamlessly integrates with other AWS services such as Amazon Redshift, Athena, Amazon S3, and many other data sources.
As a fully managed service, QuickSight enabled the team to easily create and publish interactive dashboards. In addition to building powerful visualizations, QuickSight provides data preparation tools that make it easy to filter and transform the data into the exact needed dataset. For more information about creating a dataset, see Creating a Dataset Using Amazon S3 Files.
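When Amazon S3 is the data source, QuickSight reads a JSON manifest file that lists the files to import and how to parse them. The sketch below generates such a manifest in Python. The bucket name and object key are placeholders, and the settings shown assume a comma-delimited CSV with a header row, so check them against the QuickSight manifest documentation for your own data.

```python
import json


def build_manifest(bucket, key):
    """Build a QuickSight S3 manifest for a single CSV file with a header row."""
    return {
        "fileLocations": [{"URIs": [f"s3://{bucket}/{key}"]}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": '"',
            "containsHeader": "true",
        },
    }


# Placeholder bucket and key, for illustration only.
manifest = build_manifest("MY_BUCKET_NAME", "meetup-data/latest.csv")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The resulting JSON can be uploaded alongside the data or supplied when creating the dataset in the QuickSight console.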
The following are example screenshots from the dashboard.
Conclusion
In this post, we discussed how to successfully achieve the following:
- Geocode addresses using Amazon Location Service
- Use Amazon EventBridge and AWS Lambda to transform and load the data daily to S3
- Visualize and share the data stored using Amazon QuickSight
- Automate and orchestrate the entire solution using SAM
This solution can be used to gain insights into engaging with technical communities. If you’re interested in participating in your local community, check out the AWS user group page here.
About the Author
Banjo is a Senior Developer Advocate at AWS, where he helps builders get excited about using AWS. Banjo is passionate about operationalizing data and has started a podcast, a meetup, and open-source projects around utilizing data. When not building the next big thing, Banjo likes to relax by playing video games, especially JRPGs, and exploring events happening around him.