Guardrails for Amazon Bedrock: Making your AI safe for users
Need to build a responsible AI with safeguards built in, aligned to your policies? Guardrails can do that, including denying topics and filtering harmful content.
Apr 15, 2024 • 4 Minute Read
The running theme for the AI industry over the last year has been “move fast and break things.” While this was great for pushing innovation at first, the first models to hit the market have been rife with potential for abuse: time and again, researchers have shown how they can be tricked into producing hate speech or sexual content, or simply giving advice on topics they shouldn’t touch.
Announced at AWS re:Invent 2023, Guardrails for Amazon Bedrock provides an answer to this problem. In this article, we cover what Guardrails is, and how you can use it to create responsible models that provide safe user experiences while still gaining the benefits of AI.
First, what is Amazon Bedrock?
Amazon Bedrock offers a fully managed solution for crafting generative AI applications. It begins with a foundation model (FM), sourced from Amazon or an external AI firm, serving as the core on which you can build. This "bedrock" – a pre-trained base model – allows for extensive customization to align with specific requirements.
For more details about what Amazon Bedrock is and how to start using it, including fine-tuning models and using APIs in your application, I’d highly recommend checking out these articles:
“What is Amazon Bedrock?” by Beth Hord
“How to get started with Amazon Bedrock” by Amber Israelsen
What is Guardrails for Amazon Bedrock?
Guardrails is a feature in Amazon Bedrock that allows you to put safeguards, known as guardrails, in place to check user inputs and AI outputs and filter or deny topics that are unsafe. You determine what qualifies based on your company policies. These safeguards are foundation model (FM) agnostic.
In Guardrails, there are two main features: Denied Topics and Content Filters. The first is for topics you want to stop people discussing altogether, while the second is for categories of content you want a scaling tolerance for. For example, you might want to allow your users to input mildly violent content (such as discussing an action movie) but not highly violent content.
What can Guardrails for Amazon Bedrock deny?
Anything! All you need to do is provide a name for the guardrail, such as “Investment advice”, and then a description. For example: “Investment advice refers to inquiries, guidance, or recommendations regarding the management or allocation of funds or assets with the goal of generating returns or achieving specific financial objectives.”
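To make this concrete, here is a minimal sketch of how a denied topic could be defined programmatically, following the shape of the boto3 `create_guardrail` API. The guardrail name, messaging strings, and helper function are illustrative assumptions, and the service call itself is shown commented out because it requires AWS credentials (and, while the feature is in preview, account access).

```python
# Sketch: defining a denied topic for a Bedrock guardrail.
# The create_guardrail call needs AWS credentials, so it is shown but not run.

def build_denied_topic(name: str, definition: str) -> dict:
    """Build one topicPolicyConfig entry that blocks a topic entirely."""
    return {
        "name": name,
        "definition": definition,
        "type": "DENY",  # matched inputs and outputs are blocked outright
    }

topic = build_denied_topic(
    "Investment advice",
    "Investment advice refers to inquiries, guidance, or recommendations "
    "regarding the management or allocation of funds or assets with the goal "
    "of generating returns or achieving specific financial objectives.",
)

# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_guardrail(
#     name="investment-advice-guardrail",          # illustrative name
#     topicPolicyConfig={"topicsConfig": [topic]},
#     blockedInputMessaging="Sorry, I can't help with that topic.",
#     blockedOutputsMessaging="Sorry, I can't help with that topic.",
# )
```

Note how the description doubles as the topic definition: the guardrail uses it to decide what counts as the denied topic, so the more precise the wording, the better the match.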
You can test to see if your guardrail is working in the user interface by running guardrail trace tests.
What can Guardrails for Amazon Bedrock filter?
Guardrails can filter both user input and your model’s output. There are four categories you can filter by: hate, insults, sexual, and violence. You can set each of these to a filter strength of none, low, medium, or high, which determines how aggressively content in that category is filtered.
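As a rough sketch, the four categories and their strengths map onto a content policy like the one below, following the shape of the boto3 `create_guardrail` `contentPolicyConfig` parameter. The helper function and the specific strength choices are assumptions for illustration; only the payload is built here, since the actual call requires AWS access.

```python
# Sketch: a content policy covering all four filter categories.
STRENGTHS = ("NONE", "LOW", "MEDIUM", "HIGH")

def content_filter(category: str, input_strength: str, output_strength: str) -> dict:
    """Build one filter entry; strength applies separately to inputs and outputs."""
    assert input_strength in STRENGTHS and output_strength in STRENGTHS
    return {
        "type": category,
        "inputStrength": input_strength,
        "outputStrength": output_strength,
    }

filters = [
    content_filter("HATE", "HIGH", "HIGH"),
    content_filter("INSULTS", "HIGH", "HIGH"),
    content_filter("SEXUAL", "HIGH", "HIGH"),
    # Tolerate mild violence (e.g. discussing an action movie), block stronger content:
    content_filter("VIOLENCE", "LOW", "LOW"),
]
content_policy = {"filtersConfig": filters}
# This dict would be passed as contentPolicyConfig to create_guardrail.
```

A higher strength means the filter fires on lower-confidence matches, so "HIGH" blocks the most content and "NONE" disables the category entirely.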
Can I stop users inputting PII with Guardrails for Amazon Bedrock?
Not yet, but you will be able to in the future. Amazon has said it is working on a feature where you will be able to “select a set of personally identifiable information (PII) such as name, e-mail address, and phone number, that can be redacted in FM-generated responses or block a user input if it contains PII.”
Can I monitor and audit user inputs and FM responses with Guardrails?
Yes, if you integrate it with Amazon CloudWatch, you can monitor and analyze user inputs and FM responses that have violated your guardrail policies.
What Large Language Models (LLMs) does Guardrails support?
Every LLM within Amazon Bedrock is supported. This means Amazon Titan Text, Anthropic Claude, Meta Llama 2, AI21 Jurassic, and Cohere Command. You can also use it with custom models, and Agents for Amazon Bedrock.
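Because guardrails are FM-agnostic, attaching one at inference time is just a matter of referencing it when you invoke a model. Here is a hedged sketch using the shape of the boto3 `bedrock-runtime` `InvokeModel` parameters; the guardrail ID, version, and prompt are placeholders, and the request body follows the Amazon Titan Text format as an example.

```python
# Sketch: attaching a guardrail to a model invocation by identifier and version.
import json

def build_invoke_kwargs(model_id: str, prompt: str,
                        guardrail_id: str, guardrail_version: str) -> dict:
    """Assemble keyword arguments for a guardrail-protected InvokeModel call."""
    return {
        "modelId": model_id,
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "contentType": "application/json",
        "body": json.dumps({"inputText": prompt}),  # Titan Text request shape
    }

kwargs = build_invoke_kwargs(
    "amazon.titan-text-express-v1",   # any supported FM works here
    "Should I buy this stock?",
    "my-guardrail-id",                # placeholder guardrail identifier
    "1",
)

# runtime = boto3.client("bedrock-runtime")
# response = runtime.invoke_model(**kwargs)  # requires AWS credentials
```

If the input or the model’s response violates the guardrail’s policies, the service returns the blocked messaging configured on the guardrail instead of the model output.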
How do I get access to Guardrails for Amazon Bedrock?
Right now, Guardrails is available in limited preview. That means you’ll need to reach out to AWS Support and ask for access to this feature.
Conclusion: Guardrails is a great addition to Bedrock’s feature set
This was a great feature announcement at AWS re:Invent, and it will be good to see PII filtering arrive for Bedrock in the future. Hopefully Guardrails makes it out of preview and becomes generally available to everyone.
It is interesting that Guardrails applies these protections on top of FMs, essentially overriding their outputs, which gives Amazon (and you) more control instead of leaving it up to the AI model developers. That makes sense now that AWS is going for breadth in its FM offerings: it lets them provide options while still supporting responsible use.