
Paginating AWS API Results using the Boto3 Python SDK

Boto3 is Amazon’s officially supported AWS SDK for Python.
It’s the de facto way to interact with AWS via Python.
If you’ve used Boto3 to query AWS resources, you may have run into limits on how many
resources a single call to a given AWS API will return (generally 50 or 100 results,
although S3 will return up to 1000). The AWS APIs return “pages” of results. If you want to retrieve more than one “page” of results, you will need to use a paginator to issue multiple API requests on your behalf.

Introduction

Boto3 provides Paginators to
automatically issue multiple API requests to retrieve all the results
(e.g. on an API call to EC2.DescribeInstances). Paginators are straightforward to use,
but not all Boto3 services provide paginator support. For those services you’ll need to
write your own paginator in Python.
In this post, I’ll show you how to retrieve all query results for Boto3 services
which provide Pagination support, and I’ll show you how to write a custom paginator
for services which don’t provide built-in pagination support.

Built-in Paginators

Most services in the Boto3 SDK provide Paginators. See S3 Paginators for example.
Once you determine you need to paginate your results, you’ll need to call the get_paginator() method.
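
As a minimal sketch of that call, the function below collects every key in a bucket through the list_objects_v2 paginator. The function name list_all_keys and the _FakeS3 stand-in are illustrative assumptions, not part of Boto3; the stand-in only exists so the sketch runs without AWS credentials.

```python
def list_all_keys(s3_client, bucket):
    """Collect every object key in `bucket` using the built-in paginator."""
    paginator = s3_client.get_paginator('list_objects_v2')
    keys = []
    # paginate() yields one response dict per page and handles the tokens.
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent when a page (or the bucket) has no objects.
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys


# --- Hypothetical stand-ins so the sketch runs offline ---
class _FakePaginator:
    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        return iter(self._pages)


class _FakeS3:
    """Mimics only the get_paginator() call used above."""

    def get_paginator(self, operation_name):
        pages = [
            {'Contents': [{'Key': 'a.txt'}, {'Key': 'b.txt'}]},
            {'Contents': [{'Key': 'c.txt'}]},
        ]
        return _FakePaginator(pages)


all_keys = list_all_keys(_FakeS3(), 'my-example-bucket')
print(len(all_keys))
```

Against a real bucket you would pass boto3.client('s3') instead of the stand-in.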

How do I know I need a Paginator?

If you suspect you aren’t getting all the results from your Boto3 API call, there are a couple of ways to check.
You can look in the AWS console (e.g. number of Running Instances), or run a query via the aws command-line interface.
Here’s an example of querying an S3 bucket via the AWS command-line. Boto3 will return the first 1000 S3 objects from the bucket, but since there are a total of 1002 objects, you’ll need to paginate.

Counting results using the AWS CLI

$ aws s3 ls my-example-bucket|wc -l
-> 1002

Here’s a boto3 example which, by default, will return the first 1000 objects from a given S3 bucket.

Determining if the results are truncated

import boto3
# use default profile
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='my-example-bucket')
print('list_objects_v2 returned {}/{} files.'.format(resp['KeyCount'], resp['MaxKeys']))
if resp['IsTruncated']:
    print('There are more files available. You will need to paginate the results.')
# Output:
# list_objects_v2 returned 1000/1000 files.
# There are more files available. You will need to paginate the results.

The S3 response dictionary provides some helpful properties, like IsTruncated, KeyCount, and MaxKeys which tell you if the results were truncated. If resp['IsTruncated'] is True, you know you’ll need to use a Paginator to return all the results.
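
To make the truncation flag concrete, here is a sketch of what following it by hand looks like for list_objects_v2, which accepts a ContinuationToken and returns a NextContinuationToken on truncated responses. The count_objects helper and the _FakeS3 stand-in (which pretends the bucket holds 1002 objects) are assumptions made for illustration.

```python
def count_objects(s3_client, bucket):
    """Count all objects by following S3 continuation tokens by hand."""
    total = 0
    kwargs = {'Bucket': bucket}
    while True:
        resp = s3_client.list_objects_v2(**kwargs)
        total += resp['KeyCount']
        if not resp['IsTruncated']:
            break
        # NextContinuationToken is only present on truncated responses.
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    return total


class _FakeS3:
    """Hypothetical stand-in that serves pages of at most 1000 keys."""

    def __init__(self, total_objects, page_size=1000):
        self._total = total_objects
        self._page_size = page_size

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        start = int(ContinuationToken) if ContinuationToken else 0
        end = min(start + self._page_size, self._total)
        resp = {
            'KeyCount': end - start,
            'MaxKeys': self._page_size,
            'IsTruncated': end < self._total,
        }
        if resp['IsTruncated']:
            resp['NextContinuationToken'] = str(end)
        return resp


object_count = count_objects(_FakeS3(1002), 'my-example-bucket')
print(object_count)
```

A built-in paginator runs this same kind of loop for you, which is why it is the better choice whenever one exists.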

Using Boto3’s Built-in Paginators

The Boto3 documentation provides a good overview of how to use the built-in paginators, so I won’t repeat it here.
If a given service has Paginators built-in, they are documented in the Paginators section of the service docs, e.g. AutoScaling, and EC2.

Determine if a method can be paginated

You can also verify if the boto3 service provides Paginators via the client.can_paginate() method.

import boto3
s3 = boto3.client('s3')
print(s3.can_paginate('list_objects_v2')) # => True

So, that’s it for built-in paginators. In this section I showed you how to determine
if your API results are being truncated, pointed you to Boto3’s excellent documentation
on Paginators, and showed you how to use the can_paginate() method to verify if a given service
method supports pagination.
If the Boto3 service you are using provides paginators, you should use them.
They are tested and well documented. In the next section, I’ll show you how to
write your own paginator.

How to Write Your Own Paginator

Some Boto3 services, such as AWS Config at the time of writing,
don’t provide paginators. For these services, you will have to write your own
paginator code in Python to retrieve all the query results.

You Might Need To Write Your Own Paginator If…

Some Boto3 SDK services aren’t as built-out as S3 or EC2. For example, the AWS Config service doesn’t provide paginators. The first clue is that the Boto3 AWS ConfigService docs don’t have a “Paginators” section.

The can_paginate Method

You can also ask the individual service client’s can_paginate method if it supports paginating. For example, here’s how to do that for the AWS config client. In the example below, we determine that the config service doesn’t support paginating for the get_compliance_details_by_config_rule method.

import boto3
config = boto3.client('config')
can_paginate = config.can_paginate('get_compliance_details_by_config_rule')
if not can_paginate:
    print('There is no built-in paginator for that method')
# Output:
# There is no built-in paginator for that method

Operation Not Pageable Error

If you try to paginate a method without a built-in paginator, you will get an
error similar to this:

config.get_paginator('get_compliance_details_by_config_rule')
.../python2.7/site-packages/botocore/client.pyc in get_paginator(self, operation_name)
591         if not self.can_paginate(operation_name):
--> 592             raise OperationNotPageableError(operation_name=operation_name)
593         else:
594             actual_operation_name = self._PY_TO_OP_NAME[operation_name]
"OperationNotPageableError: Operation cannot be paginated: get_compliance_details_by_config_rule"

If you get an error like this, it’s time to roll up your sleeves and write your
own paginator.
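
One way to avoid the exception altogether is to guard get_paginator() with can_paginate(). The sketch below assumes a small helper, get_paginator_or_none, which is not part of Boto3, and a _FakeClient stand-in that mimics the two client methods so the example runs without AWS access.

```python
def get_paginator_or_none(client, operation_name):
    """Return a built-in paginator, or None when the method has none."""
    if client.can_paginate(operation_name):
        return client.get_paginator(operation_name)
    return None


class _FakeClient:
    """Hypothetical stand-in; only 'list_objects_v2' is pageable here."""

    _pageable = {'list_objects_v2'}

    def can_paginate(self, operation_name):
        return operation_name in self._pageable

    def get_paginator(self, operation_name):
        if operation_name not in self._pageable:
            raise RuntimeError('Operation cannot be paginated: ' + operation_name)
        return 'paginator:' + operation_name


client = _FakeClient()
missing = get_paginator_or_none(client, 'get_compliance_details_by_config_rule')
found = get_paginator_or_none(client, 'list_objects_v2')
print(missing, found)
```

A None result is the signal to fall back to a hand-written paginator.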

Writing a Paginator

Writing a paginator is fairly straightforward. When you call the AWS service API,
it returns up to the maximum number of results per call and, if there are more results,
a long hex string token (NextToken in the response, next_token in the code below).

Approach

To create a paginator for this, you make calls to the service API in a loop until no next_token
is returned, collecting the results from each loop iteration in a list. At the end of the loop, you will have all the results in the list.
In the example code below, I’m calling the
AWS Config
service to get a list of resources (e.g. EC2 instances), which are not compliant with the required-tags Config rule.
As you read the example code below, it might help to read the Boto3 SDK docs for the
get_compliance_details_by_config_rule method,
especially the “Response Syntax” section.

Example Paginator

import boto3


def get_resources_from(compliance_details):
    results = compliance_details['EvaluationResults']
    resources = [result['EvaluationResultIdentifier']['EvaluationResultQualifier'] for result in results]
    next_token = compliance_details.get('NextToken', None)
    return resources, next_token


def main():
    config = boto3.client('config')
    next_token = ''  # variable to hold the pagination token
    resources = []   # list for the entire resource collection
    # Call the `get_compliance_details_by_config_rule` method in a loop
    # until we have all the results from the AWS Config service API.
    while next_token is not None:
        compliance_details = config.get_compliance_details_by_config_rule(
            ConfigRuleName='required-tags',
            ComplianceTypes=['NON_COMPLIANT'],
            Limit=100,
            NextToken=next_token
        )
        current_batch, next_token = get_resources_from(compliance_details)
        resources += current_batch
    print(resources)


if __name__ == "__main__":
    main()

Example Paginator – main() Method

In the example above, the main() function creates the config client and initializes
the next_token variable. The resources list will hold the final result set.
The while loop is the heart of the paginating code. In each loop iteration, we call the
get_compliance_details_by_config_rule method, passing next_token as a parameter.
Again, next_token is a long hex string returned by the given AWS service API method.
It’s our “claim check” for the next set of results.
Next, we extract the current_batch of AWS resources and the next_token string
from the compliance_details dictionary returned by our API call.

Example Paginator – get_resources_from() Helper Method

get_resources_from(compliance_details) is a helper function extracted for parsing
the compliance_details dictionary. It returns our current batch (up to 100 results)
of resources and our next_token “claim check” so we can get the next page of results
from config.get_compliance_details_by_config_rule().
I hope the example is helpful in writing your own custom paginator.
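
As a variation on the same idea, the loop can be wrapped in a generator so the token handling is reusable. This is only a sketch: paginate_by_next_token, noncompliant_resources, and the _FakeConfig stand-in are hypothetical names, and the sketch assumes the operation uses NextToken for both the request parameter and the response key. Unlike the example above, it omits NextToken on the first call instead of passing an empty string.

```python
def paginate_by_next_token(api_call, **kwargs):
    """Yield each response page from `api_call`, following NextToken."""
    while True:
        page = api_call(**kwargs)
        yield page
        next_token = page.get('NextToken')
        if not next_token:
            break
        # Pass the token back on the next request to get the next page.
        kwargs['NextToken'] = next_token


def noncompliant_resources(config_client, rule_name):
    """Return the qualifier dict of every non-compliant resource."""
    pages = paginate_by_next_token(
        config_client.get_compliance_details_by_config_rule,
        ConfigRuleName=rule_name,
        ComplianceTypes=['NON_COMPLIANT'],
        Limit=100,
    )
    return [
        result['EvaluationResultIdentifier']['EvaluationResultQualifier']
        for page in pages
        for result in page['EvaluationResults']
    ]


def _result(resource_id):
    # Builds a response entry shaped like the part of the API output used above.
    return {'EvaluationResultIdentifier': {
        'EvaluationResultQualifier': {'ResourceId': resource_id}}}


class _FakeConfig:
    """Hypothetical stand-in that serves two pages of results."""

    def get_compliance_details_by_config_rule(self, **kwargs):
        if kwargs.get('NextToken') is None:
            return {'EvaluationResults': [_result('i-1'), _result('i-2')],
                    'NextToken': 't1'}
        return {'EvaluationResults': [_result('i-3')]}


resource_ids = [r['ResourceId']
                for r in noncompliant_resources(_FakeConfig(), 'required-tags')]
print(resource_ids)
```

With a real client you would pass boto3.client('config') in place of the stand-in.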


In this section on writing your own paginators I showed you a Boto3 documentation
example of a service without built-in paginator support. I discussed the can_paginate
method and showed you the error you get if you call get_paginator() on a method which doesn’t
support pagination. Finally, I discussed an approach for writing a custom paginator
in Python and showed a concrete example of a custom paginator which passes the NextToken
“claim check” string to fetch the next page of results.

Summary

In this post, I covered paginating AWS API responses with the Boto3 SDK.
Like most APIs (Twitter, GitHub, Atlassian, etc.),
AWS paginates API responses over a set limit, generally 50 or 100 resources.
Knowing how to paginate results is crucial when dealing with large AWS accounts which
may contain thousands of resources.
I hope this post has taught you a bit about paginators and how to get all
your results from the AWS APIs.

About the Author

Doug is a Sr. DevOps engineer at 1Strategy, an AWS Consulting Partner specializing in Amazon Web Services (AWS). He has 23 years of experience in IT, working at Microsoft, Washington Mutual Bank, and Nordstrom in diverse roles including tester, Windows Server engineer, developer, and Chef engineer, helping app and platform teams manage thousands of servers via automation.
