Counting In DynamoDB
DatabasesDoes getting a count in DynamoDB return the total across the entire table, or just the total for the current page?
While a brief search online didn't give any conclusive result, it
did show that it's possible to get a count for a scan or a query
without fetching the items themselves, which is a good thing (honestly,
I was not expecting this to be possible). This is done by setting the
select
parameter to count
:
out, err := client.Scan(&dynamodb.ScanInput{ TableName: aws.String("my-table"), ExpressionAttributeNames: expr.Names(), ExpressionAttributeValues: expr.Values(), FilterExpression: expr.Filter(), Select: types.SelectCount, }) if err != nil { return 0, err } return out.Count, nil
But the question remains: would this count only include those items that would be returned in the page of results?
So I set about testing this. I created a DynamoDB test instance running
in Docker and wrote a Go program which added 5,000 records simulating
coin tosses. Each item has a unique partition and sort key, plus a
boolean field indicating if the toss came up heads. It then performed a
single Scan over the table, filtering for and counting the records that
came up heads, and displayed the results to the output. It also printed
out whether the LastEvaluatedKey
was set, just to confirm that there
were no more pages of results.
I chose 5,000 items as that seemed like a dataset that was larger than a page if I were to fetch the items themselves. So I was happy to see the results come out at around 50% of the total result set:
2024/04/30 09:38:26 creating items 2024/04/30 09:38:58 created items 2024/04/30 09:38:59 count = 2422, next = map[]
So the answer is "yes" then?
Well no, unfortunately. Perusing the API docs for Scan a little closer revealed this little tidbit:
Also, if the processed dataset size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation
To confirmed whether this applied to counts, I modified the program to add 2 KB of "junk" to each item and ran the test again. Here are the results:
2024/04/30 09:41:44 creating items 2024/04/30 09:42:19 created items 2024/04/30 09:42:19 count = 259, next = map[pk:0x14000518810 sk:0x140005187f8]
Ah, it's what I feared. Counts are bound to the 1 MB limit returned by DynamoDB.
So I modified the logic to page through the results until the last evaluated key returned nil. And running this got me back to the ~50% I was expecting:
2024/04/30 09:49:13 creating items 2024/04/30 09:49:46 created items 2024/04/30 09:49:46 count = 2512
So, just like getting all the items of a scan, the only surefire way to
get an accurate count is to iterate over all the returned pages until
the last evaluated key is nil
:
var ( count int lastEvalKey map[string]types.AttributeValue ) for { out, err := client.Scan(&dynamodb.ScanInput{ TableName: aws.String("my-table"), ExpressionAttributeNames: expr.Names(), ExpressionAttributeValues: expr.Values(), FilterExpression: expr.Filter(), Select: types.SelectCount, }) if err != nil { return 0, err } count += int(out.Count) if out.LastEvaluatedKey == nil { break } lastEvalKey = out.LastEvaluatedKey } return count, nil
Note: I didn't test queries, so I'm only assuming that this also works for those as well.