Coding Bits
April 30th, 2024

Counting In DynamoDB

Databases

Does getting a count in DynamoDB return the total across the entire table, or just the total for the current page?

While a brief search online didn't turn up a conclusive answer, it did show that it's possible to get a count for a scan or a query without fetching the items themselves, which is a good thing (honestly, I was not expecting this to be possible). This is done by setting the Select parameter to COUNT:

// Scan with Select set to COUNT: DynamoDB returns only the number of matching
// items, not the items themselves (client, expr and ctx are set up elsewhere).
out, err := client.Scan(ctx, &dynamodb.ScanInput{
  TableName:                 aws.String("my-table"),
  ExpressionAttributeNames:  expr.Names(),
  ExpressionAttributeValues: expr.Values(),
  FilterExpression:          expr.Filter(),
  Select:                    types.SelectCount,
})
if err != nil {
  return 0, err
}

return out.Count, nil

But the question remains: would this count only include the items that would be returned in a single page of results?

So I set about testing this. I spun up a DynamoDB test instance in Docker and wrote a Go program that added 5,000 records simulating coin tosses. Each item has a unique partition and sort key, plus a boolean field indicating whether the toss came up heads. The program then performed a single Scan over the table, filtering for and counting the records that came up heads, and printed the result. It also printed whether LastEvaluatedKey was set, just to confirm that there were no more pages of results.
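The setup was nothing fancy; something along these lines is all it takes (a rough sketch of what I ran, with the pk/sk values and the heads attribute name just being what I went with):

// Write 5,000 simulated coin tosses, each with a unique pk/sk pair and a
// boolean "heads" attribute set at random.
for i := 0; i < 5000; i++ {
  item := map[string]types.AttributeValue{
    "pk":    &types.AttributeValueMemberS{Value: fmt.Sprintf("toss-%d", i)},
    "sk":    &types.AttributeValueMemberS{Value: fmt.Sprintf("toss-%d", i)},
    "heads": &types.AttributeValueMemberBOOL{Value: rand.Intn(2) == 0},
  }
  if _, err := client.PutItem(ctx, &dynamodb.PutItemInput{
    TableName: aws.String("my-table"),
    Item:      item,
  }); err != nil {
    log.Fatal(err)
  }
}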

I chose 5,000 items as that seemed large enough to span more than one page had I been fetching the items themselves. So I was happy to see the count come out at around 50% of the total result set:

2024/04/30 09:38:26 creating items
2024/04/30 09:38:58 created items
2024/04/30 09:38:59 count = 2422, next = map[]

So the answer is "yes" then?

Well no, unfortunately. Perusing the API docs for Scan a little more closely revealed this little tidbit:

Also, if the processed dataset size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation

To confirm whether this applied to counts, I modified the program to add 2 KB of "junk" to each item and ran the test again.
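The padding itself was trivial; a line like this, tacked onto each item before it was written, was enough (junk being just the attribute name I went with):

// Roughly 2 KB of filler per item, enough for the scanned data to hit the
// 1 MB limit long before all 5,000 items have been looked at.
item["junk"] = &types.AttributeValueMemberS{Value: strings.Repeat("x", 2048)}

Here are the results: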

2024/04/30 09:41:44 creating items
2024/04/30 09:42:19 created items
2024/04/30 09:42:19 count = 259, next = map[pk:0x14000518810 sk:0x140005187f8]

Ah, it's what I feared. Counts are bound by DynamoDB's 1 MB scan limit.

So I modified the logic to page through the results until the last evaluated key came back nil. And running this got me back to the ~50% I was expecting:

2024/04/30 09:49:13 creating items
2024/04/30 09:49:46 created items
2024/04/30 09:49:46 count = 2512

So, just like getting all the items of a scan, the only surefire way to get an accurate count is to iterate over all the returned pages until the last evaluated key is nil:

var (
  count       int
  lastEvalKey map[string]types.AttributeValue
)
for {
  // Keep scanning until no LastEvaluatedKey comes back, passing the key from
  // the previous page as the starting point of the next one.
  out, err := client.Scan(ctx, &dynamodb.ScanInput{
    TableName:                 aws.String("my-table"),
    ExpressionAttributeNames:  expr.Names(),
    ExpressionAttributeValues: expr.Values(),
    FilterExpression:          expr.Filter(),
    Select:                    types.SelectCount,
    ExclusiveStartKey:         lastEvalKey,
  })
  if err != nil {
    return 0, err
  }

  count += int(out.Count)
  if out.LastEvaluatedKey == nil {
    break
  }
  lastEvalKey = out.LastEvaluatedKey
}

return count, nil

Note: I didn't test queries, so I'm only assuming that this applies to them as well.
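If it does, I'd expect the Query version to be much the same loop, with Scan swapped out for Query and a key condition in place of the filter expression. Here's a rough, untested sketch (the pk key name is from my test table; the partition value is a placeholder):

// Untested: count all the items under a single partition key, paging through
// the results the same way as the Scan version above.
keyExpr, err := expression.NewBuilder().
  WithKeyCondition(expression.Key("pk").Equal(expression.Value("some-partition"))).
  Build()
if err != nil {
  return 0, err
}

var (
  count       int
  lastEvalKey map[string]types.AttributeValue
)
for {
  out, err := client.Query(ctx, &dynamodb.QueryInput{
    TableName:                 aws.String("my-table"),
    ExpressionAttributeNames:  keyExpr.Names(),
    ExpressionAttributeValues: keyExpr.Values(),
    KeyConditionExpression:    keyExpr.KeyCondition(),
    Select:                    types.SelectCount,
    ExclusiveStartKey:         lastEvalKey,
  })
  if err != nil {
    return 0, err
  }

  count += int(out.Count)
  if out.LastEvaluatedKey == nil {
    break
  }
  lastEvalKey = out.LastEvaluatedKey
}

return count, nil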