Batching

Pipelines automatically batches ingested records. Batching reduces the number of output files written to your destination, which can make the data more efficient to query.

Batch settings apply after the ingestion stage of a Pipeline. As soon as a batch is filled, the batch of data will be delivered downstream to any transformations you've configured, and then finally to your configured destination.

There are three ways to define how ingested data is batched:

  1. batch-max-mb: The maximum amount of data that will be batched, in megabytes. Default is 10 MB, maximum is 100 MB.
  2. batch-max-rows: The maximum number of rows or events in a batch before data is written. The default and the maximum are both 10,000 rows.
  3. batch-max-seconds: The maximum duration of a batch before data is written, in seconds. Default is 15 seconds, maximum is 300 seconds.

Pipelines batch definitions are hints. A pipeline will follow these hints closely, but batches will not be exact.

All three batch definitions work together. Whichever limit is reached first triggers the delivery of a batch.

For example, setting batch-max-mb = 100 MB and batch-max-seconds = 300 means that a batch is delivered as soon as 100 MB of events have been posted to the Pipeline. However, if it takes longer than 300 seconds for 100 MB of events to be posted, a batch containing all the events posted during those 300 seconds is created and delivered.
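
The "whichever limit is reached first" rule can be illustrated with a small shell sketch. This is not how Pipelines is implemented; the function name `flush_reason`, the variable names, and the assumption that 1 MB equals 1,000,000 bytes are all hypothetical and used only for illustration. The limits mirror the documented defaults. Because batch definitions are hints, real batches will not match these thresholds exactly.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: models the "first limit reached wins" rule.
# flush_reason and the variables below are hypothetical, not part of Pipelines.

# Limits mirroring the documented defaults (10 MB, 10,000 rows, 15 seconds).
BATCH_MAX_BYTES=$((10 * 1000 * 1000))   # assumption: 1 MB = 1,000,000 bytes
BATCH_MAX_ROWS=10000
BATCH_MAX_SECONDS=15

# Usage: flush_reason BYTES ROWS ELAPSED_SECONDS
# Prints the name of a setting whose limit has been reached,
# or "none" if the batch is still below every limit.
flush_reason() {
  local bytes=$1 rows=$2 elapsed=$3
  if [ "$bytes" -ge "$BATCH_MAX_BYTES" ]; then
    echo "batch-max-mb"
  elif [ "$rows" -ge "$BATCH_MAX_ROWS" ]; then
    echo "batch-max-rows"
  elif [ "$elapsed" -ge "$BATCH_MAX_SECONDS" ]; then
    echo "batch-max-seconds"
  else
    echo "none"
  fi
}

flush_reason 2000000 500 3      # below every limit
flush_reason 2000000 10000 3    # row limit reached first
flush_reason 2000000 500 15     # time limit reached first
```

In this sketch, a slow stream is typically flushed by the time limit, while a fast stream of small events tends to hit the row limit before the size limit.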

Defining batch settings using Wrangler

To update the batch settings for an existing Pipeline using Wrangler, run the following command in a terminal:

npx wrangler pipelines update [PIPELINE-NAME] --batch-max-mb 100 --batch-max-rows 10000 --batch-max-seconds 300

Batch settings

You can configure the following batch-level settings to adjust how Pipelines creates a batch:

| Setting | Default | Minimum | Maximum |
| --- | --- | --- | --- |
| Maximum Batch Size (batch-max-mb) | 10 MB | 0.001 MB | 100 MB |
| Maximum Batch Timeout (batch-max-seconds) | 15 seconds | 0 seconds | 300 seconds |
| Maximum Batch Rows (batch-max-rows) | 10,000 rows | 1 row | 10,000 rows |
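
Before running the update command, it can help to check the values against the ranges in the table. The following is a minimal sketch, assuming `awk` is available. The helper names `in_range` and `validate_batch_settings` are hypothetical and not part of Wrangler. The function only prints the command, using the same `[PIPELINE-NAME]` placeholder as above, and does not run it.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: in_range and validate_batch_settings are
# hypothetical helpers, not Wrangler features.

# Usage: in_range VALUE MIN MAX
# Succeeds if VALUE is a non-negative decimal number within [MIN, MAX].
# awk is used so that fractional values such as 0.001 compare correctly.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Usage: validate_batch_settings MB ROWS SECONDS
# Checks each value against the documented minimum and maximum, then
# prints the corresponding wrangler command without executing it.
# Note: this sketch does not check that ROWS is a whole number.
validate_batch_settings() {
  local mb=$1 rows=$2 seconds=$3
  in_range "$mb" 0.001 100 || { echo "batch-max-mb must be between 0.001 and 100" >&2; return 1; }
  in_range "$rows" 1 10000 || { echo "batch-max-rows must be between 1 and 10000" >&2; return 1; }
  in_range "$seconds" 0 300 || { echo "batch-max-seconds must be between 0 and 300" >&2; return 1; }
  echo "npx wrangler pipelines update [PIPELINE-NAME] --batch-max-mb $mb --batch-max-rows $rows --batch-max-seconds $seconds"
}

validate_batch_settings 50 5000 60
```

Replace `[PIPELINE-NAME]` in the printed command with the name of your Pipeline before running it.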