# Optimize performance
Axiom is optimized for storing and querying timestamped event data. However, certain ingest and query practices can degrade performance and increase cost. This page explains common pitfalls and how to avoid them to keep your Axiom queries fast and efficient.
## Summary of pitfalls
| Practice | Severity | Impact |
|---|---|---|
| Mixing unrelated data in datasets | Critical | Combining unrelated data inflates schema, slows queries |
| Excessive backfilling, big difference between `_time` and `_sysTime` | Critical | Creates overlapping blocks, breaks time-based indexing |
| Large number of fields in a dataset | High | Very high dimensionality slows down query performance |
| Failing to use `_time` | High | No efficient time-based filtering |
| Overly wide queries (`project *`) | High | Returns massive amounts of unneeded data |
| Mixed data types in the same field | Moderate | Reduces compression, complicates queries |
| Using regex when simpler filters suffice | Moderate | More CPU-heavy scanning |
| Overusing runtime JSON parsing (`parse_json`) | Moderate | CPU overhead, no indexing on nested fields |
| Virtual fields for simple transformations | Low | Extra overhead for trivial conversions |
| Poor filter order in queries | Low | Suboptimal scanning of data |
## Mixing unrelated data in datasets

### Problem

A “kitchen-sink” dataset is one in which events from multiple, unrelated applications or services get lumped together, often resulting in:

- Excessive width (too many columns): Adding more and more unique fields bloats the schema, reducing query throughput.
- Mixed data types in the same field: For example, some events store `user_id` as a string, while others store it as a number in the same `user_id` field.
- Unrelated schemas in a single dataset: Columns that make sense for one app might be `null` or typed differently for another.
These issues reduce compression efficiency and force Axiom to scan more data than necessary.
### Why it matters
- Slower queries: Each query must scan wider blocks of data and handle inconsistent column types.
- Higher resource usage: Wide schemas reduce row packing in blocks, harming throughput and potentially raising costs.
- Harder data exploration: When fields differ drastically between events, discovering the correct columns or shaping queries becomes more difficult.
### How to fix it

- Keep datasets narrowly focused: Group data from the same application or service in its own dataset. For example, keep `k8s_logs` separate from `web_traffic`.
- Avoid mixing data types for the same field: Enforce consistent types during ingest. If a field is numeric, always send numeric values.
- Consider using map fields: If you have sparse or high-cardinality nested data, consider storing it in a single map (object) field instead of flattening every key. This reduces the total number of top-level fields. Axiom’s map fields are optimized for large objects. A sketch of querying a map field follows this list.
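For illustration, a minimal sketch of querying a nested key of a map field in APL. The dataset `['app-logs']` and the map field `attributes` are hypothetical names:

```kusto
// Read individual keys of the 'attributes' map field with bracket notation
// instead of flattening every key into its own top-level field.
['app-logs']
| extend tenant = ['attributes']['tenant']
| where tenant == "acme"
| project _time, tenant
```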
## Excessive backfilling and large `_time` vs. `_sysTime` gaps

### Problem

Axiom’s `_time` index is critical for query performance. Ideally, incoming events for a block lie in a closely bounded time range. However, backfilling large amounts of historical data after the fact (especially out of chronological order) creates wide time overlaps in blocks. If `_time` is far from `_sysTime` (the time the event was ingested), the effectiveness of Axiom’s time index is weakened.
### Why it matters
- Poor performance on time-based queries: Blocks must be scanned despite time filters, because many blocks overlap the query time window.
- Inefficient block filtering: Queries that filter on time must scan blocks that contain data from a wide time range.
- Large data merges: Compaction processes that rely on time ordering become less efficient.
### How to fix it

- Minimize backfill: Whenever possible, ingest events close to the time they occur.
- Backfill in dedicated batches: If you must backfill older data, do it in dedicated batches that do not mix with live data.
- Use discrete backfill intervals: When backfilling data, ingest one segment at a time (for example, day-by-day).
- Avoid wide time ranges in a single batch: If you are sending data for a 24-hour period, avoid mixing in data that is weeks or months older.
- Be aware of ingestion concurrency: Avoid mixing brand-new events with extremely old events in the same ingest request. A quick way to check for this is sketched below.
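To gauge how far your data drifts from live ingest, you can compare `_time` with `_sysTime` directly in APL. A minimal sketch, assuming a hypothetical dataset `['app-logs']`:

```kusto
// Large gaps between _time and _sysTime indicate heavily backfilled data.
['app-logs']
| extend ingest_lag = _sysTime - _time   // how long after the event it was ingested
| summarize max(ingest_lag) by bin(_time, 1d)
```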
Future improvements: Axiom’s roadmap includes an initiative that aims to mitigate the impact of poorly clustered time data by performing incremental time-based compaction. Until then, avoid mixing large historical ranges with live ingest whenever possible.
## Large number of fields in a dataset

### Problem

Query performance degrades in datasets with very high dimensionality (more than several thousand fields).
### Why it matters
Axiom stores event data in a tuned format. As a result:
- The number of distinct values (cardinality) in your data impacts performance because low-cardinality fields compress better than high-cardinality fields.
- The number of fields in a dataset (dimensionality) impacts performance.
- The volume of data collected impacts performance.
### How to fix it

Keeping the number of fields in a dataset below a few thousand helps you achieve the best performance in Axiom.
## Failing to use the `_time` field for event timestamps

### Problem

Axiom’s core optimizations rely on `_time` for indexing and time-based queries. If you store event timestamps in a different field (for example, `timestamp` or `created_at`) and use that field in time filters, Axiom’s time-based optimizations will not be leveraged.
### Why it matters
- No time-based indexing: Every block must be scanned because your custom timestamp field is invisible to the time index.
### How to fix it

- Always use `_time`: Configure your ingest pipelines so that Axiom sets `_time` to the actual event timestamp.
  - If you have a custom field like `created_at`, rename it to `_time` at ingest.
  - Verify that your ingestion library or agent is correctly populating `_time`.
- Use Axiom’s native time filters: Rely on `where _time >= ... and _time <= ...` or the built-in time range selectors in the query UI. A minimal example follows this list.
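For example, a sketch of a time-filtered query, assuming a hypothetical dataset `['app-logs']`:

```kusto
// Filtering on _time lets Axiom skip blocks that lie outside the window.
['app-logs']
| where _time >= datetime(2024-01-01) and _time < datetime(2024-01-02)
| summarize count() by bin(_time, 1h)
```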
## Handling mixed types in the same field

### Problem
A single field sometimes stores different data types across events (for instance, strings in some events and integers in others). This is typically a side effect of using “kitchen-sink” ingestion or inconsistent parsing logic in your code.
### Why it matters

- Reduced compression: Storing multiple types in the same column (variant column) is less efficient than storing a single type.
- Complex queries: You might need frequent casting or conditional logic in queries (`tostring()` calls, and so on).
### How to fix it

- Standardize your types at ingest: If a field is semantically an integer, always send it as an integer.
- Use consistent schemas across services: If multiple applications write to the same dataset, agree on a schema and data types.
- Perform corrections at the source: If you discover your data has been mixed historically, stop ingesting mismatched types. Over time, new blocks will reflect the corrected types even though historical blocks remain mixed. The sketch below shows the query-time casting that consistent ingest lets you avoid.
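For illustration, this is the kind of normalization that mixed types force into every query. The dataset and field names are hypothetical:

```kusto
// user_id arrives as both string and number, so each query must cast it first.
['app-logs']
| extend user_id_str = tostring(user_id)
| where user_id_str == "1234"
```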
## Overly wide queries returning more fields than needed

### Problem

By default, Axiom’s query engine projects all fields (`project *`) for each matching event. This can return large amounts of unneeded data, especially in wide datasets with many fields.
### Why it matters
- High I/O and memory usage: Unnecessary data is scanned, read, and returned.
- Slower queries: Time is wasted processing fields you never use.
### How to fix it

- Use `project` or `project-keep`: Specify exactly which fields you need, as in the first sketch below.
- Use `project-away` if you only need to exclude a few fields: If you need 90% of the fields but want to exclude the largest ones, see the second sketch below.
- Limit your results: If you only need a sample of events for debugging, use a lower `limit` value (such as 10) instead of the default 1000, as in the third sketch below.
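The following sketches use a hypothetical dataset `['app-logs']` and hypothetical field names.

Specify exactly which fields you need with `project`:

```kusto
['app-logs']
| where _time > ago(1h)
| project _time, status, uri, duration_ms   // return only these fields
```

Keep most fields but exclude the largest ones with `project-away`:

```kusto
['app-logs']
| where _time > ago(1h)
| project-away request_body, response_body   // drop only the heavy fields
```

Sample a few events for debugging with a low `limit`:

```kusto
['app-logs']
| where _time > ago(1h)
| limit 10   // instead of the default 1000
```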
## Regular expressions when simple filters suffice

### Problem

Regular expressions (`matches`, `regex`) can be powerful, but they are also expensive to evaluate, especially on large datasets.
### Why it matters
- High CPU usage: Regex filters require complex per-row matching.
- Slower queries: Large swaths of data are scanned with less efficient matching.
### How to fix it

- Use direct string filters: Instead of a regex match, use an equality or substring filter where it expresses the same intent. See the first sketch below.
- Use `search` for substring search: `search` matches text in all fields. To find text in a specific field, a more efficient solution is a case-sensitive filter such as `contains_cs`, where `cs` stands for case-sensitive. See the second sketch below.
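The following sketches use a hypothetical dataset `['app-logs']` and a hypothetical `message` field.

Instead of a regex filter:

```kusto
['app-logs']
| where message matches regex "timeout"
```

use a direct substring filter:

```kusto
['app-logs']
| where message contains "timeout"   // cheaper than per-row regex matching
```

To find `foobar` in all fields:

```kusto
['app-logs']
| search "foobar"   // matches text in all fields
```

To find it in a specific field, case-sensitively:

```kusto
['app-logs']
| where message contains_cs "foobar"   // case-sensitive match in one field
```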
## Overusing runtime JSON parsing (`parse_json`)

### Problem

Some ingestion pipelines place large JSON payloads into a string field, deferring parsing until query time with `parse_json()`. This is both CPU-intensive and slower than columnar operations.
### Why it matters
- Repeated parsing overhead: You pay a performance penalty on each query.
- Limited indexing: Axiom cannot index nested fields if they are only known at query time.
### How to fix it

- Ingest as map fields: Axiom’s new map column type can store object fields column by column, preserving structure and optimizing for nested queries. This allows indexing of specific nested keys.
- Extract top-level fields where possible: If a certain nested field is frequently used for filtering or grouping, consider promoting it to its own top-level column (for faster scanning and filtering).
- Avoid `parse_json()` in queries: If your JSON cannot be flattened entirely, ingest it into a map field, then query subfields directly, as sketched below.
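A minimal sketch of querying a map subfield directly, assuming a hypothetical dataset `['app-logs']` with a map field named `payload`:

```kusto
// Read the nested key column by column instead of calling parse_json()
// on a string field at query time.
['app-logs']
| extend status = ['payload']['response']['status']
| where status == 500
| project _time, status
```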
## Virtual fields for simple transformations

### Problem

You can create virtual fields (for example, `extend converted = toint(some_field)`) to transform data at query time. While sometimes necessary, every additional virtual field imposes overhead.
### Why it matters

- Increased CPU: Each virtual field requires interpretation by Axiom’s expression engine.
- Slower queries: Overuse of `extend` for trivial or frequently repeated operations can add up.
### How to fix it

- Avoid unnecessary casting: If a field must be an integer, handle it at ingest time. When you do filter on a mixed-type field, filter on the raw value directly; the filter automatically matches string values in mixed columns. See the sketch below.
- Reserve virtual fields for truly dynamic or derived logic: If you frequently need a computed value, store it at ingest or keep the transformations minimal.
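A sketch with hypothetical dataset and field names. Instead of converting first:

```kusto
['app-logs']
| extend status_int = toint(status)   // extra virtual field on every query
| where status_int == 500
```

filter the raw field directly; the filter also matches string values in mixed columns:

```kusto
['app-logs']
| where status == 500
```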
## Poor filter order in queries

### Problem

Axiom’s query engine does not currently reorder your `where` clauses optimally. This means the sequence of filters in your query can matter.
### Why it matters
- Unnecessary scans: If you use selective filters last, the engine may process many rows before discarding them.
- Longer execution times: CPU usage and scan times increase.
### How to fix it

- Put the most selective filters first: If `user_id == 1234` discards most rows, apply it before `log_level == "ERROR"`, as in the sketch below.
- Profile your filters: Experiment with which filters discard the most rows to find the most selective conditions.
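A sketch with hypothetical dataset and field names:

```kusto
['app-logs']
| where user_id == 1234           // highly selective: discards most rows first
| where log_level == "ERROR"      // broader filter now runs on far fewer rows
```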