In a previous post, I released timeline-dl, a tool designed to bypass the 30-day API retention limit for Microsoft Defender for Endpoint (MDE) devices. The initial release focused solely on device telemetry.
I have since updated the tool to version 0.1.0. This update expands the scope to include Identity timelines, allowing incident responders to extract user behavior logs for the full six-month retention period.
This post documents the technical differences between the Device and Identity APIs, specifically addressing a hard pagination limit in the Identity backend that necessitated a custom file-truncation algorithm. It also covers quality-of-life improvements including GZIP compression and enhanced error visualization.
Architecture Updates
To support Identity data without duplicating the core logic, the tool’s architecture underwent a significant refactor to generalize what was previously device-only plumbing.
Previously, the worker pool was tightly coupled to the Device schema. I introduced a generic Job interface. This allows the concurrent worker pool to process heterogeneous tasks—whether resolving a Device ID or searching for a User UPN—using the same scheduling and error-handling logic.
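As a rough sketch (the type and method names below are illustrative, not the tool’s exact API), the abstraction boils down to a single interface that both device and identity tasks implement, so one pool can schedule either:

```go
// A minimal sketch of the generic job abstraction; names are illustrative,
// not the tool's exact types. One interface lets the same worker pool
// schedule device and identity work interchangeably.
package pool

import "context"

// Job is any unit of work the pool can schedule, retry, and report on.
type Job interface {
	Name() string                  // used for logging and TUI status lines
	Run(ctx context.Context) error // performs the actual API work
}

// deviceResolveJob resolves a hostname to a machine ID.
type deviceResolveJob struct{ hostname string }

func (j deviceResolveJob) Name() string                  { return "resolve-device " + j.hostname }
func (j deviceResolveJob) Run(ctx context.Context) error { return nil /* call device API */ }

// identitySearchJob resolves a UPN or SAMAccountName to an identity.
type identitySearchJob struct{ upn string }

func (j identitySearchJob) Name() string                  { return "search-identity " + j.upn }
func (j identitySearchJob) Run(ctx context.Context) error { return nil /* call identity API */ }

// Both types satisfy Job, so a single queue feeds the same worker pool.
var _ = []Job{deviceResolveJob{"host01"}, identitySearchJob{"jdoe@contoso.com"}}
```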
Additionally, the Merge Logic was updated to be “direction-aware.”
- Devices return events in chronological order (oldest first).
- Identities return events in reverse-chronological order (newest first).
To ensure the final output files remain strictly ordered, the chunk merger now respects the temporal direction of the source data. For identities, it intelligently reverses the sequence in which chunks are stitched together, preserving a continuous, ordered history in the final JSONL file.
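A minimal sketch of the idea, assuming each chunk is tracked by its window’s upper time bound (the function and field names here are stand-ins, not the tool’s merge code): because each chunk file is already ordered internally by the API, keeping the final file monotonic only requires choosing the order in which whole chunks are concatenated.

```go
// A minimal sketch of direction-aware stitching (not the tool's actual merge
// code). Each chunk file is already ordered internally by the API: ascending
// for devices, descending for identities.
package merge

import (
	"io"
	"os"
	"sort"
)

type chunk struct {
	path      string
	windowEnd int64 // upper time bound of the chunk, unix seconds
}

func mergeChunks(out io.Writer, chunks []chunk, newestFirst bool) error {
	// Devices: oldest window first (final file ascends).
	// Identities: newest window first (final file descends, matching the API order).
	sort.Slice(chunks, func(i, j int) bool {
		if newestFirst {
			return chunks[i].windowEnd > chunks[j].windowEnd
		}
		return chunks[i].windowEnd < chunks[j].windowEnd
	})
	for _, c := range chunks {
		f, err := os.Open(c.path)
		if err != nil {
			return err
		}
		_, err = io.Copy(out, f) // chunk contents are already ordered
		f.Close()
		if err != nil {
			return err
		}
	}
	return nil
}
```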
New dedicated API clients were also added to interact with the specific proxy endpoints used by the Identity timeline experience:
- Search: `/apiproxy/mdi/identity/userapiservice/identities`
- Resolution: `/apiproxy/mdi/identity/userapiservice/user/resolve`
- Timeline: `/apiproxy/mdi/identity/userapiservice/timeline/mtp`
The “Skip 9000” Problem
The Device timeline API uses a cursor-based approach (providing a Next link with a pre-calculated time window), which makes “infinite scrolling” straightforward. The Identity API, however, uses offset-based pagination via a skip parameter.
The Identity API returns events in descending order (newest to oldest). Crucially, the backend enforces a hard limit on the offset: it will not accept a skip value greater than 9000.
This means that for any given time window, the maximum number of events you can retrieve is 9000 + pageSize. If a specific user generates 20,000 events within a requested week, standard pagination will hit a wall once the skip parameter maxes out, leaving the remaining 10,000+ older events inaccessible.
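To make the ceiling concrete (the page size of 200 below is an assumption for illustration, not the API’s documented value):

```go
// Illustration of the ceiling implied by the skip cap.
package identity

const (
	maxSkip  = 9000
	pageSize = 200 // assumed page size for this example
)

// The last request the backend accepts is skip=9000, which can return at most
// one more page, so a single time window yields no more than:
const maxPerWindow = maxSkip + pageSize // 9,200 events

// For the 20,000-event week above, roughly 20,000 - 9,200 = 10,800 older
// events would be unreachable without resizing the window.
```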
The Algorithm: Boundary Truncation
To circumvent this, timeline-dl implements a dynamic window resizing algorithm. Since we cannot simply increase the offset indefinitely, we must reset the time window once we approach the skip limit.
However, simply resetting the window based on the timestamp of the last received event introduces an edge case: if multiple events occur at the exact same second (the “boundary time”) and a page break splits them, moving the time window could result in data loss or duplication.
To guarantee data integrity, the tool utilizes a File Truncation strategy.
- Fetch & Track: As the tool fetches pages and writes events to disk, it maintains a running count of consecutive events that share the same timestamp. Because events are returned in descending order (newest first), this counter effectively tracks how many events exist at the current “bottom” of the time window.
- Detection: If the `skip` value for the next request would exceed 9000, the tool recognizes it cannot paginate further in the current window.
- Truncation:
  - It uses the running count (`boundaryCount`) to determine exactly how many events at the end of the file belong to the final, potentially incomplete second.
  - It truncates those specific lines from the physical output file.
- Reset:
  - The `skip` counter is reset to 0.
  - The upper time bound (`toDate`) is adjusted to `batchMinTime + 1s` (inclusive of the boundary second).
By discarding the partial data for that final second and shifting the window to include it entirely, the next request (starting at skip=0) re-fetches all events for that second. This ensures that even if 50 events occurred at that specific second, they are all captured in the new window without hitting the offset limit.
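The loop below is a condensed sketch of that fetch, detect, truncate, reset cycle. The fetch callback and the writer interface are simplified stand-ins, while `skip`, `boundaryCount`, `batchMinTime`, and `toDate` mirror the terms used above.

```go
// A condensed sketch of the truncate-and-reset loop. Types, the fetch
// callback, and the writer interface are stand-ins for the tool's internals.
package identity

import "time"

type event struct {
	Timestamp time.Time
	Line      []byte // raw JSONL line as written to disk
}

// lineWriter is the minimal file surface the algorithm needs: append lines
// and physically drop the last n lines again (the truncation step).
type lineWriter interface {
	WriteLines(lines [][]byte) error
	TruncateLastLines(n int) error
}

func downloadWindow(
	fetch func(from, toDate time.Time, skip int) ([]event, error),
	w lineWriter, from, toDate time.Time, pageSize int,
) error {
	const maxSkip = 9000
	skip := 0
	boundaryCount := 0         // consecutive events sharing batchMinTime
	var batchMinTime time.Time // oldest timestamp seen so far (window "bottom")

	for {
		events, err := fetch(from, toDate, skip)
		if err != nil {
			return err
		}
		lines := make([][]byte, 0, len(events))
		for _, ev := range events {
			// Events arrive newest-first, so each batch's last event is the
			// current bottom of the window.
			if ev.Timestamp.Equal(batchMinTime) {
				boundaryCount++
			} else {
				batchMinTime, boundaryCount = ev.Timestamp, 1
			}
			lines = append(lines, ev.Line)
		}
		if err := w.WriteLines(lines); err != nil {
			return err
		}
		if len(events) < pageSize {
			return nil // short page: the window is exhausted
		}
		if skip+len(events) > maxSkip {
			// The next request would exceed the offset cap. Drop the partial
			// boundary second from disk, then restart at skip=0 with a window
			// that fully contains it.
			if err := w.TruncateLastLines(boundaryCount); err != nil {
				return err
			}
			toDate = batchMinTime.Add(time.Second) // inclusive of the boundary second
			skip, boundaryCount = 0, 0
			batchMinTime = time.Time{}
			continue
		}
		skip += len(events)
	}
}
```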
You might worry about the overhead of discarding data. In practice, the number of events sharing the exact same second at the boundary is typically very low (single digits), even for high-volume accounts. Consequently, the “wasted” I/O operations from truncation are negligible compared to the stability gained by ensuring a clean, gap-free timeline.
Figure 1 illustrates this flow.
```mermaid
flowchart TD
Start([Start Batch]) --> Request[POST /timeline<br/>skip=N]
Request --> Check{Received < PageSize?}
Check -- Yes --> Done([Batch Complete])
Check -- No --> Process[Process Events &<br/>Update Running Count]
Process --> LimitCheck{Next Skip > 9000?}
LimitCheck -- No --> Increment[skip += received]
Increment --> Request
LimitCheck -- Yes --> Truncate[Truncate last 'boundaryCount'<br/>lines from file]
Truncate --> Adjust[Set toDate = batchMinTime + 1s]
Adjust --> ResetSkip[Set skip = 0]
ResetSkip --> Request
```
Quality of Life Improvements
Beyond the new data source, version 0.1.0 introduces operational improvements based on feedback from initial usage.
GZIP Compression
Timeline data is voluminous. A single active endpoint can generate gigabytes of JSON logs over a six-month period. The tool now supports the --gzip (or -z) flag.
This is implemented via a transparent writer chain. The JSONLWriter wraps a gzip.Writer, which in turn wraps the file handle. This ensures data is compressed in-stream before being flushed to disk, significantly reducing I/O overhead and storage requirements without requiring post-process compression.
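A minimal sketch of that layering (the jsonlWriter below is a stand-in for the tool’s JSONLWriter, shown only to illustrate the chain and the close order):

```go
// Sketch of the writer chain: file -> gzip.Writer -> line-oriented JSON
// writer, so events are compressed in-stream as they are written.
package output

import (
	"compress/gzip"
	"encoding/json"
	"io"
	"os"
)

type jsonlWriter struct {
	enc *json.Encoder
}

func newJSONLWriter(w io.Writer) *jsonlWriter {
	return &jsonlWriter{enc: json.NewEncoder(w)} // Encoder appends '\n' per value
}

func (j *jsonlWriter) Write(v any) error { return j.enc.Encode(v) }

// openOutput builds the chain. When gzip is disabled the JSONL writer wraps
// the file directly; callers must close the gzip writer before the file so
// the trailing gzip frame is flushed.
func openOutput(path string, gzipEnabled bool) (*jsonlWriter, func() error, error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, nil, err
	}
	if !gzipEnabled {
		return newJSONLWriter(f), f.Close, nil
	}
	gz := gzip.NewWriter(f)
	closer := func() error {
		if err := gz.Close(); err != nil {
			f.Close()
			return err
		}
		return f.Close()
	}
	return newJSONLWriter(gz), closer, nil
}
```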
Enhanced Error Visualization
When running massive jobs across hundreds of workers, fatal errors (such as expired session cookies or conditional access blocks) could previously be lost in the scrolling logs.
The TUI has been updated to capture and display fatal errors prominently. If a worker encounters a non-recoverable error, it now locks its status display to red, ensuring the operator is immediately aware of the failure state.
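Conceptually this is a “sticky” failure state; the sketch below is illustrative only and not the tool’s actual TUI code:

```go
// Sketch of a sticky worker status: once a worker reports a fatal error, its
// state can only be Failed, so later updates never overwrite the red line.
package tui

import "sync"

type state int

const (
	Running state = iota
	Done
	Failed // non-recoverable: expired cookie, conditional access block, ...
)

type workerStatus struct {
	mu    sync.Mutex
	state state
	err   error
}

// Update ignores further transitions once the worker has failed.
func (w *workerStatus) Update(s state, err error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.state == Failed {
		return // locked: keep the fatal error visible
	}
	w.state, w.err = s, err
}
```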
Performance Benchmarks
To validate the efficiency of the new identity timeline downloader, I benchmarked the tool against a high-volume target: a single identity generating approximately 600,000 events over a one-month period.
The test used 12-hour time chunks to split the month into parallelizable segments. As with the device timeline, increasing worker concurrency yields significant speedups until the API rate limits or bandwidth saturation are reached.
| Configuration | Duration | Speedup |
|---|---|---|
| 1 Worker | 20m 47s | 1x (Baseline) |
| 2 Workers | 11m 49s | 1.76x |
| 4 Workers | 08m 15s | 2.52x |
| 8 Workers | 07m 13s | 2.89x |
This demonstrates that the “skip-limit” workaround does not significantly hinder performance. By parallelizing the time range, we can retrieve over half a million complex identity events in under eight minutes.
Usage
Downloading identity timelines requires the --identities flag or a file input via --identity-file.
```bash
# Download timeline for specific users with compression
./timeline-dl --identities "admin_svc, jdoe" --days 30 --gzip

# Download from a list of UPNs/SAMAccountNames
./timeline-dl --identity-file ./targets.txt --from 2025-01-01T00:00:00Z --to 2025-06-01T00:00:00Z
```
The output file naming convention for identities distinguishes them from devices:
- Device: `{hostname}_{machineId}_timeline.jsonl.gz`
- Identity: `{accountname}_{radiusUserId}_identity_timeline.jsonl.gz`
Conclusion
The addition of Identity timelines makes timeline-dl a more comprehensive tool for forensic data acquisition in Microsoft environments. By handling the specific pagination idiosyncrasies of the backend, it allows analysts to automate the retrieval of user activity logs that would otherwise be tedious to export via the browser.
The updated source code is available on GitHub.