
In the Ethereum ecosystem, something as simple as “getting all token holders and their balances” turns out to be a surprisingly complex problem.
Every wallet can show its own balance, but listing all non-zero addresses globally? That’s an entirely different story. One that touches the very core of Ethereum’s design philosophy.
This is not merely a query problem, but a state reconstruction problem. Ethereum’s state model imposes inherent difficulties.
Contract State is a Black Box. A standard ERC-20 contract stores balances like this:
mapping(address => uint256) public balanceOf;
Mappings are designed for O(1) lookups: given an address, return the balance. They are intentionally non-iterable.
The EVM provides no instruction to list all keys. Keys are never stored; each value lives at a hashed storage slot, which avoids state bloat and DoS risks.
And while the ERC-20 standard defines the Transfer event…
event Transfer(address indexed from, address indexed to, uint256 value);
…it only gives you signals of change, not the full state itself.
In theory, you could replay all Transfer events from the token’s deployment block onward, maintaining an in-memory address → balance mapping to get all current holders and balances.
But this approach quickly collapses under real-world conditions.
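As a minimal sketch of that replay loop, assume the Transfer logs have already been fetched and decoded into (from, to, value) tuples in chain order. The tuple shape, the short placeholder addresses, and the function name are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

ZERO = "0x0000000000000000000000000000000000000000"

def replay_transfers(events):
    """Fold (from, to, value) Transfer events, in chain order, into balances."""
    balances = defaultdict(int)
    for sender, recipient, value in events:
        if sender != ZERO:       # a mint has no real sender to debit
            balances[sender] -= value
        if recipient != ZERO:    # a burn has no real recipient to credit
            balances[recipient] += value
    return dict(balances)

events = [
    (ZERO, "0xA", 100),    # mint 100 to A
    ("0xA", "0xB", 40),    # A sends 40 to B
]
print(replay_transfers(events))   # {'0xA': 60, '0xB': 40}
```

The whole map lives in memory and every event since deployment has to pass through it, which is where the scaling trouble starts.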
Many ERC-20 contracts mint tokens during deployment to addresses such as the team wallet or liquidity pool:
constructor() {
    _mint(owner(), 1_000_000 * 10**18);
}
If _mint does not emit Transfer(address(0), to, amount), your event listener will not detect this initial supply, causing serious gaps in total supply and holder data.
If user A holds 100 tokens and transfers them all out:
balanceOf(A) = 0
Historical events will still contain:
Transfer(0x0, A, 100)
Transfer(A, B, 100)
Should A remain in the holder list?
Business view: only non-zero balances matter.
Technical view: each update must check if balance is zero and remove the address from the result set.
This logic seems simple but can fail under high concurrency or bulk processing, especially when addresses repeatedly move in and out.
As a result, event-based reconstruction becomes fragile, incomplete, and extremely resource-intensive.
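The zero-balance rule can be sketched as a small incremental update, assuming events arrive one at a time and in order. The function name and the short placeholder addresses ("0x0", "A", "B") are illustrative only and mirror the example above.

```python
def apply_transfer(holders, sender, recipient, value, zero="0x0"):
    """Apply one Transfer to `holders` (address -> non-zero balance) in place."""
    for address, delta in ((sender, -value), (recipient, value)):
        if address == zero:
            continue                         # mint or burn side: nothing to track
        new_balance = holders.get(address, 0) + delta
        if new_balance == 0:
            holders.pop(address, None)       # address leaves the holder set
        else:
            holders[address] = new_balance   # address enters or stays

holders = {}
apply_transfer(holders, "0x0", "A", 100)     # Transfer(0x0, A, 100)
apply_transfer(holders, "A", "B", 100)       # Transfer(A, B, 100)
print(holders)                               # {'B': 100}
apply_transfer(holders, "B", "A", 30)        # A moves back in
print(holders)                               # {'B': 70, 'A': 30}
```

If a mint was never logged, the sender's entry goes negative instead of disappearing. That is a useful symptom to watch for, but it cannot tell you how much supply is missing.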
In an ideal world, querying all ERC-20 holders would be as simple as asking an honest accountant: who currently holds this token, and how much?
Ethereum provides exactly this: the contract’s balanceOf(address) function. Once you know which addresses to ask about, you get the authoritative answer for each one.
Steps:
Aggregate all potential holder addresses from Transfer, Approval, Mint, Withdraw events.
Query balanceOf(address) directly on-chain for these addresses.
This approach respects the authoritative EVM state and avoids pitfalls like missing initial mint events, unrecorded burns, or contracts bypassing Transfer. It does not reconstruct state; it reads the state directly.
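The two steps can be sketched as plain JSON-RPC payloads. This is a hedged illustration: it assumes events are already decoded into (from, to, value) tuples, TOKEN is a hypothetical address, and the helper names are made up for this example. Only the eth_call method and the balanceOf(address) selector 0x70a08231 are standard.

```python
ZERO_ADDRESS = "0x" + "00" * 20
BALANCE_OF_SELECTOR = "70a08231"  # first 4 bytes of keccak256("balanceOf(address)")

def candidate_addresses(transfers):
    """Step 1: every address seen in the events, minus the zero address."""
    seen = set()
    for sender, recipient, _value in transfers:
        seen.update((sender.lower(), recipient.lower()))
    seen.discard(ZERO_ADDRESS)
    return sorted(seen)

def balance_of_request(token, holder, block="latest", request_id=1):
    """Step 2: build an eth_call request for token.balanceOf(holder)."""
    holder_hex = holder.lower()[2:]
    if len(holder_hex) != 40:
        raise ValueError("expected a 0x-prefixed 20-byte address")
    block_tag = hex(block) if isinstance(block, int) else block
    call = {"to": token, "data": "0x" + BALANCE_OF_SELECTOR + holder_hex.rjust(64, "0")}
    return {"jsonrpc": "2.0", "id": request_id, "method": "eth_call",
            "params": [call, block_tag]}

def decode_balance(result_hex):
    """Decode the 32-byte hex word returned by eth_call into an integer."""
    return int(result_hex, 16)

TOKEN = "0x" + "11" * 20  # hypothetical token address
transfers = [
    (ZERO_ADDRESS, "0x" + "aa" * 20, 100),
    ("0x" + "aa" * 20, "0x" + "bb" * 20, 100),
]
batch = [
    balance_of_request(TOKEN, address, block=18_000_000, request_id=i)
    for i, address in enumerate(candidate_addresses(transfers))
]
print(len(batch), batch[0]["params"][1])
```

Passing an integer block number instead of "latest" is what turns the same request into a historical snapshot query, provided the node still has state for that block.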
Absolute accuracy: balanceOf returns the true storage value at a given block.
Simple logic: no complex incremental state, zero-balance checks, or address deduplication.
Historical support: specify any blockNumber to take a snapshot for audits, airdrops, or analytics.
Robust to non-standard contracts: even if _transfer bypasses events, balanceOf still works.
You can practically view the Ethereum state from above, with full clarity.
Reality is harsher.
Some addresses only received one airdrop, then transferred out; they remain candidates.
Complex contracts like Uniswap V2 Pair hold tokens internally; top-level balances only show part of the picture.
Silent mints without Transfer(from=0x0) require manual known-address supplementation.
For a major token like USDT, candidate addresses may exceed 10 million, meaning 10 million eth_calls are needed.
Standard full nodes keep state for only about the 128 most recent blocks; older state is pruned. Historical balanceOf queries therefore require an archive node:
Storage > 12TB (and growing)
Sync > 3 weeks
High CPU and bandwidth usage
Third-party services introduce cost, dependency, and sovereignty issues. Snapshots are static; continuous monitoring requires repeated snapshots and incremental updates, which scales poorly.
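To get a feel for the 10 million eth_call figure, here is a back-of-the-envelope estimate. The batch size and throughput are purely hypothetical inputs; real limits depend on the node or provider.

```python
import math

def snapshot_cost(candidates, batch_size, batches_per_second):
    """Return (number of batched requests, wall-clock seconds) for one snapshot."""
    batches = math.ceil(candidates / batch_size)
    seconds = batches / batches_per_second
    return batches, seconds

# Assumed: 100 eth_calls per JSON-RPC batch, 20 batches per second sustained.
batches, seconds = snapshot_cost(10_000_000, batch_size=100, batches_per_second=20)
print(batches, round(seconds / 3600, 2))   # 100000 batches, about 1.39 hours
```

That is the cost of a single snapshot at a single block. Repeating it for continuous monitoring multiplies the bill accordingly.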
Despite its challenges, this approach remains the preferred choice when accuracy is the top priority:
You need a one-time authoritative snapshot (e.g., airdrops, snapshot voting, compliance reports) where incremental updates are minimal.
Token holder count is manageable (less than 100,000).
You have access to an archive node (self-hosted or paid).
You cannot fully trust event integrity (e.g., auditing an unknown contract).
Essentially, this method trades computational resources (JSON-RPC calls) and engineering complexity for absolute data accuracy and simple implementation logic.
It does not attempt to reconstruct the on-chain state incrementally—it directly queries the blockchain state, returning to the true essence of the EVM state machine.
In Ethereum’s “transparent but non-iterable” world, every state query comes at a cost. Skilled engineers find the most sustainable path in the gap between ideal theory and real-world constraints.
Chainbase’s EVM Tracer is an open-source framework for reconstructing and analyzing onchain state transitions.
It allows developers to:
Observe state changes directly, without relying on emitted events.
Track ERC-20 / 721 / 1155 balance updates in real time.
Build precise historical snapshots without archive nodes.
Analyze MEV, internal transactions, and contract-level storage writes.
Even if a token silently mints or bypasses Transfer events, as long as it writes to the balanceOf mapping slot, the tracer can detect it.
In practice, the Prestate-Based EVM Tracer is not a “third approach” to token holder queries. It is the natural evolution of the first two approaches under scale and complexity pressure.
In the early days, when querying ERC-20 token holders, Approach 1 (rebuilding holders from Transfer events) was almost the only viable starting point. It relied on standard events, was easy to implement, and worked for tokens with a limited number of holders and standard contract behavior.
However, as the on-chain ecosystem grew more complex—non-standard contracts proliferated, silent mints became common, liquidity pools nested tokens, and airdrop addresses exploded—event-driven models reached their limits. This led to Approach 2: aggregate candidate addresses from Transfer events and query balanceOf directly on-chain. This approach is a practical compromise, trading computational resources for correctness when events are not fully trustworthy.
Yet, when token holder counts reach millions, block intervals drop below a second (e.g., BSC), and TPS continues to climb, even Approach 2 struggles:
Each new block requires thousands to tens of thousands of eth_calls
Reliance on archive nodes brings massive storage and sync costs
Incremental updates and snapshot reconciliation grow increasingly complex
Third-party RPC rate limits and data consistency risks become bottlenecks
At this point, we realized the core problem is not “how to efficiently query state,” but “how to efficiently reconstruct state.”
In an ideal world, obtaining token holder data should not depend on event aggregation or sending massive RPC requests. Instead, we should observe how the state changes directly.
Ethereum already provides the foundation for this at a low level: EVM Tracing (see the Geth EVM tracing documentation).
By combining full block data, pre-state execution snapshots, and a custom EVM Tracer, we can locally replay every transaction in a block and capture all state changes with precision. This allows:
Accurate snapshots of ERC-20/721/1155 token holders
Analysis of MEV, internal transactions, and contract-level storage writes
Lightweight alternatives to archive nodes
Thanks to this design, storage writes can be observed directly: even if a contract never emits a Transfer event, as long as it modifies the balances[addr] storage slot, the tracer captures it via SSTORE.
Using precomputed storage slots for ERC-20 balance mappings (in Solidity, keccak256(pad32(address) || pad32(slot)), where slot is the mapping’s declaration position, often 0 in simple contracts but different from contract to contract), we can pinpoint which SSTORE operations correspond to balance changes.
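For readers who want to compute such a slot by hand, here is a self-contained sketch. Python's hashlib.sha3_256 uses different padding from Ethereum's keccak256, so the block carries a small pure-Python Keccak-256; in practice you would likely use a vetted library instead. The mapping position 0 and the holder address are assumptions for illustration, since the real position depends on the contract's storage layout.

```python
MASK64 = (1 << 64) - 1

def _rol(value, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64

def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation over a 5x5 matrix of 64-bit lanes, lanes[x][y]."""
    lfsr = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an 8-bit LFSR)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data):
    """Keccak-256 as used by the EVM (0x01 padding, 136-byte rate)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = _keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])

def mapping_slot(holder, mapping_position):
    """Storage slot of mapping[holder] for a Solidity mapping declared at mapping_position."""
    key = bytes.fromhex(holder[2:]).rjust(32, b"\x00")
    return keccak256(key + mapping_position.to_bytes(32, "big"))

holder = "0x" + "ab" * 20                      # hypothetical holder address
print("0x" + mapping_slot(holder, 0).hex())    # assumes the mapping sits at position 0
```

Once the slot is known, any SSTORE to it inside the token contract is a balance write for that holder, whether or not an event was emitted.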
The tracer can monitor all major token standards:
ERC-20: balanceOf(address) → uint256
ERC-721: ownerOf(tokenId) → address
ERC-1155: balanceOf(address, tokenId) → uint256
A single unified logic covers all popular token types, eliminating the need for standard-specific adaptations.
Since we replay the full execution context of a block, the resulting state naturally corresponds to that block’s confirmed status.
There is no need for repeated archive node balanceOf calls—all state changes are embedded in the execution trace.
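As a rough sketch of how a per-transaction state diff turns into holder updates, assume a diff shaped like the output of Geth's prestateTracer in diff mode ("pre" and "post" objects keyed by contract address, each with a "storage" map) and a precomputed slot-to-holder lookup. The token key, slot values, and function name below are illustrative, and treating a slot that is missing from "post" as zeroed is an assumption about that output format.

```python
def apply_storage_diff(holders, diff, token, slot_to_holder):
    """Update `holders` (address -> non-zero balance) from one pre/post storage diff."""
    pre = diff.get("pre", {}).get(token, {}).get("storage", {})
    post = diff.get("post", {}).get(token, {}).get("storage", {})
    for slot in set(pre) | set(post):
        holder = slot_to_holder.get(slot)
        if holder is None:
            continue                      # not a known balance slot (allowance, supply, ...)
        # Assumption: a slot present in "pre" but absent from "post" was set to zero.
        new_balance = int(post.get(slot, "0x0"), 16)
        if new_balance == 0:
            holders.pop(holder, None)
        else:
            holders[holder] = new_balance
    return holders

token = "0xtoken"                                  # placeholder contract key
slot_to_holder = {"0x01": "A", "0x02": "B"}        # placeholder slots
diff = {
    "pre":  {token: {"storage": {"0x01": "0x64"}}},   # A held 100 before
    "post": {token: {"storage": {"0x02": "0x64"}}},   # B holds 100 after
}
print(apply_storage_diff({"A": 100}, diff, token, slot_to_holder))   # {'B': 100}
```

No Transfer event is consulted anywhere in this path, which is why silent mints and event-bypassing transfers still show up.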
Compared with Approach 2 (requiring N eth_calls for N candidate addresses), our approach needs only 2 JSON-RPC requests per block:
| Feature | Traditional Approach | EVM Tracer |
|---|---|---|
| Data Source | Transfer Events + | Full Block + Prestate |
| RPC Calls | Tens of thousands | Only 2 |
| Accuracy | Depends on event integrity | State-level accuracy |
| Historical Query | Requires Archive Node | Native support |
| Bottleneck | Network / RPC rate | Local computation |
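The two requests are not spelled out here. One plausible pairing, assuming a Geth-compatible node with the debug namespace enabled, is eth_getBlockByNumber for the full block plus debug_traceBlockByNumber with the built-in prestateTracer. The sketch below only builds the payloads; whether EVM Tracer issues exactly these calls is an assumption.

```python
import json

def block_requests(block_number):
    """Build the two JSON-RPC payloads assumed to be needed for one block."""
    tag = hex(block_number)
    return [
        # Full block, including complete transaction objects.
        {"jsonrpc": "2.0", "id": 1, "method": "eth_getBlockByNumber",
         "params": [tag, True]},
        # Pre-execution state touched by each transaction (Geth prestateTracer).
        {"jsonrpc": "2.0", "id": 2, "method": "debug_traceBlockByNumber",
         "params": [tag, {"tracer": "prestateTracer"}]},
    ]

requests_for_block = block_requests(18_000_000)
print(json.dumps(requests_for_block, indent=2))
```

Everything after these two responses arrive is local computation, which matches the "Bottleneck" row in the table.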
As the EVM ecosystem evolves toward higher TPS and increasingly complex contracts, state-level tracing is becoming an essential capability, not an optional tool.
Reconstructing onchain state with prestate-based tracing bridges the gap between transparency and practical observability, bringing blockchain data engineering closer to its ideal form: see the actual state changes, not just the logs.
This makes it feasible to handle millions of token holders efficiently, something previously impractical with traditional RPC-based approaches.
Check out the code and try it yourself:
🔗 GitHub: https://github.com/chainbase-labs/evm-tracer
🧪 Demo: https://1block.dev/
Chainbase is building the Hyperdata Network for AI — a foundational layer for the DataFi era.
Built as a Hyperdata Network, Chainbase turns onchain signals into structured, verifiable, and AI-ready data that can be directly processed by AI models and decentralized applications. Its core stack includes:
Manuscript: a programmable layer for building data assets;
AVS layer: decentralized data execution and verification;
$C Token: the native currency for AGI.
This structured data layer supports a new generation of crypto applications that are autonomous, composable, and economically aligned with their users and contributors.
To date, Chainbase has indexed over 200 blockchains, processed more than 500 billion data calls, and supports a community of more than 35,000 developers. Over 10,000 projects actively use Chainbase across a wide range of use cases, including MEV infrastructure, L2 explorers, agent protocols, and onchain analytics.
The founding team brings deep experience in blockchain infrastructure, data engineering, and protocol security. Chainbase is backed by top-tier investors and works closely with ecosystems across modular infrastructure, large language models, and onchain AI.
As the need for machine-readable and economically aligned data continues to grow, Chainbase provides the foundational layer for a programmable data economy—one where information moves freely between agents, protocols, and people.

Chainbase Team