Goldsky offers two flagship products, Subgraphs and Mirror, designed to help developers interact with blockchain data more efficiently. While both tools serve similar goals - retrieving, processing, and querying blockchain data - they differ significantly in how they handle data management, scalability, and customization.

Goldsky Subgraphs are a powerful abstraction built on top of blockchain indexing technology. They let developers define data sources and transformation logic, then query the indexed results with GraphQL through an instant API endpoint, making it easier to retrieve structured blockchain data. Subgraphs are particularly useful for dApps (decentralized applications) that need to query specific pieces of on-chain data efficiently.
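
As a rough sketch of what querying a subgraph endpoint can look like from client code, the Python snippet below builds a GraphQL POST request. The endpoint URL, the `transfers` entity, and its fields are hypothetical placeholders rather than a real deployment; substitute the URL and schema of your own subgraph.

```python
import json
import urllib.request

# Hypothetical placeholder; use the endpoint of your own subgraph deployment.
SUBGRAPH_URL = "https://example.com/subgraphs/my-subgraph/gn"

# Hypothetical entity and fields; they depend on the schema you define.
QUERY = """
query RecentTransfers($first: Int!) {
  transfers(first: $first, orderBy: blockNumber, orderDirection: desc) {
    id
    from
    to
    value
  }
}
"""


def build_graphql_request(url: str, query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(SUBGRAPH_URL, QUERY, {"first": 5})
# To execute it against a real endpoint:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

The request is only constructed here, not sent, so the sketch runs without network access.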

Goldsky Mirror takes a different approach to managing blockchain and off-chain data: it focuses on streaming data in real time directly to a database, giving you full control and customization over the data pipeline. Instead of providing an API endpoint for querying data as subgraphs do, Mirror Pipelines stream the raw or processed data directly to a database managed by the user. This setup gives users more flexibility in how they store, manipulate, and query the data.

Let’s now look at the differences between the two products across the most important functional dimensions:

1. Data Design

  • Subgraphs:
    • The data model is optimized specifically for on-chain data:
      • The entities and data types you define in subgraphs are tailored to represent blockchain data efficiently.
      • You can create relationships and aggregations between on-chain entities (e.g., track a user’s balance for specific tokens).
      • You can enrich entity data with other on-chain sources using eth_call. However, integrating off-chain data is not supported.
      • On-chain data is restricted to EVM-compatible chains.
  • Mirror:
    • The data model is flexible and open to any type of data:
      • You can combine blockchain data with your own data seamlessly, offering more complex and customizable use cases.
      • With Mirror, the recommended approach is to land the data in your database and perform further aggregations and enrichments downstream. You can also do some pre-processing with SQL transforms before the data is written to your database.
      • On-chain data is not restricted to EVM chains. Mirror currently supports alternative L1s such as Solana, Sui, and many others.
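
To illustrate the "combine on-chain data with your own data downstream" idea, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for whatever database you manage, and the table names (`token_transfers`, `app_users`), columns, and sample rows are all hypothetical; a real pipeline would keep the on-chain table populated for you.

```python
import sqlite3

# In-memory SQLite stands in for the database you manage yourself.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table that a pipeline would keep populated with on-chain data.
    CREATE TABLE token_transfers (
        tx_hash TEXT, recipient TEXT, amount INTEGER
    );
    -- Hypothetical off-chain table owned by your application.
    CREATE TABLE app_users (
        wallet TEXT PRIMARY KEY, display_name TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO token_transfers VALUES (?, ?, ?)",
    [("0xaa", "0x01", 100), ("0xbb", "0x01", 50), ("0xcc", "0x02", 70)],
)
conn.executemany(
    "INSERT INTO app_users VALUES (?, ?)",
    [("0x01", "alice"), ("0x02", "bob")],
)

# Downstream aggregation: total amount received per application user.
totals = conn.execute(
    """
    SELECT u.display_name, SUM(t.amount) AS total_received
    FROM token_transfers AS t
    JOIN app_users AS u ON u.wallet = t.recipient
    GROUP BY u.display_name
    ORDER BY u.display_name
    """
).fetchall()

print(totals)  # [('alice', 150), ('bob', 70)]
```

Because both tables live in the same database, the join is an ordinary SQL query with no extra API hop.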

2. Infrastructure

  • Subgraphs:
    • Provide an instant GraphQL API that’s ready to use right out of the box for querying blockchain data.
    • The entire infrastructure (including indexing, database and querying layer) is managed by Goldsky, minimizing setup time but limiting control over data handling.
  • Mirror:
    • Fully runs on Goldsky’s infrastructure to stream data into your database in real-time, but ultimately, you need to set up and manage your own data storage and querying infrastructure.
    • Offers full control over your data, allowing you to optimize infrastructure and scale as needed. You can colocate blockchain data with the other data domains of your business, which helps with both privacy and user experience.

3. Ecosystem & Development

  • Subgraphs:
    • Established technology with a rich ecosystem. Numerous open-source repositories are available for reference, and Goldsky’s Instant Subgraphs allow you to quickly create subgraphs with no code.
    • There is a vast community around subgraph development, making it easier to find support, tutorials, and pre-built examples.
  • Mirror:
    • As a newer product, Mirror doesn’t have as many public examples or pre-built repositories to reference. However, users can rely on Goldsky’s support to set up and optimize their pipelines. Goldsky also provides curated datasets for specific use cases that are ready to deploy.

4. Scalability & Performance

  • Subgraphs:
    • Perform well under low throughput conditions (less than 40-50 events per second). However, as event frequency increases, latency grows, and maintaining subgraphs becomes more complex.
    • Goldsky offers the ability to fine-tune subgraphs, providing custom indexers that help optimize performance and efficiency. Note, however, that there is a limit to how much fine-tuning can be done within the subgraph framework.
    • Multi-chain setups often require reindexing from scratch, which can be time-consuming when switching between chains. This can slow down applications that rely on frequent updates from multiple chains.
  • Mirror:
    • Designed for scalability. You can expand your infrastructure horizontally (adding more servers, optimizing queries, etc.) as the data load increases. A default Mirror pipeline writes about 2,000 rows a second, but you can scale up to 40 workers with an XXL Mirror pipeline. At that size, you can see speeds of over 100,000 rows a second, enough to backfill the entire Ethereum blocks table in under 4 minutes.
    • Thanks to its fast backfill capabilities and the freedom to colocate data as you see fit, Mirror is well suited to multi-chain applications, as described in this article by Split’s Engineering team.
    • Real-time streaming ensures that query latency is only limited by the performance of your own database, not by external API calls or indexing limitations.
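
The throughput figures above translate into backfill times with simple arithmetic. The sketch below assumes, purely for illustration, a table of 20 million rows (a round number, not an official count) and a steady write rate; real throughput also depends on row size and on how fast your database can ingest.

```python
def backfill_seconds(total_rows: int, rows_per_second: int) -> float:
    """Estimated time to write total_rows at a steady rows_per_second."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


# Assumption for illustration only: a table of 20 million rows.
ASSUMED_ROWS = 20_000_000

default_seconds = backfill_seconds(ASSUMED_ROWS, 2_000)   # default pipeline rate
scaled_seconds = backfill_seconds(ASSUMED_ROWS, 100_000)  # scaled-up rate

print(f"default: {default_seconds / 3600:.1f} hours")  # default: 2.8 hours
print(f"scaled:  {scaled_seconds / 60:.1f} minutes")   # scaled:  3.3 minutes
```

Under these assumptions, the same backfill drops from a few hours to a few minutes, which is consistent with the "under 4 minutes" figure quoted above.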

5. Expressiveness on data transformation

  • Subgraphs:
    • Data transformation in subgraph mappings is very expressive because the mappings are written in AssemblyScript, a TypeScript-like language that feels familiar to JavaScript developers and offers plenty of support and customization possibilities.
  • Mirror:
    • Data transformation in Mirror can be done in 2 ways:
      • SQL transforms: these can be advantageous for users proficient in SQL, but they can feel more rigid to developers with less experience in the language.
      • External Handlers: you own the processing layer and have full flexibility to transform the data using the technology and framework of your choice.
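
To make the External Handlers option more concrete, here is a minimal sketch of the kind of processing logic you might own. It assumes, purely for illustration, that a batch arrives as a list of JSON-like rows and that the handler returns the transformed rows. The function name `handle_batch` and the fields `value` and `decimals` are hypothetical; the actual request and response contract should be taken from Goldsky’s documentation.

```python
from decimal import Decimal


def handle_batch(rows: list[dict]) -> list[dict]:
    """Return a copy of each row with a human-readable token amount added."""
    transformed = []
    for row in rows:
        raw_value = Decimal(row["value"])  # raw integer amount, sent as a string
        scale = Decimal(10) ** row["decimals"]
        enriched = dict(row)  # copy so the input batch is left untouched
        enriched["amount"] = str(raw_value / scale)
        transformed.append(enriched)
    return transformed


# Hypothetical sample batch with a single row.
batch = [{"id": "t1", "value": "1500000", "decimals": 6}]
result = handle_batch(batch)
print(result[0]["amount"])  # 1.5
```

In practice you would wrap a function like this in whatever web framework or runtime you prefer, which is where the flexibility of this option comes from.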

Common Use Cases

Now that we have covered the most important functional differences, let’s look at some practical scenarios where it makes more sense to choose one technology over the other:

  • Subgraphs:
    • Best suited for applications that deal exclusively with on-chain data and don’t need integration with off-chain sources.
    • Ideal for predefined data models, such as dApps that need to query specific smart contract events or execute standard blockchain queries.
    • Great for low to moderate traffic scenarios with relatively straightforward data structures and querying needs.
  • Mirror Pipelines:
    • A better fit for applications that require both on-chain and off-chain data, offering the flexibility to combine data sources for advanced analytics or decision-making.
    • Ideal for multi-chain applications, as it simplifies the process of managing data across different blockchains without the need for reindexing. This is especially true if your application needs data from non-EVM chains such as Solana.
    • Perfect for high-traffic applications, where low latency and real-time data access are critical to the performance and user experience.
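
The rules of thumb above can be condensed into a small decision helper. This is only an illustrative sketch: the `Requirements` fields and the `suggest_product` function are hypothetical names, and the threshold is borrowed from the "40-50 events per second" guidance earlier in this article rather than from any hard limit.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_off_chain_data: bool = False
    needs_non_evm_chains: bool = False
    multi_chain: bool = False
    events_per_second: float = 0.0


# Rough threshold taken from the "40-50 events per second" guidance above.
SUBGRAPH_COMFORT_EVENTS_PER_SECOND = 40


def suggest_product(req: Requirements) -> str:
    """Return "Mirror" or "Subgraphs" based on the heuristics in this article."""
    if req.needs_off_chain_data or req.needs_non_evm_chains or req.multi_chain:
        return "Mirror"
    if req.events_per_second >= SUBGRAPH_COMFORT_EVENTS_PER_SECOND:
        return "Mirror"
    return "Subgraphs"


print(suggest_product(Requirements(events_per_second=5)))          # Subgraphs
print(suggest_product(Requirements(needs_off_chain_data=True)))    # Mirror
```

Treat the output as a starting point for discussion, not a verdict; many projects sit near the boundary or end up combining both products.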

Subgraph + Mirror

Fortunately, you are not restricted to choosing one technology over the other: Subgraphs and Mirror can be combined to leverage the strengths of both by defining subgraphs as the data source for your pipelines. This dual approach lets applications benefit from the speed and convenience of instant APIs while also gaining full control over data storage and integration through Mirror 💪

Conclusion

While both Subgraphs and Mirror Pipelines offer powerful solutions for interacting with blockchain data, choosing the right tool depends on your specific needs. In some cases, either technology may seem like a viable option, so it’s important to evaluate your requirements carefully. If you’re working with simpler on-chain data queries or need quick setup and ease of use, Subgraphs might be the best fit. On the other hand, for more complex applications that require real-time data streaming, multi-chain support, or the integration of off-chain data, Mirror Pipelines provide the flexibility and control you need. Remember that you are not constrained to one technology: you can combine them to get the best of both worlds.

Ultimately, the right solution, or a combination of both, is the one that aligns with your project’s performance, scalability, and infrastructure goals to ensure long-term success.