Subgraphs vs. pipelines

Goldsky offers two high-level approaches to working with blockchain data:

- Hosted GraphQL APIs via Subgraphs
- Real-time streaming pipelines via Mirror and Turbo

Goldsky Subgraphs are a powerful abstraction built on top of blockchain indexing technology. You define data sources and transformation logic, then query the results with GraphQL against an instant API endpoint, which makes structured blockchain data easy to retrieve. Subgraphs are particularly useful for dApps (decentralized applications) that need to query specific pieces of on-chain data efficiently.

Goldsky Mirror and Turbo take a different approach to managing blockchain and off-chain data. They focus on real-time streaming into a database, giving you full control over the data pipeline. Instead of serving queries through an API as subgraphs do, Mirror and Turbo stream raw or processed data directly into a database you manage. This gives you greater flexibility in how you store, manipulate, and query information.

The sections below compare the two products across five important functional dimensions:

1. Data Design

Subgraphs: The data model is optimized specifically for on-chain data:
- The entities and data types you define in subgraphs are tailored to represent blockchain data efficiently.
- You can create relationships and aggregations between on-chain entities (for example, tracking a user's balance of specific tokens).
- You can enrich entity data with other on-chain sources using eth_call. However, integrating off-chain data is not supported.
- On-chain data is restricted to EVM-compatible chains.
Mirror/Turbo: The data model is flexible and open to any type of data:
- You can seamlessly combine blockchain data with your own data, enabling more complex and customizable use cases.
- With Mirror/Turbo, we recommend getting data into your database and doing most aggregations and enrichments downstream. You can also do some pre-processing in SQL or TypeScript before writing to your database, but stateful transformations are limited.
- On-chain data is not restricted to EVM chains. Mirror currently supports alternative L1s such as Solana, Sui, and many others.
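To make the balance example concrete, the sketch below folds ERC-20 `Transfer` events into per-token, per-account balances. This is plain TypeScript and not the subgraph mapping API, since subgraph mappings are written in AssemblyScript against generated entity classes. The `TransferEvent` shape and the `applyTransfers` and `balanceKey` names are hypothetical. The same stateful fold is what a subgraph maintains as entity state, and it is also what a Mirror/Turbo user would typically compute downstream in their own database.

```typescript
// Hypothetical shape of a decoded ERC-20 Transfer event (illustrative only).
interface TransferEvent {
  token: string; // token contract address
  from: string;
  to: string;
  value: number; // plain number for brevity; real uint256 amounts need bigint
}

const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

// Key balances by "token:account" so one map holds every (token, account) pair.
function balanceKey(token: string, account: string): string {
  return `${token.toLowerCase()}:${account.toLowerCase()}`;
}

// Stateful fold: each event debits the sender and credits the recipient.
// Mints (from = zero address) and burns (to = zero address) touch only one side.
function applyTransfers(
  events: TransferEvent[],
  balances: Map<string, number> = new Map(),
): Map<string, number> {
  for (const e of events) {
    if (e.from !== ZERO_ADDRESS) {
      const k = balanceKey(e.token, e.from);
      balances.set(k, (balances.get(k) ?? 0) - e.value);
    }
    if (e.to !== ZERO_ADDRESS) {
      const k = balanceKey(e.token, e.to);
      balances.set(k, (balances.get(k) ?? 0) + e.value);
    }
  }
  return balances;
}

const events: TransferEvent[] = [
  { token: "0xToken", from: ZERO_ADDRESS, to: "0xAlice", value: 100 }, // mint
  { token: "0xToken", from: "0xAlice", to: "0xBob", value: 30 },
];
const balances = applyTransfers(events);
console.log(balances.get(balanceKey("0xToken", "0xAlice"))); // 70
console.log(balances.get(balanceKey("0xToken", "0xBob"))); // 30
```

The fold depends on every earlier event, which is why this kind of logic fits naturally in subgraph entities and is recommended downstream for Mirror/Turbo rather than in a per-row transform.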

2. Infrastructure

Subgraphs:
- Provide an instant GraphQL API that’s ready to use right out of the box for querying blockchain data.
- The entire infrastructure (including indexing, database and querying layer) is managed by Goldsky, minimizing setup time but limiting control over data handling.
Mirror/Turbo:
- Runs entirely on Goldsky's infrastructure to stream data into your database in real time, but you need to set up and manage your own data storage and querying infrastructure.
- Offers full control over data, allowing you to optimize infrastructure and scale as needed. You can also colocate the data with the rest of your business's data, which brings greater privacy and other developer-experience (DX) benefits.
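Colocation makes cross-domain queries cheap. Once pipeline rows land next to your own tables, enriching them is an ordinary join. The sketch below performs that left join in memory in TypeScript. In practice it would usually be a SQL `LEFT JOIN` inside your database. The `OnchainTransfer` and `Customer` shapes, the `enrichTransfers` name, and the idea of keying customers by wallet address are all hypothetical.

```typescript
// Hypothetical row written by a pipeline into your database.
interface OnchainTransfer {
  txHash: string;
  to: string; // recipient wallet address
  value: number;
}

// Hypothetical off-chain record from your own application database.
interface Customer {
  wallet: string;
  email: string;
  plan: "free" | "pro";
}

interface EnrichedTransfer extends OnchainTransfer {
  customer?: Customer; // undefined when the wallet is not a known customer
}

// Left join: keep every on-chain row and attach the customer record when one matches.
// Addresses are compared case-insensitively because hex checksum casing varies by source.
function enrichTransfers(
  transfers: OnchainTransfer[],
  customers: Customer[],
): EnrichedTransfer[] {
  const byWallet = new Map<string, Customer>();
  for (const c of customers) byWallet.set(c.wallet.toLowerCase(), c);
  return transfers.map((t) => ({ ...t, customer: byWallet.get(t.to.toLowerCase()) }));
}

const enriched = enrichTransfers(
  [
    { txHash: "0x01", to: "0xAbC", value: 5 },
    { txHash: "0x02", to: "0xDef", value: 7 },
  ],
  [{ wallet: "0xabc", email: "alice@example.com", plan: "pro" }],
);
console.log(enriched.map((t) => t.customer?.email ?? "unknown")); // [ 'alice@example.com', 'unknown' ]
```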

3. Ecosystem & Development

Subgraphs:
- Established technology with a rich ecosystem. Many open-source repositories are available for reference, and Goldsky’s Instant Subgraphs allow you to quickly create subgraphs with no code.
- There is a vast community around subgraph development, making it easier to find support, tutorials, and pre-built examples.
Mirror/Turbo:
- As a newer product, Mirror/Turbo don't have as many public examples or pre-built repositories to reference. However, you can rely on Goldsky's support to set up and optimize your pipelines, and we also create curated datasets for specific use cases that are ready to deploy. Because Mirror/Turbo pipelines are defined more simply (in YAML), they are also easier to work with using AI agents.

4. Scalability & Performance

Subgraphs:
- Perform well under low throughput conditions (less than ~100 events per second). However, as event frequency increases, latency grows, and maintaining subgraphs becomes more complex.
- Goldsky offers the ability to fine-tune subgraphs and optimize performance. However, there is a limit to how much fine-tuning can be done within the subgraph framework.
- Multi-chain setups often require reindexing from scratch, which can be time-consuming when switching between chains. This can slow down applications that rely on frequent updates from multiple chains.
Mirror/Turbo:
- Designed for scalability: you can expand your infrastructure as the data load increases, for example by adding servers or optimizing queries. A default Mirror pipeline writes about 2,000 rows per second, but an XXL Mirror pipeline scales up to 40 workers and can exceed 100,000 rows per second. That is fast enough to backfill the entire Ethereum blocks table in under 4 minutes.
- Thanks to their fast backfill capabilities and the freedom to colocate data as you see fit, Mirror/Turbo are optimized for multi-chain applications, as described in an article by Split's engineering team.
- Real-time streaming ensures that query latency is only limited by the performance of your own database, not by external API calls or indexing limitations.
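The throughput figures above translate directly into backfill time: duration = rows ÷ rows per second. The sketch below uses the 2,000 and 100,000 rows/second figures quoted above. It assumes an Ethereum blocks table of roughly 21 million rows, which is an approximate mainnet block count as of late 2024. That number is an assumption, so check the current chain height. It also converts the ~100 events/second subgraph guideline into a daily volume.

```typescript
// Seconds needed to write `rows` rows at a sustained `rowsPerSecond`.
function backfillSeconds(rows: number, rowsPerSecond: number): number {
  if (rowsPerSecond <= 0) throw new Error("rowsPerSecond must be positive");
  return rows / rowsPerSecond;
}

// Daily volume implied by a sustained per-second rate.
function perDay(perSecond: number): number {
  return perSecond * 24 * 60 * 60;
}

// ASSUMPTION: ~21 million Ethereum mainnet blocks (approximate, late 2024).
const ETH_BLOCKS = 21_000_000;

const defaultPipeline = backfillSeconds(ETH_BLOCKS, 2_000); // 10,500 s
const xxlPipeline = backfillSeconds(ETH_BLOCKS, 100_000); // 210 s

console.log(`default: ${(defaultPipeline / 3600).toFixed(1)} h`); // default: 2.9 h
console.log(`XXL: ${(xxlPipeline / 60).toFixed(1)} min`); // XXL: 3.5 min
console.log(`100 events/s = ${perDay(100)} events/day`); // 100 events/s = 8640000 events/day
```

Under this assumption, a default pipeline would need about three hours for the same backfill that an XXL pipeline finishes in about three and a half minutes.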

5. Expressiveness on data transformation

Subgraphs:
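- Data transformation in subgraph mappings is very expressive: mappings are AssemblyScript handlers that can load and update stored entities and make eth_call requests, so stateful logic such as running balances is straightforward.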
Mirror/Turbo:
- Data transformation in Mirror/Turbo can be done in several ways:
  - SQL transforms: advantageous for users proficient in SQL, but they can feel rigid to developers with less SQL experience.
  - TypeScript transforms: advantageous for developers less comfortable with SQL, but they require more compute resources to execute.
  - External handlers: you own the processing layer and have full flexibility to transform the data using the technology and framework of your choice.
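The per-row nature of pipeline transforms explains why the Data Design section recommends doing aggregations downstream. A per-row transform looks at one record at a time. It can filter rows, rename fields, and derive new ones, but it keeps no memory across rows. The hypothetical TypeScript function below illustrates this. The `RawLog` and `TransformedRow` shapes and the `transformRow` name are illustrative only. They are not Goldsky's transform API, so consult the Mirror/Turbo documentation for the actual handler signature. A SQL transform would express the same logic as a `SELECT ... WHERE` over the source.

```typescript
// Illustrative input row; not Goldsky's actual dataset schema.
interface RawLog {
  block_number: number;
  transaction_hash: string;
  address: string; // emitting contract
  value_wei: string; // decimal string, as large integers often arrive
}

interface TransformedRow {
  blockNumber: number;
  txHash: string;
  contract: string;
  valueEth: number; // lossy: converted to a float for readability
}

const WEI_PER_ETH = 1e18;

// Stateless: the output depends only on the current row.
// Returning null drops the row (a filter); otherwise the row is reshaped.
function transformRow(row: RawLog, watched: Set<string>): TransformedRow | null {
  const contract = row.address.toLowerCase();
  if (!watched.has(contract)) return null;
  return {
    blockNumber: row.block_number,
    txHash: row.transaction_hash,
    contract,
    valueEth: Number(row.value_wei) / WEI_PER_ETH,
  };
}

const watched = new Set(["0xabc"]);
const rows: RawLog[] = [
  { block_number: 1, transaction_hash: "0x01", address: "0xABC", value_wei: "1500000000000000000" },
  { block_number: 2, transaction_hash: "0x02", address: "0xdef", value_wei: "1" },
];
const out = rows
  .map((r) => transformRow(r, watched))
  .filter((r): r is TransformedRow => r !== null);
console.log(out.length, out[0].valueEth); // 1 1.5
```

Anything that needs earlier rows, such as the running balance in a subgraph, falls outside this shape. That work belongs in your database or in an external handler you control.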

Common Use Cases

Now that we have covered the most important functional differences, let's look at some practical scenarios where one technology makes more sense than the other:

Subgraphs:
- Best suited for applications that deal exclusively with on-chain data and don’t need integration with off-chain sources.
- Ideal for predefined data models, such as dApps that need to query specific smart contract events or execute standard blockchain queries.
- Great for low to moderate traffic scenarios with relatively straightforward data structures and querying needs.
Mirror/Turbo:
- A better fit for applications that require both on-chain and off-chain data, offering the flexibility to combine data sources for advanced analytics or decision-making.
- Ideal for multi-chain applications, as it simplifies the process of managing data across different blockchains without the need for reindexing. This is especially true if your application needs non-EVM data like Solana.
- Perfect for high-traffic applications, where low latency and real-time data access are critical to the performance and user experience.

Subgraph + Mirror/Turbo

Fortunately, you don't have to choose one technology over the other: Subgraphs and Mirror/Turbo can be combined to leverage the strengths of both by using subgraphs as the data source for your pipelines. This dual approach lets applications benefit from the speed and convenience of instant APIs while also gaining full control over data storage and integration through Mirror/Turbo.