Subgraphs
You can use subgraphs as a pipeline source, allowing you to combine the flexibility of subgraph indexing with the expressiveness of the database of your choice.
This enables several powerful use cases:
- Reuse all your existing subgraph entities.
- Increase query speeds drastically compared to GraphQL engines.
- Flexible aggregations that weren’t possible with just GraphQL.
- Analytics on protocols through Rockset, ClickHouse, and more.
- Plug into BI tools, train AI, and export data for your users.
Using a pipeline definition
In the sources section of your pipeline definition, you can add a subgraphEntity source for each subgraph entity that you want to use.
sources:
  - type: subgraphEntity
    # The deployment IDs you gathered above. If you put multiple,
    # they must have the same schema
    deployments:
      - id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
    # A name, referred to later in the `sourceStreamName` of a transformation or sink
    referenceName: account
    entity:
      # The name of the entities
      name: account
  - type: subgraphEntity
    deployments:
      - id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
    referenceName: market_daily_snapshot
    entity:
      name: market_daily_snapshot
Automatic Deduplication
Subgraphs natively support time travel queries. This means every historical version of every entity is stored. To do this, each row has an id, vid, and block_range.
When you update an entity in a subgraph mapping handler, a new row is created in the database with the same id but a new vid and block_range, and the old row's block_range is updated to have an end.
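The versioning behaviour described above can be illustrated with a small in-memory model. This is only a sketch for intuition, not how the subgraph database is implemented: the `Row` class, the `save_entity` helper, the `balance` field, and the use of separate `block_start`/`block_end` fields in place of a real `block_range` column are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    vid: int                  # unique per stored version
    id: str                   # entity ID, shared by every version of the entity
    block_start: int          # block at which this version became current
    block_end: Optional[int]  # None while this is still the current version
    balance: int              # hypothetical entity field


def save_entity(table: list, entity_id: str, balance: int, block: int) -> None:
    """Store a new version and put an end on the previous version's range."""
    for row in table:
        if row.id == entity_id and row.block_end is None:
            row.block_end = block
    table.append(
        Row(vid=len(table) + 1, id=entity_id,
            block_start=block, block_end=None, balance=balance)
    )


table = []
save_entity(table, "0xabc", balance=10, block=100)
save_entity(table, "0xabc", balance=25, block=120)

for row in table:
    print(row)
```

After the second call the table holds two rows with the same `id` and different `vid` values; only the newer row has an open-ended range.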
By default, pipelines deduplicate on id, to show only the latest row per id. In other words, historical entity state is not kept in the sink database. This saves a lot of database space and makes for easier querying, as additional deduplication logic is not needed for simple queries. In a Postgres database, for example, the pipeline will update existing rows with the values from the newest block.
This deduplication happens by setting the primary key on the data going through the pipeline. By default, the primary key is id.
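As a rough mental model of primary-key deduplication, the sink can be thought of as a map from key to row, where a later row with the same key replaces the earlier one. The sketch below is an illustration under that assumption, not the pipeline's actual implementation; `apply_to_sink` and the sample rows are hypothetical.

```python
def apply_to_sink(rows, primary_key):
    """Upsert rows in arrival order: a later row replaces an earlier one with the same key."""
    sink = {}
    for row in rows:
        sink[row[primary_key]] = row
    return list(sink.values())


# Three stored versions across two entities (hypothetical data).
versions = [
    {"vid": 1, "id": "0xabc", "balance": 10},
    {"vid": 2, "id": "0xabc", "balance": 25},
    {"vid": 3, "id": "0xdef", "balance": 7},
]

latest = apply_to_sink(versions, primary_key="id")
every_version = apply_to_sink(versions, primary_key="vid")

print(len(latest), len(every_version))
```

Keying on `id` leaves one row per entity with the most recent values, while keying on a per-version column such as `vid` keeps every row.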
If historical data is desired, you can set the primary key to vid through a transform.
sources:
  - type: subgraphEntity
    # The deployment IDs you gathered above. If you put multiple,
    # they must have the same schema
    deployments:
      - id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
    # A name, referred to later in the `sourceStreamName` of a transformation or sink
    referenceName: account
    entity:
      # The name of the entities
      name: account
transforms:
  - referenceName: historical_accounts
    type: sql
    # The `account` referenced here is the referenceName set in the source
    sql: >-
      select * from account
    primaryKey: vid
sinks:
  - type: postgres
    table: historical_accounts
    schema: goldsky
    secretName: A_POSTGRESQL_SECRET
    # `historical_accounts` is the referenceName of the transformation made above
    sourceStreamName: historical_accounts
In this case, all historical versions of the entity will be retained in the pipeline sink. If the table does not already exist, it will be created automatically.
Using the wizard
Subgraphs from your project
Use any of your own subgraphs as a pipeline source. Run goldsky pipeline create <pipeline-name>, select Project Subgraph, and push subgraph data into any of our supported sinks.
Community subgraphs
When you create a new pipeline with goldsky pipeline create <your-pipeline-name>, select Community Subgraphs as the source type. This will display a list of available subgraphs to choose from. Select the one you are interested in and follow the prompts to complete the pipeline creation.
This will load the subgraph into your project and create a pipeline with that subgraph as the source.