Documentation

Data Sources

Most of the data available through the portal comes from the Filecoin chain. There is also some off-chain data, like DataCap applications or Storage Providers' reputation scores, that is collected from other sources.

Deals

Deals data is available on chain and can be obtained in different ways:
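One of those ways is to query a node's JSON-RPC API directly. Here is a minimal sketch in Python, assuming the public Glif endpoint and a hypothetical deal ID (swap in your own node and deal):

```python
import requests

# Public Glif gateway (an assumption; any Filecoin node's RPC endpoint works).
RPC_URL = "https://api.node.glif.io/rpc/v1"

def get_deal(deal_id: int) -> dict:
    """Fetch a single market deal (proposal + state) by its on-chain ID."""
    payload = {
        "jsonrpc": "2.0",
        "method": "Filecoin.StateMarketStorageDeal",
        "params": [deal_id, None],  # None = resolve against the current chain head
        "id": 1,
    }
    response = requests.post(RPC_URL, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["result"]

deal = get_deal(1234)  # hypothetical deal ID
print(deal["Proposal"]["Provider"], deal["State"]["SectorStartEpoch"])
```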

Clients

Clients can be derived from the deals dataset and expanded with the following sources:

Storage Providers

Storage Providers can be derived from the deals dataset. More information about providers can be collected from the following sources:

Retrieval Data

Retrieval data is available from the Spark API.

Reputation Data

Reputation data is obtained from FilRep (see their methodology) and augmented with custom metrics derived from deals data. For example: what is the average replication of a deal for a given Storage Provider?
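As a rough sketch of how such a metric could be derived with pandas (the `piece_cid` and `provider_id` column names are assumptions, not the portal's actual schema):

```python
import pandas as pd

# Toy deals data: one row per deal, with assumed columns piece_cid and provider_id.
deals = pd.DataFrame({
    "piece_cid": ["piece-a", "piece-a", "piece-b", "piece-b", "piece-b"],
    "provider_id": ["f01000", "f01001", "f01000", "f01002", "f01003"],
})

# Replication of a piece = number of distinct providers storing it.
replication = (
    deals.groupby("piece_cid")["provider_id"]
    .nunique()
    .rename("replication")
    .reset_index()
)

# Average replication of the pieces each provider stores.
avg_replication_per_sp = (
    deals.merge(replication, on="piece_cid")
    .groupby("provider_id")["replication"]
    .mean()
)
print(avg_replication_per_sp)
```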

Energy Data

Energy data is available from Filecoin Green (Model API and Green Scores API).

FVM

Filecoin Virtual Machine data is trickier to get. Some sources:

Messages

A few teams across the ecosystem are indexing Filecoin messages. The most comprehensive sources are Beryx and FilInfo.

Data Indexers

Besides the data sources mentioned above, there are a few data indexers that provide data in a more structured way.

JSON-RPC Endpoints

Nodes usually implement all the JSON-RPC methods needed to get the data.
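As a minimal sketch, here is how you could fetch the current chain head in Python (the Glif endpoint is an assumption; any node's RPC address works):

```python
import requests

# Public Glif gateway (an assumption; substitute your own node's endpoint).
RPC_URL = "https://api.node.glif.io/rpc/v1"

payload = {
    "jsonrpc": "2.0",
    "method": "Filecoin.ChainHead",  # returns the current head tipset
    "params": [],
    "id": 1,
}
head = requests.post(RPC_URL, json=payload, timeout=30).json()["result"]
print("Current epoch:", head["Height"])
```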

Code

Using Datasets

The Filecoin Data Portal publishes up-to-date datasets daily as static Parquet files. You can then use any tool you want to explore and work with these datasets! Let's go through some examples.

Python

You can use the pandas library to read the Parquet files. You can play with the datasets in Google Colab for free. Check this sample notebook.
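Something along these lines should work (the dataset URL is illustrative; grab the actual file links from the portal):

```python
import pandas as pd

# Illustrative URL; replace it with an actual Parquet file link from the portal.
URL = "https://data.filecoindataportal.xyz/filecoin_daily_metrics.parquet"

# Reading remote Parquet files requires pyarrow (and fsspec for HTTP URLs).
df = pd.read_parquet(URL)
print(df.shape)
print(df.head())
```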


JavaScript

You can use the DuckDB Observable client library to read the Parquet files and run SQL queries on them. Check this sample Observable JS notebook to see how to explore and visualize the datasets.
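If you prefer to stay outside Observable, the same query pattern works with DuckDB's Python client; here is a rough equivalent (the dataset URL is again illustrative):

```python
import duckdb

# Illustrative URL; recent DuckDB versions autoload the httpfs extension
# needed to read Parquet files over HTTP.
URL = "https://data.filecoindataportal.xyz/filecoin_daily_metrics.parquet"

con = duckdb.connect()
df = con.execute(f"SELECT * FROM read_parquet('{URL}') LIMIT 5").fetchdf()
print(df)
```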

Dune

Some of the datasets built by the pipelines are also available in Dune. You can use the Dune SQL editor to run queries on these datasets. Check this one on Dune.

Google Sheets

The pipelines that generate the datasets also push the data to Google Sheets. You can access it directly from these sheets:

You can create a new personal Google Sheet and use the IMPORTRANGE function to pull data from these sheets, so you can plot it or add more transformations on top.
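For example, with a placeholder URL and range: `=IMPORTRANGE("https://docs.google.com/spreadsheets/d/<sheet-id>", "Sheet1!A1:D")`.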

BI Tools

Depending on the BI tool you are using, you can connect to the Parquet files directly, use the Google Sheets as a data source, or load the data into a database like PostgreSQL or BigQuery.

Evidence

Filecoin Pulse is a website built with Evidence using the Filecoin Data Portal datasets. You can check the source code on GitHub to see how to use the datasets in Evidence.

Observable Framework

Another alternative is to use Observable Framework to create dashboards and visualizations. You can use Parquet files as data sources and generate beautiful static websites with dashboards and reports, like Filecoin in Numbers, a dashboard built with Observable Framework on top of the Portal's open datasets. You can check its source code on GitHub too.

Others

Do you have any other tool you want to use to explore the datasets? Reach out and let's figure out how to make them work with your favorite tools!