Most of the data available through the portal comes from the Filecoin chain. There is also some off-chain data, like Datacap applications or Storage Providers' reputation scores, that is collected from other places.
Deals data is available on chain and can be obtained in different ways:
- StateMarketDeals JSON-RPC call and parsing the returned JSON. If you don't have a node running, you can use Glif nodes (see the sketch below).
- StateMarketDeals periodic dump on S3 (direct link).
- fil-naive-marketwatch.
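For the first option, here is a minimal sketch of calling the `Filecoin.StateMarketDeals` JSON-RPC method against the public Glif endpoint (one of the node providers listed further down). Be aware that the full response contains every market deal and is very large (multiple gigabytes), so the S3 dump is usually the more practical route; whether a public endpoint will serve such a heavy call is not guaranteed.

```python
import requests

# Public Glif mainnet node (any Lotus-compatible JSON-RPC endpoint works).
GLIF_RPC = "https://api.node.glif.io"

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "Filecoin.StateMarketDeals",
    "params": [None],  # null tipset key = current chain head
}

# Warning: the full deal map is huge; this is only a sketch.
response = requests.post(GLIF_RPC, json=payload, timeout=600)
response.raise_for_status()
deals = response.json()["result"]  # dict keyed by deal ID

# Each entry has a "Proposal" (client, provider, piece CID, ...) and a "State".
deal_id, deal = next(iter(deals.items()))
print(deal_id, deal["Proposal"]["Provider"])
```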
Clients can be derived from the deals dataset and expanded with the following sources:
- https://api.datacapstats.io/api/getVerifiedClients. You get a JSON of verified clients in the FIL+ program that contains client names, Datacap application data, and other self-reported data. Alternatively, this data can be obtained by parsing the relevant GitHub repositories' issues and comments.
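A minimal sketch of pulling the verified clients from that endpoint; the exact response shape is an assumption here, so inspect the payload before relying on specific fields.

```python
import requests

# Verified FIL+ clients with self-reported Datacap application data.
URL = "https://api.datacapstats.io/api/getVerifiedClients"

response = requests.get(URL, timeout=60)
response.raise_for_status()
payload = response.json()

# Assumption: the payload wraps the client list in a "data" field;
# fall back to the raw payload if it does not.
clients = payload.get("data", payload)
print(f"Fetched {len(clients)} verified clients")
```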
Storage Providers can be derived from the deals dataset. More information about providers can be collected in the following sources:
- Retrieval data is available from the Spark API.
- Reputation is obtained from FilRep (methodology) and augmented with custom metrics around deals. For example, what is the average replication of a deal for the SP? (See the sketch after this list.)
- Energy data is available from Filecoin Green (Model API and Green Scores API).
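To illustrate one of those custom metrics, here is a minimal sketch that computes the average replication of the deals each Storage Provider stores. The column names (`provider_id`, `piece_cid`) are assumptions and may differ from the actual deals dataset schema.

```python
import pandas as pd

# Toy deals table; the real column names are assumptions and may differ.
deals = pd.DataFrame({
    "provider_id": ["f01000", "f01000", "f02000", "f02000", "f03000"],
    "piece_cid":   ["piece-a", "piece-b", "piece-a", "piece-c", "piece-a"],
})

# Replication of a piece = number of deals storing that piece across all SPs.
piece_replication = deals.groupby("piece_cid")["provider_id"].transform("size")

# Average replication of the deals each SP stores.
avg_replication = (
    deals.assign(replication=piece_replication)
    .groupby("provider_id")["replication"]
    .mean()
)
print(avg_replication)
```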
Filecoin Virtual Machine data is trickier to get. Some sources:
A few teams across the ecosystem are indexing Filecoin Messages. The most comprehensive sources are Beryx and FilInfo.
Besides the data sources mentioned above, there are a few data indexers that provide data in a more structured way.
Nodes usually implement all the JSON-RPC methods needed to get the data.
- https://api.node.glif.io
- https://api.zondax.ch/fil/node/mainnet/rpc/v1
- https://fil-mainnet-1.rpc.laconic.com/rpc/v1
- https://lotus.miner.report/mainnet_api/0/node/rpc/v0
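As a quick check that one of these endpoints is reachable, here is a minimal sketch that requests the current chain head (`Filecoin.ChainHead` is a standard Lotus JSON-RPC method):

```python
import requests

# Any of the public endpoints listed above should work here.
RPC = "https://api.zondax.ch/fil/node/mainnet/rpc/v1"

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "Filecoin.ChainHead",
    "params": [],
}

response = requests.post(RPC, json=payload, timeout=30)
response.raise_for_status()
head = response.json()["result"]

# The result is a tipset; "Height" is the current epoch.
print("Current epoch:", head["Height"])
```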
The Filecoin Data Portal publishes up-to-date datasets on a daily basis as static Parquet files. You can then use any tool you want to explore and use these datasets! Let's go through some examples.
You can use the pandas library to read the Parquet files. You can play with the datasets in Google Colab for free. Check this sample notebook.
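A minimal sketch of loading one of the Parquet files with pandas. The URL below is illustrative only, so grab the real file links from the portal.

```python
import pandas as pd

# Illustrative URL; replace it with one of the Parquet links published by the portal.
URL = "https://data.filecoindataportal.xyz/filecoin_daily_metrics.parquet"

# Reading straight from a URL needs pyarrow plus fsspec; downloading the file
# first and reading it locally also works.
df = pd.read_parquet(URL)

print(df.shape)
print(df.head())
```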
You can use the duckdb Observable client library to read the Parquet files and run SQL queries on them. Check this sample Observable JS notebook to see how to explore and visualize the datasets.
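The same SQL works outside Observable too. For example, a minimal sketch with the DuckDB Python package (the Parquet URL is again illustrative):

```python
import duckdb

# Illustrative URL; point this at one of the portal's published Parquet files.
URL = "https://data.filecoindataportal.xyz/filecoin_daily_metrics.parquet"

# Recent DuckDB versions load the httpfs extension automatically for http(s)
# URLs; older ones may need explicit INSTALL/LOAD httpfs statements.
result = duckdb.sql(f"""
    SELECT *
    FROM read_parquet('{URL}')
    LIMIT 5
""")
print(result)
```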
Some of the datasets built by the pipelines are also available in Dune. You can use the Dune SQL editor to run queries on these datasets. Check this one on Dune.
The pipelines that generate the datasets also push the data to Google Sheets. You can access the data directly from these Google Sheets:
You can create a new personal Google Sheet and use the IMPORTRANGE function to read data from these sheets and be able to plot or add more transformations on top.
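For example, a formula like `=IMPORTRANGE("https://docs.google.com/spreadsheets/d/<source-sheet-id>", "Sheet1!A1:D")` pulls that range into your own sheet; the URL and range here are placeholders.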
Depending on the BI tool you are using, you can connect to the Parquet files directly, use the Google Sheets as a data source, or load the data into a database like PostgreSQL or BigQuery.
Filecoin Pulse is a website built with Evidence using the Filecoin Data Portal datasets. You can check the source code on GitHub to see how to use the datasets in Evidence.
Another alternative is to use the Observable Framework to create dashboards and visualizations. You can use Parquet files as data sources and generate beautiful static websites providing dashboards and reports, like Filecoin in Numbers, a dashboard built with Observable Framework on top of the Portal's open datasets. You can check the source code on GitHub too.
Do you have any other tool you want to use to explore the datasets? Reach out and let's explore how to use the datasets with your favorite tools!