Designing a distributed AgTech application with IPFS

While building smart contracts, I realized how decisions that are trivial in centralized programming can rapidly become first-class concerns. For instance, where should I store my files so that any involved party can see them and verify their integrity? How can I make sure no one withholds an important document I am interested in? In the past, I tried either to avoid referencing files altogether, or to store just a hash on the ledger and trust in the goodwill of the network participants. Both approaches proved clearly unsatisfactory, so I finally decided to explore new technologies and put an end to my dismay.

Introducing IPFS

I was in this state of mind when I started dabbling with IPFS and realized it has a lot of potential. So what is IPFS, and how can it help? First of all, IPFS's quirky full name is InterPlanetary File System. It is an open-source, peer-to-peer file storage technology that aspires to dethrone HTTP for the Distributed Web. IPFS is essentially a network of cooperating peer nodes that share the ownership of the uploaded data, so that any file can be distributed to any node and later made conveniently accessible to the world, no matter what happens to the original author. A central role in IPFS is played by the file's CID (content identifier), a unique identifier used to recognize and query any content. Incidentally, the CID is derived from the file's hash, so it serves as an integrity check as well.

To begin with, I stormed through the quick-start tutorial, ran an IPFS node via the Docker image and started querying it through its HTTP API. Every node comes with a useful tutorial directory that can be explored like a regular UNIX directory by submitting a POST request to the /api/v0/ls endpoint (please note: you have to set the query parameter arg to the desired CID; for the quick-start directory it is QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG). The response lists the links contained in the directory.

Each link carries the unique identifier of some content stored on IPFS. The type attribute is 1 for directories and 2 for files. As a consequence, if we try to list a file we won't find any inner links, as expected.
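The listing call and the type codes described above can be sketched in a few lines of Python. This is only a minimal sketch: the node address (a local node on port 5001), the helper names and the assumed response shape (an "Objects" array whose items carry "Links" with "Name", "Hash" and "Type") are my assumptions and should be checked against the API reference of the node you run.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node exposing its HTTP API on port 5001.
API_BASE = "http://127.0.0.1:5001/api/v0"

# Type codes as described in the text: 1 for directories, 2 for files.
TYPE_NAMES = {1: "directory", 2: "file"}


def ls_url(cid, api_base=API_BASE):
    """Build the URL of the ls call, passing the CID in the `arg` parameter."""
    return f"{api_base}/ls?{urllib.parse.urlencode({'arg': cid})}"


def parse_links(payload):
    """Turn a decoded ls response into a list of name/cid/kind records."""
    links = payload["Objects"][0]["Links"]
    return [
        {
            "name": link["Name"],
            "cid": link["Hash"],
            "kind": TYPE_NAMES.get(link["Type"], "other"),
        }
        for link in links
    ]


def list_links(cid, api_base=API_BASE):
    """Send the POST request to the node and return the parsed links."""
    request = urllib.request.Request(ls_url(cid, api_base), method="POST")
    with urllib.request.urlopen(request) as response:
        return parse_links(json.load(response))
```

With a node running, list_links("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG") would return one record per entry of the quick-start directory, and calling it on a file's CID would return an empty list.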

The Rise of (Decentralized) Machines

Once satisfied with my achievements, I decided it was time to level up my learning curve by starting a simple project.

However, finding an appropriate IPFS idea was a big first-world problem for me. I currently focus on writing microservices in Kotlin with Spring Boot, the kind of centralized pattern that gets a bad rap in the galaxy of Web3 and thick JS clients. People today have access to extremely versatile and powerful devices, so it's not reasonable to delegate to obscure third-party servers what can be done on a smartphone. Still, this does not hold for every agent that could operate on the DWeb: a UI provided to us humans by a browser or a mobile app is not really useful to cold machines.

This may sound strange to some of you, but a sentence I once read stuck with me:

“The best UX is to have no UI at all”

This makes total sense when you think it through. A lot of business processes and operations are automated today, and the fundamental purpose of automation is to create value without the intervention of a real person. As any manager could confirm, the best employee is the one who self-manages. Likewise, a correctly automated process should not require human interaction, except at critical decision points and when bugs arise (rarely, we all hope).

What kind of application might that be?

Soil, Crops and Sensors

Let’s imagine our users are farmers in a certain area, collecting granular weather data and information on water consumption or fertilizer productivity. The data could be uploaded to IPFS, and relevant participants could be notified of the event and browse the reports without relying on any given third party. On top of that, IPFS offers built-in de-duplication, which is super helpful for typical sensor data (a more detailed explanation can be found in this article), and it thrives on struggling networks and the minimal sensor-ware the farmers may have at their disposal.

It gets even better, as anyone can now offer services on top of the data: for instance, the farmers could collectively pay a fee to a provider on Ethereum to run predictions on the weather data, forecast water scarcity and plan activities.

The provider’s application would then need to download the raw sensor data, make sense of it and share a sensible forecast back on IPFS, so that the farmers can act on the new derived knowledge.

Drafting a Quick Solution

Let’s draft a short list of necessary MVP features for our users:

  1. upload sensor data on IPFS
  2. ask a Provider to analyze the data in a certain time interval
  3. privately upload some data to share it with certain users

The first point can be achieved easily with the IPFS APIs. Initially, though, I was tempted to put all of the user data in a directory, store its CID, and later list and download the files in it. This looks viable by just playing with the /ls and /cat APIs. Naively, I thought “Great! Users will create as many directories as they wish, they will add files, remove them, all with native IPFS! I will finish my prototype in no time!”.
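As for the upload itself, here is a minimal sketch of how a sensor report could be sent to a node's /api/v0/add endpoint as a multipart form. The node address, the helper names and the file name are hypothetical, and I am assuming the node answers a single-file upload with one JSON object containing a "Hash" field; treat it as an illustration rather than a reference client.

```python
import json
import urllib.request
import uuid

# Assumption: a local node exposing its HTTP API on port 5001.
API_BASE = "http://127.0.0.1:5001/api/v0"


def build_multipart(filename, content, boundary):
    """Wrap the raw bytes of one file in a multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail


def add_file(filename, content, api_base=API_BASE):
    """Upload one file and return the CID reported by the node."""
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        f"{api_base}/add",
        data=build_multipart(filename, content, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: a single JSON object is returned for a single file.
        return json.load(response)["Hash"]
```

The returned CID is the only thing the farmer's application needs to remember in order to retrieve or share the report later.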

However, treating IPFS as a mutable directory tree is misleading. It can’t behave as a UNIX-like filesystem, because its content is immutable. As the documentation explains, each file’s hash is calculated from its own content and, likewise, each directory’s hash is calculated from its entries’ hashes in a bottom-up fashion. Let’s imagine we need to add a directory to IPFS with the following basic structure:

foo/
├── bar
│  └── baz
└── baz

We can then submit it to IPFS by running $ ipfs add -r foo.

The command logs the entries starting from the innermost element. It’s no coincidence: that is the order IPFS needs to follow to create those entries.
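To see why the children have to be hashed first, here is a toy model of the bottom-up calculation. It is deliberately simplified and does not reproduce real CIDs: plain SHA-256 over a made-up encoding stands in for the actual format, the function names are mine, and I am assuming that both baz files hold the same content.

```python
import hashlib


def hash_file(content):
    """A file's hash depends only on its own bytes."""
    return hashlib.sha256(b"file:" + content).hexdigest()


def hash_tree(node):
    """Hash a tree bottom-up: bytes are files, dicts are directories."""
    if isinstance(node, bytes):
        return hash_file(node)
    # A directory's hash is built from the names and hashes of its entries,
    # so every child must be hashed before its parent.
    entries = sorted((name, hash_tree(child)) for name, child in node.items())
    listing = "\n".join(f"{name}:{digest}" for name, digest in entries)
    return hashlib.sha256(b"dir:" + listing.encode("utf-8")).hexdigest()


# The foo/ structure from the text, with hypothetical file contents.
foo = {"bar": {"baz": b"reading"}, "baz": b"reading"}
original_root = hash_tree(foo)

# Identical content gets the same hash wherever it sits in the tree.
same_leaf = hash_tree(foo["baz"]) == hash_tree(foo["bar"]["baz"])

# Editing one nested file changes the hash of every directory above it.
edited_root = hash_tree({"bar": {"baz": b"new reading"}, "baz": b"reading"})
root_changed = edited_root != original_root
```

Both same_leaf and root_changed end up True: the first illustrates de-duplication, the second shows that editing a single nested file produces a different root hash.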

The logical conclusion of this mechanism is that it’s not possible to change a directory’s entries in place, because this would invalidate the previous CID calculation and generate a new hash. Essentially, we would have a brand new directory. This innate versioning is a nice side effect of IPFS when trying to prevent file tampering. However, it creates some issues when making sense of your files and organizing them according to domain-specific groupings. Not a problem: this just means that our high-tech farmers will rely on a different mechanism. Two alternatives come to mind:

  • using IPNS: a name resolution system built on top of IPFS. It makes it possible to bind a stable name, derived from a key, to an IPFS directory and later point that name at updated content
  • just creating an index of the contents and storing it in a DB

I felt that the first approach was unnecessary for my purposes and also a little cumbersome, especially as users are going to generate a large, uninterrupted stream of raw data over time. In theory it would be possible to store the index in some kind of distributed DB or even on a blockchain, but since the index is mainly a utility intended for private use, I chose to store it on some good old local storage tech.
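A local index of this kind can be very small. The sketch below uses SQLite as one possible "good old local storage tech"; the table layout, column names and helper functions are hypothetical choices of mine, and timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the local index of uploaded readings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " sensor TEXT NOT NULL,"
        " recorded_at TEXT NOT NULL,"  # ISO 8601 UTC, sorts as plain text
        " cid TEXT NOT NULL,"
        " PRIMARY KEY (sensor, recorded_at))"
    )
    return conn


def record(conn, sensor, recorded_at, cid):
    """Remember which CID holds the reading of a sensor at a given time."""
    conn.execute(
        "INSERT OR REPLACE INTO readings (sensor, recorded_at, cid) VALUES (?, ?, ?)",
        (sensor, recorded_at, cid),
    )
    conn.commit()


def cids_between(conn, start, end):
    """Return the CIDs recorded in the half-open interval [start, end)."""
    rows = conn.execute(
        "SELECT cid FROM readings"
        " WHERE recorded_at >= ? AND recorded_at < ?"
        " ORDER BY recorded_at, sensor",
        (start, end),
    )
    return [row[0] for row in rows]
```

The key is the sensor and timestamp rather than the CID, because two identical readings share the same CID and would otherwise collapse into a single row. The cids_between query is also what the second MVP feature needs, since it selects the files for a given time interval.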

Secondly, a Provider should somehow be notified about the analysis task and receive an indexed list of files to analyze. Matching farmers and data specialists should not be an especially difficult task and could happen on either a decentralized or a traditional platform, so I just assumed that, after being chosen, the service provider receives a list of CIDs pointing to the time-series data via a REST API.

Lastly, the data analysis is probably of some value and was paid for, hence it is reasonable to treat it as confidential: the provider can just encrypt the insights with the farmer’s public key received in the previous step and upload them back to IPFS. As a consequence, it would be possible to keep the data private while preserving its accessibility by adding a simple security layer on top of IPFS.
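The hand-off to the Provider described in the last two paragraphs could travel as a small JSON body. The sketch below is only one possible shape: the class name, the field names and the validation rules are hypothetical, and the encryption step itself is left out because it depends on the cryptographic library of choice.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisRequest:
    """What the farmer sends to the chosen Provider (hypothetical fields)."""

    farmer_public_key: str  # used by the Provider to encrypt the insights
    start: str              # ISO 8601 UTC start of the interval, inclusive
    end: str                # ISO 8601 UTC end of the interval, exclusive
    cids: list = field(default_factory=list)

    def to_json(self):
        """Validate the request and serialize it as a REST request body."""
        if not self.cids:
            raise ValueError("an analysis request needs at least one CID")
        if self.start >= self.end:
            raise ValueError("start must precede end")
        return json.dumps(asdict(self), sort_keys=True)
```

The Provider would then download each CID, run its analysis, encrypt the result with farmer_public_key and publish the ciphertext back on IPFS.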

It Ain’t Much, But It’s Honest Work

Learning more about IPFS by figuring out a possible use case was fun, but I spent quite some time designing a remotely useful application: imagining solutions for the Distributed Web is still uncharted territory, at least for me, but I believe it is an extremely rewarding path to follow.

If you are interested in Web3 and AgTech as well, please share your thoughts. I am keen to learn more about these subjects!
