Decentralized Storage and Publication with IPFS and Swarm

Tonino Jankov

In this article, we outline two of the most prominent solutions for decentralized content publication and storage. These two solutions are IPFS (InterPlanetary File System) and Ethereum’s Swarm.

With the advent of blockchain applications in recent years, the Internet has seen a boom of decentralization. The developer world has suddenly gotten the sense of the green pastures that lie beyond the existing paradigm, based on the server–client model, susceptible to censoring at the whims of different jurisdictions, cloud provider monopolies, etc.

Turkey’s ban of Wikipedia and the “Great Firewall of China” are just two examples. Dependence on internet backbones, hosting companies, cloud providers like Amazon, and search providers like Google has betrayed the initial internet promise of democratization of knowledge and access to information.

As this article on TechCrunch said two years ago, the original idea of the internet was “to build a common neutral network which everyone can participate in equally for the betterment of humanity”. This idea is now reemerging as Web 3.0, a term that now means the decentralized web — an architecture that is censorship proof, and without a single point of failure.


As Gavin Wood, one of Ethereum’s founders, put it in his seminal 2014 work on Web 3.0, there is an “increasing need for a zero-trust interaction system”. He named it the “post-Snowden web”, and described four components to it: “static content publication, dynamic messages, trustless transactions and an integrated user-interface”.

Decentralized Storage and Publication

Before the advent of cryptocurrency — and the Ethereum platform in particular — we had other projects that aimed to develop distributed applications.

  • Freenet: a peer-to-peer (p2p) platform created to be censorship resistant, with its own distributed data store. It was first published in 2000.
  • Gnutella network: enabled peer-to-peer file sharing with its many client incarnations.
  • BitTorrent: was developed and published as early as 2001, and Wikipedia reports that, in 2004, it was “responsible for 25% of all Internet traffic”. The project is still here, and is technically impressive, with new projects copying its aspects — hash-based content addressing, DHT distributed databases, Kademlia lookups …
  • Tribler: as a BitTorrent client, it added some other features for users, such as onion routed p2p communication.

Both of our aforementioned projects stand on the shoulders of these giants.

IPFS


The InterPlanetary File System was developed by Juan Benet, and was first published in 2014. It aims to be a protocol, and a distributed file system, replacing HTTP. It’s a mixture of technologies, and it’s pretty low level — meaning that it leaves a lot to projects or layers built on top of it.

An introduction to the project by Juan Benet from 2015 can be found in this YouTube video.

IPFS aims to offer the infrastructure for reinventing the Internet, which is a huge goal. It uses content addressing — naming and lookup of content by its cryptographic hash, like Git, and like BitTorrent, which we mentioned. This technique enables us to ensure authenticity of content regardless of where it sits, and the implications of this are huge. We can, for example, have the same website hosted in ten, or hundreds of computers around the world — and load it knowing for sure that it’s the original, authentic content just by its hash-based address.
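The idea of hash-based addressing can be illustrated with a minimal sketch. It assumes SHA-256 as the hash and uses a hypothetical in-memory `ContentStore` class standing in for a host; neither is taken from IPFS itself, whose real addresses and storage are more involved.

```python
import hashlib
from typing import Dict


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest, used here as the content's address."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """A toy in-memory host keyed by content hash (hypothetical helper)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # The reader re-hashes what it received, so it need not trust the host.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data


# Two independent hosts holding the same page derive the same address.
host_a, host_b = ContentStore(), ContentStore()
page = b"<h1>Hello, decentralized web</h1>"
address = host_a.put(page)
same_address = host_b.put(page) == address

# A tampered copy on one host is detected on retrieval.
host_b._blobs[address] = b"<h1>Tampered</h1>"
try:
    host_b.get(address)
    tampered_detected = False
except ValueError:
    tampered_detected = True
```

Because the address is derived from the bytes themselves, it does not matter which host answers the request: any copy can be checked against the address it was requested by.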

This means that important websites — or websites that may get censored by governments or other actors — don’t depend on any single point, like servers, databases, or even domain registrars. This, further, means that they can’t be easily extinguished.

The Web becomes resistant.

One more consequence of this is that we don’t, as end users, have to depend on internet backbones and perfect connectivity to a remote data center on another continent hosting our website. Countries can get completely cut off, but we can still load the same, authentic content from some machine nearby, still certain of its authenticity. It can be content cached on a PC in our very neighborhood.

With IPFS, it would become difficult, if not impossible, for Turkey to censor Wikipedia, because Wikipedia wouldn’t be relying on certain IP addresses. Authentic Wikipedia could be hosted on hundreds or thousands of local websites within Turkey itself, and this network of websites could be completely dynamic.

IPFS has no single point of failure, and nodes don’t need to trust each other.

Addressing the content is algorithmic — and it becomes uncensorable. It also improves the efficiency. We don’t need to request a website, or video, or music file from a remote server if it’s cached somewhere close to us.

This can greatly reduce request latency. And anyone who’s ever optimized website speed knows that network latency is a factor.

By using the aforementioned Kademlia algorithm, the network becomes robust, and we don’t rely on domain registrars/nameservers to find content. Lookup is built into the network itself. It can’t be taken down. Some of the major attacks by hackers in recent years were attacks on nameservers. An example is the 2016 attack on the DNS provider Dyn, which took down Spotify, Reddit, NYT, Wired, and many others.
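At the core of Kademlia is a distance metric: the bitwise XOR of two identifiers. The sketch below shows only that metric and a "closest known peers" selection. The helper names (`node_id`, `closest_nodes`) and the use of SHA-1 to derive 160-bit identifiers are illustrative assumptions, and a real lookup is iterative, repeatedly asking peers for peers that are closer still.

```python
import hashlib
from typing import List


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (illustrative only)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two identifiers."""
    return a ^ b


def closest_nodes(target: int, known: List[int], k: int = 3) -> List[int]:
    """Return the k known node IDs closest to the target key."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]


# Twenty hypothetical peers and one content key living in the same ID space.
peers = [node_id(f"peer-{i}") for i in range(20)]
key = node_id("some-content-hash")
nearest = closest_nodes(key, peers)
```

Since content keys and node identifiers share one ID space, "who should know about this content" becomes a purely arithmetic question, which is why no registrar or nameserver is needed.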

IPFS is being developed by Protocol Labs as an open-source project. On top of it, the company is building an incentivization layer, Filecoin, which held an initial coin offering in summer 2017 and collected around $260 million (if we count pre-ICO VC investment), perhaps the biggest amount collected by an ICO so far. Filecoin itself is not at production stage yet, but IPFS is being used by production apps like OpenBazaar. There’s also IPFS integration in the Brave browser, and more is coming …

The production video-sharing platform d.tube uses IPFS for storage, while relying on the Steem blockchain (the one behind Steemit) for monetization, voting, etc.

It’s a web app that’s waiting for wider adoption, but it’s currently in production stage, and works without ads.

Although IPFS, just like Swarm, is considered an alpha-stage project, it is already serving real-world projects.

Other notable projects using IPFS are Bloom and Decentraland, a virtual-reality world being built on top of the Ethereum blockchain and IPFS. Peerpad is an open-source app built to serve as an example for developers building on IPFS.

Swarm


According to Viktor Tron, of the Ethereum Foundation, “basically, Swarm is BitTorrent on steroids”.

Swarm, by Ethersphere, aims to solve the same problems as IPFS. According to its GitHub page:

Swarm is a distributed storage platform and content distribution service, a native base layer service of the Ethereum Web 3 stack. The primary objective of Swarm is to provide a sufficiently decentralized and redundant store of Ethereum’s public record, in particular to store and distribute Đapp code and data as well as block chain data.

Viktor Tron, one of the first employees of the Ethereum Foundation, is currently Swarm’s lead developer. The Ethereum Foundation is funding the project’s development, along the lines of Gavin Wood’s vision of Web 3.0 that we quoted. So, Swarm is more integrated with the Ethereum ecosystem and, along with Whisper and the Ethereum Virtual Machine, it’s aiming to build a next-generation platform for distributed apps, or Đapps.

Swarm is in an earlier stage of development than IPFS. To quote Viktor Tron —

IPFS is much further along in code maturity, scaling, adoption, community engagement and interaction with a dedicated developer community.

Once Swarm becomes production-ready, it will provide an incentivization layer and integration with Ethereum’s smart contracts, which should give plenty of room for creativity and innovative applications.

Neither Swarm’s incentivization layer nor that of IPFS (Filecoin) is currently ready for use.


Note: at the time of writing (May 2018), Swarm’s lead developer has announced the release of POC3, which keeps the roadmap on schedule and gives reason for optimism that Swarm will become production-ready in 2018.

While IPFS aims to build a protocol, and is a lower-level, more generic project, Swarm ties into Ethereum’s Web 3 vision, with more focus on censorship resistance: it “implements plausible deniability with implausible accountability through a combination of obfuscation and double masking”.

This reminds us of the Freenet project, where those hosting certain content don’t necessarily have access to it, or know what it is.

Swarm, with its incentivization mechanisms, is aiming to provide higher level solutions. It —

exploits the full capabilities of smart contracts to handle registered nodes with deposit to stake. This allows for punitive measures as deterrents. Swarm provides a scheme to track responsibilities making storers individually accountable for particular content.

Compared to IPFS, Swarm has a lot of focus on these mechanisms. On the one hand, this includes incentives for long-term storage of not-so-popular content, and on the other, incentives for highly popular, high-bandwidth content. These two require two different approaches to penalties/rewards.

In Swarm’s case, this requires working on cryptographic constructs known as Proof-of-Custody, which make it possible “to have a compact proof, proving to any third party that you store a blob of data without transferring the whole data and without revealing the actual contents”. So proving a storage of some content doesn’t require the full download of that content every time.
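To give a feel for what a "compact proof" looks like, here is a sketch of a Merkle inclusion proof. This is a simpler, related construct that is swapped in for illustration; it is not Swarm's Proof-of-Custody scheme, and unlike that scheme it does reveal the one chunk being challenged. All function names are hypothetical. What it does show is that an auditor who keeps only a single root hash can check a storer's answer without downloading the rest of the data.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks: List[bytes]) -> List[List[bytes]]:
    """Build all tree levels, leaves first; an odd node is paired with itself."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect sibling hashes from leaf to root; the flag means 'sibling is on the left'."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root: bytes, chunk: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the path to the root from one chunk and its sibling hashes."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [f"chunk-{i}".encode() for i in range(5)]
levels = merkle_levels(chunks)
root = levels[-1][0]        # the only value the auditor needs to keep
proof = prove(levels, 4)    # the storer answers a challenge for chunk 4
ok = verify(root, chunks[4], proof)
```

The proof grows with the logarithm of the number of chunks rather than with the size of the data, which is the property that makes repeated storage checks affordable.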

Swarm even has an Accounting Protocol, SWAP, currently in development, as one level of incentivization.

Currently, before incentivization mechanisms are published, which is expected to happen in 2018, Swarm functions like a cache: less popular content can get deleted, and there’s no insurance against that.

Swarm will be usable as cloud hosting, while IPFS relegates this to projects that will be built on its infrastructure. IPFS leaves it to the implementors/developers to find the actual storage devices.

IPFS itself, as a lower layer, offers no guarantees of storage. While Swarm includes this in its roadmap, the IPFS team plans to handle it at the Filecoin level, which is only at the idea stage at the moment.

There’s a two-part YouTube interview with Tron in which he explains the Swarm project in less technical terms.

There are two projects that build further on IPFS and Swarm that are worth mentioning in the context of Đapps: distributed applications. Since both projects allow for only a limited level of dynamic content, database-oriented projects built on top of these distributed systems add significant value.

OrbitDB is a “serverless, distributed, peer-to-peer database” that uses IPFS for its data storage.

It’s a database that works both in Node.js and in browsers. Its development is active, and is being sponsored by Protocol Labs. After Protocol Labs’ $260 million Filecoin fundraising in 2017, the future of OrbitDB, just like that of IPFS, looks promising.

OrbitDB is part of the Node.js/npm ecosystem.

Wolk is a project/token that’s building a database — SWARMDB — using Swarm’s codebase. Behind it is a Californian startup, Wolk Inc., that managed to raise around 7,100 ETH in its ICO in 2017. WOLK promises a censorship-resistant distributed database powered with WLK token as its incentivizing layer. It provides a Go, Node.js and HTTP interface.

They claim Swarm and Bancor as their partners.

While it’s hard to predict success and adoption of these projects, or ascertain their quality, as IPFS and Swarm progress and become more production-ready and reach wider adoption, it’s pretty certain we’ll see more projects like these.

Swarm’s Orange Paper is an interesting, albeit very technical, read.

A longer comparison of the two projects can be found here.

Commonalities

One thing that both IPFS and Swarm share is hash-based content addressing, which we described before. And while this provides censorship resistance and Git-level version control of the content hosted on both systems, deleting content is something that remains to be solved.

Immutability provides guarantees of authentic content, but changes to the content produce new addresses, so to provide editing capability, additional layers are necessary.
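A minimal sketch of such an additional layer follows. The `ImmutableStore` and `NameRegistry` classes are hypothetical and exist only for illustration: the store never changes what an address points to, while the registry keeps a mutable pointer from a human-readable name to the latest address. A real system would also need signatures to control who may update a name, which is omitted here.

```python
import hashlib
from typing import Dict, List


class ImmutableStore:
    """Content-addressed storage: an address always maps to the same bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


class NameRegistry:
    """Hypothetical mutable layer: a name maps to a history of addresses."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def publish(self, name: str, address: str) -> None:
        self._history.setdefault(name, []).append(address)

    def resolve(self, name: str) -> str:
        """Return the most recently published address for the name."""
        return self._history[name][-1]

    def versions(self, name: str) -> List[str]:
        return list(self._history[name])


store, names = ImmutableStore(), NameRegistry()

v1 = store.put(b"About us: draft")
names.publish("about-page", v1)

# "Editing" means storing new content and moving the pointer.
v2 = store.put(b"About us: final")
names.publish("about-page", v2)

current = store.get(names.resolve("about-page"))
previous = store.get(names.versions("about-page")[0])
```

Note that the old version stays retrievable under its original address, which is the version-control property and also the reason why deletion is hard.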

From the perspective of web apps, both projects support only static content. So, there are no back-end apps with interpreted languages like PHP, Python, Ruby, or Node.js. For Swarm, this is where the EVM comes into play, but the EVM also has its own inherent limitations.

Conclusion

Both IPFS and Swarm are promising projects, although one can’t help but wonder if the developers have set overly ambitious goals. If they succeed with their development roadmaps, and achieve wider adoption, there’s no doubt this will bring big changes to the Internet as we know it.

Frequently Asked Questions on IPFS and Swarm

What are the key differences between IPFS and Swarm?

IPFS and Swarm are both decentralized storage systems, but they have some key differences. IPFS, or InterPlanetary File System, is a protocol designed to create a permanent and decentralized method of storing and sharing files. It uses content addressing to uniquely identify each file in the network. Swarm, on the other hand, is a native base layer service of the Ethereum Web 3 stack. It provides decentralized and redundant storage of Ethereum’s public record, and in particular stores and distributes dapp code and data.

How does IPFS handle data redundancy?

IPFS relies on content addressing. When a file is added to IPFS, it is split into blocks, and each block is identified by its hash. These hashes are used to retrieve the file. If another user adds the same file, it resolves to the same hashes, so a node doesn’t store a second copy of blocks it already holds. Redundancy, in turn, comes from multiple nodes keeping copies of the same blocks: since a block can be fetched from any node that has it, losing one node doesn’t make the content unavailable as long as another copy exists.

How does Swarm ensure data availability?

Swarm ensures data availability through its network of nodes. When a file is uploaded to Swarm, it is split into chunks and distributed across various nodes in the network. Even if some nodes go offline, the file can still be retrieved from other nodes that hold the chunks. This redundancy keeps the data available in the event of node failures, as long as at least one node holding each chunk remains online.
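The effect of replication can be simulated in a few lines. This is a toy model under stated assumptions, not Swarm's implementation: the `Node` class, the replica count of 3, the tiny chunk size, and the rule of placing each chunk on the nodes whose IDs are XOR-closest to the chunk's hash are all chosen for illustration.

```python
import hashlib
from typing import Dict, List

REPLICAS = 3  # assumed replication factor for this demo


class Node:
    """A hypothetical storage node with an ID derived from its name."""

    def __init__(self, name: str) -> None:
        self.node_id = int(hashlib.sha256(name.encode()).hexdigest(), 16)
        self.online = True
        self.chunks: Dict[str, bytes] = {}


def split(data: bytes, size: int) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def storers_for(address: str, nodes: List[Node]) -> List[Node]:
    """Pick the REPLICAS nodes whose IDs are XOR-closest to the chunk address."""
    target = int(address, 16)
    return sorted(nodes, key=lambda n: n.node_id ^ target)[:REPLICAS]


def upload(data: bytes, nodes: List[Node], chunk_size: int = 8) -> List[str]:
    """Store every chunk on several nodes; return the ordered chunk addresses."""
    addresses = []
    for chunk in split(data, chunk_size):
        address = hashlib.sha256(chunk).hexdigest()
        for node in storers_for(address, nodes):
            node.chunks[address] = chunk
        addresses.append(address)
    return addresses


def download(addresses: List[str], nodes: List[Node]) -> bytes:
    """Reassemble the file from any online nodes that hold each chunk."""
    parts = []
    for address in addresses:
        holders = [n for n in nodes if n.online and address in n.chunks]
        if not holders:
            raise LookupError(f"chunk {address[:8]} is unavailable")
        parts.append(holders[0].chunks[address])
    return b"".join(parts)


nodes = [Node(f"node-{i}") for i in range(6)]
data = b"Swarm splits files into chunks."
addresses = upload(data, nodes)

# Two nodes fail; with three replicas per chunk, one copy always survives.
nodes[0].online = False
nodes[1].online = False
recovered = download(addresses, nodes)
```

The guarantee is only as strong as the replica count: if every node holding some chunk goes offline at once, `download` raises an error for that chunk.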

Can IPFS and Swarm work together?

Yes, IPFS and Swarm can work together. Both systems aim to decentralize the web and can complement each other. For instance, IPFS can be used for static file storage and retrieval, while Swarm can be used for dynamic data storage and real-time interactions. By integrating both systems, developers can leverage the strengths of both technologies to build robust decentralized applications.

What are the security features of IPFS and Swarm?

Both IPFS and Swarm have built-in security features. IPFS uses cryptographic hashing to ensure data integrity. Each file and each block within a file is given a unique hash, which serves as its address. This ensures that the data hasn’t been tampered with. Swarm, on the other hand, uses the Ethereum blockchain for its security. It leverages Ethereum’s smart contracts to handle microtransactions for data storage and retrieval, ensuring a secure and trustless system.

How does IPFS handle large files?

IPFS handles large files by splitting them into smaller blocks. Each block is given a unique hash, which is used to retrieve the block. This allows IPFS to handle large files efficiently, as each block can be retrieved independently and in parallel. This also ensures that the same data isn’t stored multiple times, saving storage space.
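The block splitting and deduplication described above can be sketched as follows. The `BlockStore` class and the 8-byte block size are illustrative assumptions, and the ordered list of hashes returned here stands in for the root object that, in a real system, links a file's blocks together.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8  # tiny on purpose; real block sizes are far larger


class BlockStore:
    """A hypothetical block store keyed by block hash."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def add_file(self, data: bytes) -> List[str]:
        """Split into fixed-size blocks, store each once, return ordered hashes."""
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block that is already present is not stored a second time.
            self.blocks.setdefault(digest, block)
            hashes.append(digest)
        return hashes

    def read_file(self, hashes: List[str]) -> bytes:
        """Reassemble a file; each block could be fetched independently."""
        return b"".join(self.blocks[h] for h in hashes)


store = BlockStore()
file_a = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC"
file_b = b"AAAAAAAA" + b"BBBBBBBB" + b"DDDDDDDD"

refs_a = store.add_file(file_a)
refs_b = store.add_file(file_b)

# Six block references in total, but only four distinct blocks are stored.
unique_blocks = len(store.blocks)
```

Because the two files share their first two blocks, the second upload adds only one new block to the store.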

How does Swarm handle data privacy?

Swarm handles data privacy through its obfuscation feature. When a file is uploaded to Swarm, it is split into chunks and distributed across various nodes. The nodes only know the hash of the chunks they store, not the content or the origin of the chunks. This ensures that the data is private and secure.

What types of applications can be built using IPFS and Swarm?

IPFS and Swarm can be used to build a wide range of applications. These include decentralized websites, peer-to-peer applications, collaborative applications, streaming services, and more. By leveraging the decentralized and redundant storage features of IPFS and Swarm, developers can build applications that are resistant to censorship and data loss.

How does IPFS handle data retrieval?

IPFS handles data retrieval through its Distributed Hash Table (DHT). When a user requests a file, IPFS uses the DHT to find the nodes that hold the blocks of the file. The blocks are then retrieved and assembled to form the original file. This ensures efficient and fast data retrieval.

How does Swarm handle data distribution?

Swarm handles data distribution through its network of nodes. When a file is uploaded to Swarm, it is split into chunks and distributed across various nodes. The nodes then replicate the chunks to other nodes, ensuring that the data is distributed evenly across the network. This ensures data availability and redundancy.