Who cares about data provenance?

Without context, there’s no meaning

To better understand a person, it’s often helpful to understand their context. What language do they speak? What city are they from? How old are they? Which schools did they attend and what did they study?

Data is no different. If you don’t know where your data came from, who created it, when it was made, and which dataset it belongs to, it can be premature or even misguided to draw meaningful conclusions.

However, tracking and communicating this source information can be very difficult, particularly when data producers and consumers have misaligned incentives, as is often the case.

A few financial examples

One area we’re focused on, given our prior experience, is financial data. It’s very easy to construct a strategy that looks profitable with the benefit of hindsight. Thus, knowing when financial data was created and who created it is almost as important as the observations themselves.

If you’re constructing a financial index, tracking a portfolio, making financial predictions, or generating data that may be useful for predictive analysis, recording the provenance of this information will likely make your data more trustworthy and valuable.
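As a rough illustration of what recording provenance can involve, here is a minimal Python sketch that pairs a content fingerprint with the who, when, and which-dataset context discussed above. It is not vBase’s API: the function names, field names, and sample values are all hypothetical. Note that a timestamp you write yourself is only a claim. Making it credible to others requires an independent witness, which this sketch does not provide.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_provenance_record(data: bytes, creator: str, dataset: str, source: str) -> dict:
    """Bundle a content fingerprint with its context (hypothetical field names)."""
    return {
        "content_sha256": fingerprint(data),
        "creator": creator,
        "dataset": dataset,
        "source": source,
        # Self-reported time: a claim, not proof, until independently witnessed.
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def matches(data: bytes, record: dict) -> bool:
    """Check whether the data is byte-for-byte what the record describes."""
    return fingerprint(data) == record["content_sha256"]


# Hypothetical example: a two-asset portfolio serialized deterministically.
portfolio = json.dumps({"AAPL": 0.6, "MSFT": 0.4}, sort_keys=True).encode("utf-8")
record = make_provenance_record(
    portfolio,
    creator="analyst@example.com",
    dataset="demo-portfolio",
    source="internal-model",
)
```

Calling `matches(portfolio, record)` confirms the bytes are unchanged, and any later edit to the portfolio, however small, produces a different fingerprint and fails the check.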

Introducing vBase

vBase is a cheap, scalable, and robust means of assuring the provenance of data records and digital objects, and of communicating it to others.

If you work with time-sensitive financial data or know people who do, please get in touch with us at hello@vbase.com or just start using our app or simple SDKs. We’d love to hear from you.

vBase Blog

Recent Posts

Beyond RFC 3161: The Failures of Legacy Timestamping and a Solution

RFC 3161 timestamps often fall short in a number of important use cases. We examine the problems and a solution.

Dan Averbukh

4 reasons people don’t trust your backtest

Some analysts spend months building backtests that no one is willing to trust. Learn why, and what to do about it.

Dan Averbukh

3 reasons why GitHub timestamps shouldn’t be trusted

GitHub timestamps can be trivially altered and should not be trusted for recording the provenance of code or data. Proceed with caution.

Dan Averbukh