This was originally published as a blog post
Discussions with some very smart people over time (e.g. this discussion on Friendfeed) have convinced me that our problems around open data and data ownership do not necessarily stem from scientists being inherently protective of their data, but rather from a system that encourages them to do so.
First let me throw out some oft-repeated mantras that drive my philosophy, mostly stolen from other wiser people.
- Raw data by itself is not the value center. Value comes from the interpretation of these data.
- Data finds the data (then people find the people) (via Jeff Jonas and Jon Udell)
- Wherever you are, there is someone smarter somewhere else (via Tim Bray, channeling Bill Joy)
Now that we have got those biases out of the way, let us assume that most people involved in science do care about science in general, and acknowledge that as humans we need recognition in some manner. In such a scenario, the challenge lies not in trying to fit our goals and needs into an existing, broken system, but rather in taking this system, which is very long in the tooth, and changing it.
The science blogosphere, The BioGang, etc. are but a small part of the scientific community. Some of us have the ability to make change from within, some of us have a bigger pulpit than others, and some of us can only write about the changes we would like to see. So it’s going to take a while, but if pharma companies can agree to share pre-competitive biomarker data, then academics can change as well.
I continue to maintain that raw data should be made public within a reasonable time. You might want to re-check the data quality, or perhaps your data was collected to support a hypothesis, and you have every right to test it out. But you can’t sit on that data. Complete your analysis and make it available. And if the data are collected for the sake of data collection (genome study, high-throughput structure determination) then you must make it available ASAP. There is enough in there to keep many, many people busy.
The other aspect is data ownership. Large data sets of fundamental data belong in the public domain. I am not 100% sure about supporting data, that is, data that supports a paper, a hypothesis, or a discovery. I think there needs to be some form of attribution, especially if you don’t plan to publish the data in a paper. How do we manage that? I don’t know. Others have studied this for a longer time. How does this protect long-term monetization prospects? Actually that’s the easy part, and I’ve written about it many times before.
Sometimes I feel that it’s pointless to write about this subject, one I care about more than most. Then I remember how much I care.
