Editor’s note: This guest post was written by Richard Price, founder and CEO of Academia.edu — a site that serves as a platform for academics to share their research papers and to interact with each other.
Almost every technological and medical innovation in the world has its roots in a scientific paper. Science drives much of the world’s innovation. The faster science moves, the faster the world moves.
Progress in science right now is being held back by two key inefficiencies:
- The time-lag problem: there is a time-lag of, on average, 12 months between finishing a paper, and it being published.
- The single mode of publication problem: scientists share their ideas only via one format, the scientific paper, and don’t take advantage of the full range of media that the web makes possible.
The stakes are high. If these inefficiencies can be removed, science would accelerate tremendously. A faster science would lead to faster innovation in medicine and technology. Cancer could be cured 2-3 years sooner than it otherwise would be, which would save millions of lives.
The time-lag problem
The first major inefficiency is the time-lag problem for distributing scientific ideas. After you have written a scientific paper, it takes, on average, 12 months for the paper to be distributed to the global scientific community. During that time the paper is going through the peer review process, which takes an extremely long time.
If you read a paper, and have some thoughts about it, and write up a response, it is going to take 12 months for your response to be seen by the global scientific community.
Science is fundamentally a conversation between scientists around the world. Currently the intervals between iterations of that conversation are 12 months on average. This 12 month time-lag represents a huge amount of friction in the circulation of scientific ideas.
Imagine the slowdown on the web if every blog post, and every tweet, and every photo, was made available on the web 12 months after it was originally posted. Imagine if all the stories in your Facebook News Feed were 12 months old. People would be storming the steps of Congress, demanding change.
The time-lag in the distribution of scientific ideas is significantly holding back science. It’s critical for global progress that we work to remove this inefficiency.
The single mode of publication problem
Historically, if a scientist wanted to make a contribution to the scientific body of knowledge, it had to be in the form of a scientific paper.
Blogging hasn’t taken off in science, because scientists don’t get credit for writing blog posts. You often hear a scientist saying ‘I’m not going to put these ideas in a blog post, because they are good enough for me to incorporate into a paper, which I’ll publish in the next couple of years’. Everyone loses out because of that delay of a couple of years.
Most people who share information on the web have taken advantage of the rich media that the web provides. People share information in all kinds of forms: videos, status updates, blog posts, blog comments, data sets, interactive graphs, and other forms.
By contrast, if a scientist wants to share some information on a protein that they are working on, they have to write a paper containing a set of two-dimensional, black-and-white images of that protein. The norms don’t encourage sharing an interactive, full-color, three-dimensional model of the protein, even if that would be a more suitable medium for the kind of knowledge being shared.
The future of science: instant distribution
Tim Berners-Lee invented the web in order to make it easier for him and his colleagues to share their research papers. The web has impacted science, but over the next few years, the web is going to entirely re-invent the way that scientists interact.
In 5-10 years’ time, the way scientists communicate will be unrecognizable compared with the way they have communicated for the last 400 years, since the first academic journal was founded.
The first change will be instant distribution for all scientific ideas. Some sites, such as arXiv, Academia.edu, Mendeley, and ResearchGate have brought instant distribution to certain sub-fields of science recently, and this trend is going to continue to all fields of science.
In a few years, scientists will look back and will struggle to believe that they used to exist in a world where it took 12 months to circulate a scientific idea around the world. Discussing the idea of 12 month distribution delays for ideas will produce the same confused look that it produces today, when one asks someone to conceive of 12 month distribution delays to tweets, blog posts, and general web content.
Instant distribution means bringing the time-lag for distributing a scientific paper around the world down to 1 day, or less. This speed-up will have a transformative effect on the rate of scientific progress in the world. Discoveries will be made much more quickly.
One of the reasons that technological progress in the 20th century was so much greater than growth in previous centuries is that there were so many powerful communication technologies invented in the 20th century that connected people around the globe: the telephone, the TV, the internet.
Bringing instant distribution to science will have a similarly transformative effect on scientific progress.
The future of science: rich media
Historically scientists have written their papers as native desktop content. They have saved their papers as PDFs, and uploaded the files to the web.
Over the next few years, scientific content will increasingly become native web content, written natively for the web and created with the full interactivity and richness of the web in mind. Today, most papers are downloaded from the web and printed out by scientists for reading; the content is written in such a way that it’s fully readable in print-out form.
Most web content is inherently rich. No-one prints out their Twitter and Facebook News Feeds, or blog posts, to read them. The idea of printing out content doesn’t make sense for much of the web’s content, such as YouTube videos, Facebook photos, interactive maps, and interactive graphs such as those you find on Quantcast, or Yahoo Finance.
The hyperlink itself is a piece of interactivity built into web content. One reason you don’t want to print out a Wikipedia article to read it is that the page is full of useful links, and you want to be adjacent to that interactivity when reading the article to take advantage of the full power of the article.
Historically, scientific papers have cited other papers, but those citations are not hyperlinked.
To citizens of the web, the idea of referring to some other page without linking to it seems an impossibly old-fashioned way of sharing content.
Imagine reading a blog, or a Facebook News Feed, where there were no links and everything was plain text. Instead, there was a set of references at the end of the page, and those references told you where to find certain other pages on the web, but the references weren’t themselves hyperlinked. A citation to a video would look something like “YouTube.com, Comedy section, page 10, ‘Coke bottle exploding’, video id = 34883”. You would then have to go to YouTube and navigate to the right section to find the video with that title.
This experience would indeed be a nightmare. The difference between that, and how the web currently is, is the difference between where scientific communication is right now, and where it will be in a few years, when scientists fully adopt the rich media of the web.
Scientists will share content in whatever format makes sense for the piece of content in question. They will share ideas in the form of data sets, videos, 3-d models, software programs, graphs, blog posts, status updates, and comments on all these rich media.
These content formats will connect with each other via the hyperlink, not via the citation. The citation will look like an ancient concept in a few years.
Science is undergoing one of the most exciting changes in its history. It is in a transition period from a pre-web form of communication to a natively web form of communication. The full adoption of the web by scientists will transform science. Scientists will start to interact and communicate in wonderful new ways that will have an enormous effect on scientific progress.
The future of science: peer review
In a world of instant distribution, what happens to peer review? Will this be a world where junk gets published, and no-one will be able to tell whether a particular piece of content is good or bad?
I wrote a post on TechCrunch a few weeks ago called “The Future of Peer Review”, arguing that the web has an instant distribution model, and has thrived. I argued that the web’s main discovery engines for content on the web, namely search engines, and social networks, are at their heart, evolved peer review systems.
These web-scale peer review systems, search engines and social networks, already drive most discovery of scientific content.
The future of science: academic credit
Historically, scientists have gained credit by publishing in prestigious journals. Hiring committees and grant committees have historically looked at the kinds of journals a scientist has managed to get published in as a measure of the quality of the scientist’s work. In the last few years, such committees have also started to look at citation counts.
As scientific content moves to become native web content, scientific content will increasingly be evaluated according to the kinds of metrics that reflect the success of a piece of content on the web.
Web metrics vary, and evolve. Some are internet-wide metrics, such as unique visitors, page views, time on site. Others are specific to certain verticals, or sites, such as Twitter follower counts, StackOverflow score, Facebook likes, and YouTube video views.
As these metrics are increasingly understood in the context of scientific content, scientists will increasingly share content that attracts this kind of credit.
If you can share a data-set, and collect credit for it, you will. If you can comment on a paper, and collect credit for it, you will do that too. If sharing a video of a process is more compelling than having black and white images of the process, videos will take off.
Directing Silicon Valley’s resources towards accelerating science
Science is in the process of being re-built and transformed. It is going to be an exhilarating process. The positive impact to society will be significant.
The next wave of science is not being built by scientific publishers. It is being built by engineering-focused, Silicon Valley tech companies. It is being built by talented and visionary engineering and product teams.
Silicon Valley’s formidable resources are starting to turn in the direction of science, having been focused for the past 2-3 years on areas like optimizing strawberry credit flows on FarmVille. Venture capital, entrepreneurial talent, and engineering talent are starting to flow into the space, and the future of science is starting to be built.
The ecosystem needs more resources. It needs more engineers, entrepreneurs, and venture capital. The prizes for success in transforming science go to everyone in the world. $1 trillion a year gets spent on R&D, of which $200 billion is spent in the academic sector, and $800 billion in the private sector. There are vast new companies waiting to be built here.
As the extraordinary Silicon Valley innovation engine increasingly directs itself at transforming science, you can expect to see acceleration on a scale that science has never seen. Science will change beyond recognition, and the positive impact on the rate of technology growth in the world will be enormous.
The time to act is now. If you are a VC, invest in science startups. If you are an entrepreneur, hunt for an idea in the space and run with it. If you are an engineer or designer, there is a list of startups trying to accelerate science here.
SPECULATIONS ON THE FUTURE OF SCIENCE
(KEVIN KELLY:) Science will continue to surprise us with what it discovers and creates; then it will astound us by devising new methods to surprise us. At the core of science's self-modification is technology. New tools enable new structures of knowledge and new ways of discovery. The achievement of science is to know new things; the evolution of science is to know them in new ways. What evolves is less the body of what we know and more the nature of our knowing.
Technology is, in its essence, new ways of thinking. The most powerful type of technology, sometimes called enabling technology, is a thought incarnate which enables new knowledge to find and develop new ways to know. This kind of recursive bootstrapping is how science evolves. As in every type of knowledge, it accrues layers of self-reference to its former state.
New informational organizations are layered upon the old without displacement, just as in biological evolution. Our brains are good examples. We retain reptilian reflexes deep in our minds (fight or flight) while the more complex structuring of knowledge (how to do statistics) is layered over those primitive networks. In the same way, older methods of knowing (older scientific methods) are not jettisoned; they are simply subsumed by new levels of order and complexity. But the new tools of observation and measurement, and the new technologies of knowing, will alter the character of science, even while it retains the old methods.
I'm willing to bet the scientific method 400 years from now will differ from today's understanding of science more than today's scientific method differs from the proto-science used 400 years ago. A sensible forecast of technological innovations in the next 400 years is beyond our imaginations (or at least mine), but we can fruitfully envision technological changes that might occur in the next 50 years.
Based on the suggestions of the observers above, and my own active imagination, I offer the following as possible near-term advances in the evolution of the scientific method.
Compiled Negative Results — Negative results are saved, shared, compiled and analyzed, instead of being dumped. Positive results may increase their credibility when linked to negative results. We already have hints of this in the recent decision of biochemical journals to require investigators to register early phase 1 clinical trials. Usually phase 1 trials of a drug end in failure and their negative results are not reported. As a public health measure, these negative results should be shared. Major journals have pledged not to publish the findings of phase 3 trials if their earlier phase 1 results had not been reported, whether negative or not.
Triple Blind Experiments – In a double blind experiment neither researcher nor subject is aware of the controls, but both are aware of the experiment. In a triple blind experiment all participants are blind to the controls and to the very fact of the experiment itself. This way of science depends on cheap, non-invasive sensors running continuously for years, generating immense streams of data. While ordinary life continues for the subjects, massive amounts of constant data about their lifestyles are drawn and archived. Out of this huge database, specific controls, measurements and variables can be "isolated" afterwards. For instance, the vital signs and lifestyle metrics of a hundred thousand people might be recorded in dozens of different ways for 20 years, and then later analysis could find certain variables (smoking habits, heart conditions) and certain ways of measuring that would permit the entire 20 years to be viewed as an experiment – one that no one knew was even going on at the time. This post-hoc analysis depends on the pattern recognition abilities of supercomputers. It removes one more variable (knowledge of the experiment) and permits greater freedom in devising experiments from the indiscriminate data.
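The post-hoc style of analysis described above can be sketched in a few lines: record everything passively, then slice the archive into cohorts by a variable chosen only after the fact. Everything below — the field names, the smoking/heart-rate link, and the numbers — is invented purely for illustration:

```python
import random

random.seed(0)

# Hypothetical passive archive: one record per subject, logged for years
# with no experiment in mind. All fields and effect sizes are invented.
archive = [
    {
        "subject": s,
        "smoker": s % 3 == 0,
        # Give the invented smokers a slightly higher resting heart rate.
        "resting_hr": 62 + (8 if s % 3 == 0 else 0) + random.randint(-3, 3),
    }
    for s in range(1000)
]

def post_hoc_cohorts(records, key):
    """Split the archive by a variable chosen *after* data collection."""
    cohorts = {}
    for r in records:
        cohorts.setdefault(r[key], []).append(r)
    return cohorts

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# The "experiment" is defined retroactively: compare cohorts that never
# knew they were in a study.
cohorts = post_hoc_cohorts(archive, "smoker")
smoker_hr = mean(r["resting_hr"] for r in cohorts[True])
nonsmoker_hr = mean(r["resting_hr"] for r in cohorts[False])
```

The point of the sketch is that the grouping key is a free choice made after the archive exists; any logged variable could serve as the retroactive control.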
Combinatorial Sweep Exploration – Much of the unknown can be explored by systematically creating random varieties of it at a large scale. You can explore the composition of ceramics (or thin films, or rare-earth conductors) by creating all possible types of ceramic (or thin films, or rare-earth conductors), and then testing them in their millions. You can explore certain realms of proteins by generating all possible variations of that type of protein and then seeing if they bind to a desired disease-specific site. You can discover new algorithms by automatically generating all possible programs and then running them against the desired problem. Indeed all possible Xs of almost any sort can be summoned and examined as a way to study X. None of this combinatorial exploration was even thinkable before robotics and computers; now both of these technologies permit this brute force style of science. The parameters of the emergent "library" of possibilities yielded by the sweep become the experiment. With sufficient computational power, together with a pool of proper primitive parts, vast territories unknown to science can be probed in this manner.
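As a toy version of such a sweep, the fragment below enumerates every three-component mixture in 10% steps and keeps the best candidate under an invented scoring function. In a real sweep the "score" would come from an automated instrument, not a formula; the optimum at (0.2, 0.5, 0.3) is an assumption baked in for illustration:

```python
import itertools

def score(a, b, c):
    # Stand-in property measurement: pretend the material is best
    # near a=0.2, b=0.5, c=0.3 (an invented optimum).
    return -((a - 0.2) ** 2 + (b - 0.5) ** 2 + (c - 0.3) ** 2)

# The "library": every mixture (a, b, c) with a + b + c = 1, in 10% steps.
steps = [i / 10 for i in range(11)]
candidates = [
    (a, b, round(1 - a - b, 1))
    for a, b in itertools.product(steps, steps)
    if 0 <= round(1 - a - b, 1) <= 1
]

# Brute force: test every member of the library, keep the best.
best = max(candidates, key=lambda abc: score(*abc))
```

The library itself — its step size and composition constraint — is the experimental design; scaling the same loop to millions of candidates is what robotics makes practical.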
Evolutionary Search – A combinatorial exploration can be taken even further. If new libraries of variations can be derived from the best of a previous generation of good results, it is possible to evolve solutions. The best results are mutated and bred toward better results. The best testing protein is mutated randomly in thousands of ways, and the best of that bunch kept and mutated further, until a lineage of proteins, each one more suited to the task than its ancestors, finally leads to one that works perfectly. This method can be applied to computer programs and even to the generation of better hypotheses.
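A minimal sketch of that mutate-and-keep-the-best loop, using a six-letter stand-in for a protein and an invented fitness function (count of positions matching a fixed target), assuming nothing about real binding chemistry:

```python
import random

random.seed(42)

TARGET = "MKVLAT"                    # invented "perfectly binding" sequence
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"    # the 20 amino-acid letters

def fitness(seq):
    # Invented stand-in for a binding assay: positions matching the target.
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq):
    # Randomly swap one position for a random letter.
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

# Start from a random sequence; each generation, breed a brood of mutants
# and keep the fittest (the parent is retained so fitness never drops).
best = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
for generation in range(200):
    brood = [mutate(best) for _ in range(50)] + [best]
    best = max(brood, key=fitness)
    if fitness(best) == len(TARGET):
        break
```

The same skeleton — generate variants, score, keep winners, repeat — applies whether the individuals are proteins, programs, or hypotheses; only the mutate and fitness functions change.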
Multiple Hypothesis Matrix – Instead of proposing a series of single hypotheses, in which each hypothesis is falsified and discarded until one theory finally passes and is verified, a matrix of many hypothesis scenarios is proposed and managed simultaneously. An experiment travels through the matrix of multiple hypotheses, some of which are partially right and partially wrong. Veracity is statistical; more than one thesis is permitted to stand with partial results. Just as data are assigned a margin of error, so too will hypotheses be. An explanation may be stated as: 20% is explained by this theory, 35% by that theory, and 65% by another. A matrix also permits experiments with more variables and more complexity than before.
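The statistical bookkeeping described above is close in spirit to Bayesian weighting: each hypothesis carries a credence that is updated by evidence rather than being discarded outright. The hypotheses, priors, and likelihoods below are invented, and for simplicity the sketch treats the hypotheses as mutually exclusive so the weights normalize to 1 (the essay's overlapping-percentages version would drop the normalization step):

```python
# Equal initial credence in three invented hypotheses.
priors = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}

# Assumed probability of the observed experimental result under each
# hypothesis (illustrative numbers only).
likelihood = {"H1": 0.10, "H2": 0.40, "H3": 0.80}

# Bayes' rule: posterior is proportional to prior times likelihood,
# then normalized so the matrix of credences sums to 1.
unnorm = {h: priors[h] * likelihood[h] for h in priors}
total = sum(unnorm.values())
posterior = {h: unnorm[h] / total for h in unnorm}
```

No hypothesis is eliminated; each experiment merely reshuffles the weights, which is exactly the "more than one thesis left standing" bookkeeping the matrix requires.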
Pattern Augmentation – Pattern-seeking software recognizes patterns in noisy results. In large bodies of information with many variables, algorithmic discovery of patterns will become necessary and common. These exist in specialized niches of knowledge (such as particle smashing) but more general rules and general-purpose pattern engines will enable pattern-seeking tools to become part of all data treatment.
Adaptive Real Time Experiments – Results are evaluated, and large-scale experiments modified, in real time. What we have now is primarily batch-mode science. Traditionally, the experiment starts, the results are collected, and then conclusions are reached. After a pause the next experiment is designed in response, and then launched. In adaptive experiments, the analysis happens in parallel with collection, and the intent and design of the test is shifted on the fly. Some medical tests are already stopped or re-evaluated on the basis of early findings; this approach would be extended to other realms. Proper methods would be needed to keep the adaptive experiment objective.
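A crude sketch of the stop-on-the-fly idea: simulate a trial with a true success rate unknown to the analysis, and halt early once the running estimate sits confidently away from the null rate. The three-sigma rule here is illustrative only, not a validly powered sequential design (real adaptive trials must correct for repeated looks at the data):

```python
import random

random.seed(1)

true_rate, null_rate = 0.75, 0.5   # true_rate is hidden from the "analysis"
successes, n = 0, 0

for trial in range(1, 1001):
    # Data collection and analysis run in parallel, one trial at a time.
    successes += random.random() < true_rate
    n = trial
    if n >= 30:                     # minimum sample before any look
        est = successes / n
        se = (est * (1 - est) / n) ** 0.5
        if abs(est - null_rate) > 3 * se:
            break                   # crude ~3-sigma early-stopping rule
```

Batch-mode science would run all 1000 trials before looking; the adaptive loop stops as soon as the evidence clears the threshold, freeing the remaining budget for the next design.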
AI Proofs – Artificial intelligence will derive and check the logic of an experiment. Ever more sophisticated and complicated science experiments become ever more difficult to judge. Artificial expert systems will at first evaluate the scientific logic of a paper to ensure the architecture of the argument is valid. They will also ensure the paper publishes the required types of data. This "proof review" will augment the peer-review of editors and reviewers. Over time, as the protocols for an AI check become standard, AI can score papers and proposals for experiments for certain consistencies and structure. This metric can then be used to categorize experiments, to suggest improvements and further research, and to facilitate comparisons and meta-analysis. A better way to inspect, measure and grade the structure of experiments would also help develop better kinds of experiments.
Wiki-Science – The average number of authors per paper continues to rise. With massive collaborations, the numbers will boom. Experiments involving thousands of investigators collaborating on a "paper" will be commonplace. The paper is ongoing, and never finished. It becomes a trail of edits and experiments posted in real time — an ever evolving "document." Contributions are not assigned. Tools for tracking credit and contributions will be vital. Responsibilities for errors will be hard to pin down. Wiki-science will often be the first word on a new area. Some researchers will specialize in refining ideas first proposed by wiki-science.
Defined Benefit Funding — Ordinarily science is funded by the experiment (results not guaranteed) or by the investigator (nothing guaranteed). The use of prize money for particular scientific achievements will play greater roles. A goal is defined, funding secured for the first to reach it, and the contest opened to all. Imagine a Turing Test prize, awarded to the first computer to pass the Turing Test as a passable intelligence. Defined Benefit Funding can also be combined with prediction markets, which set up a marketplace of bets on possible innovations. The bet winnings can encourage funding of specific technologies.
Zillionics – Ubiquitous always-on sensors in bodies and environment will transform medical, environmental, and space sciences. Unrelenting rivers of sensory data will flow day and night from zillions of sources. The exploding number of new, cheap, wireless, and novel sensing tools will require new types of programs to distill, index and archive this ocean of data, as well as to find meaningful signals in it. The field of "zillionics" — dealing with zillions of data flows — will be essential in health, natural sciences, and astronomy. This trend will require further innovations in statistics, math, visualizations, and computer science. More is different. Zillionics requires a new scientific perspective in terms of permissible errors, numbers of unknowns, probable causes, repeatability, and significant signals.
Deep Simulations – As our knowledge of complex systems advances, we can construct more complex simulations of them. Both the successes and failures of these simulations will help us to acquire more knowledge of the systems. Developing a robust simulation will become a fundamental part of science in every field. Indeed the science of making viable simulations will become its own specialty, with a set of best practices, and an emerging theory of simulations. And just as we now expect a hypothesis to be subjected to the discipline of being stated in mathematical equations, in the future we will expect all hypotheses to be exercised in a simulation. There will also be the craft of taking things known only in simulation and testing them in other simulations — sort of a simulation of a simulation.
Hyper-analysis Mapping – Just as meta-analysis gathered diverse experiments on one subject and integrated their (sometimes contradictory) results into a large meta-view, hyper-analysis creates an extremely large-scale view by pulling together meta-analysis. The cross-links of references, assumptions, evidence and results are unraveled by computation, and then reviewed at a larger scale which may include data and studies adjacent but not core to the subject. Hyper-mapping tallies not only what is known in a particular wide field, but also emphasizes unknowns and contradictions based on what is known outside that field. It is used to integrate a meta-analysis with other meta-results, and to spotlight "white spaces" where additional research would be most productive.
Return of the Subjective – Science came into its own when it managed to refuse the subjective and embrace the objective. The repeatability of an experiment by another, perhaps less enthusiastic, observer was instrumental in keeping science rational. But as science plunges into the outer limits of scale – at the largest and smallest ends – and confronts the weirdness of the fundamental principles of matter/energy/information such as that inherent in quantum effects, it may not be able to ignore the role of the observer. Existence seems to be a paradox of self-causality, and any science exploring the origins of existence will eventually have to embrace the subjective, without becoming irrational. The tools for managing paradox are still undeveloped.