Mining of Published Scientific Research
A Scientific Gold Rush:
Electronic Mining of Published Research
The journal Science publishes an important paper on harvesting vast amounts of "metaknowledge"
The knowledge of knowledge. The science of science. Riddles? No. A burgeoning and important field of scientific research that examines research itself, say University of Chicago Sociology Assistant Professor James Evans and Post-doctoral Scholar Jacob Foster. Their analysis has been published in a perspective piece in the journal Science.
A scientific approach to delving into the knowledge of knowledge--metaknowledge--offers great potential for new discovery, they argue. New possibilities may arise when one uncovers scientific bias or possible "ghost theories," or acquires an understanding of the context of research, and then accounts for those factors or eliminates them and engages in new research.
"We review the expanding scope of metaknowledge research, which uncovers regularities in scientific claims and infers the beliefs, preferences, research tools and strategies behind those regularities. Metaknowledge research also investigates the effect of knowledge context on content. Teams and collaboration networks, institutional prestige and new technologies all shape the substance and direction of research."
Metaknowledge can be useful to a variety of disciplines and fields. Evans and Foster's research, while primarily funded by NSF's Science of Science and Innovation Policy program, was co-funded by NSF's Division of Chemistry, which is interested in reviewing developments in chemistry over time.
Metaknowledge may also be useful in shedding light on shorter term questions. Google used computational content analysis to identify the emergence of influenza outbreaks by identifying and tracking related Google searches. The process was faster than other techniques typically used by health officials.
"Collaboration is revealed to be much more important to the future of science policy," explains Julia Lane, director of NSF's Science of Science Innovation Policy program (the other co-funder of this research). "As the perspective so aptly put, 'the rise in scientific review articles and the concomitant explosion of scientific publications over the past century trace a growing supply and demand for the focused assessment and synthesis of research claims. As the number of analyses investigating a particular claim has become unmanageable ... researchers have increasingly engaged in meta-analysis-counting, weighting and statistically analyzing the census of published findings on the topic.'"
According to the perspective's authors, metaknowledge sheds light on the role funding plays in science. "There is evidence from the metaknowledge that embedding research in the private or public sector modulates its path," Evans and Foster write. "Company projects tend to eschew dogma in an important hunt for commercial breakthroughs, leading to rapid but unsystematic accumulation of knowledge, whereas public research focuses on the careful accumulation of consistent results."
A promise of metaknowledge, they argue, is also its capacity to steer researchers into new fields: "Metaknowledge could inform individual strategies about research investment, pointing out overgrazed fields where herding leads to diminishing returns as well as lush range where premature certainty has halted promising investigation."
The ability of metaknowledge researchers to see connections and uncover previously missed aspects of research is powered, in part, by the growth of natural language processing (NLP), one of the rapidly emerging fields of artificial intelligence, largely supported by the NSF's Directorate for Computer and Information Science and Engineering.
NLP enables massive amounts of information, the details of fantastic discoveries and vast quantities of research funded by NSF and other organizations, to be electronically mined. Then machines can read, extract information from, and summarize enormous amounts of data.
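As a rough illustration of what electronically mining published text can mean at its simplest, the sketch below counts how many abstracts in a small collection mention each term of interest. The abstracts, the terms and the function name are all hypothetical, and real NLP systems go far beyond this kind of word matching.

```python
import re
from collections import Counter


def term_document_counts(abstracts, terms):
    """Count how many abstracts mention each term (case-insensitive, whole words)."""
    counts = Counter()
    for text in abstracts:
        # Reduce each abstract to its set of lowercase words.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for term in terms:
            if term in tokens:
                counts[term] += 1
    return counts


# Hypothetical abstracts standing in for a body of published research.
abstracts = [
    "Protein folding simulations reveal a new catalyst.",
    "A catalyst for polymer synthesis.",
    "Survey of collaboration networks in chemistry.",
]

counts = term_document_counts(abstracts, ["catalyst", "chemistry", "protein"])
for term, n in sorted(counts.items()):
    print(term, n)
```

Tallies like these, gathered over many thousands of papers and many years, are one starting point for spotting which topics are crowded and which are neglected.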
"Extraordinary advances in computational abilities enable social scientists to further delve into the data in order that we may understand the sweep of science," said Lane, "the context, social networks, physical and institutional settings--the many factors that shape the findings themselves."
February 10, 2011
- Media Contacts
Lisa-Joy Zgorski, NSF (703) 292-8311 firstname.lastname@example.org
William Harms, University of Chicago (773) 702-8356 email@example.com
- Program Contacts
Julia I. Lane, NSF (703) 292-5145 firstname.lastname@example.org
- Powerful new ways to electronically mine published research may lead to new scientific breakthroughs: http://news.uchicago.edu/news.php?asset_id=2249
- NSF's Science of Science and Innovation Policy program: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=501084
- Examples of natural language processing awards: http://www.nsf.gov/pubs/stis1993/nsf93133/nsf93133.txt