Thursday, September 12, 2013

INCF 2013 - Using experimental design to design neuroinformatics data structures

Video Link:

The interdisciplinary nature of neuroscience research leads to an explosion of different informatics tools, data structures, platforms and terminologies. A central difficulty faced by developers is that knowledge representations for any neuroscience subdomain must serve the domain-specific needs of that sub-community. Related representations overlap, contradict each other and compete as standards. The process of standardization is itself difficult to organize within the community and even harder to enforce in practice, raising complex issues of ease of use, computability and data availability, as well as scientific correctness and philosophical purity.

In this talk, I present a novel, relatively simple conceptual design that makes a clear distinction between interpretive and observational knowledge to build a general framework for scientific data. Our methodology (called 'Knowledge Engineering from Experimental Design' or KEfED) uses an experiment's protocol to define the dependencies between its independent and dependent variables. These dependencies support the construction of a data structure that can capture (a) data points, (b) mean values, (c) statistical significance relations and (d) correlations. We will describe the underlying formalism of the KEfED approach, the tools we provide to help researchers build their own models, our approach to unify and standardize the definition of variables, the application of KEfED to complex neuroscience knowledge and possible research directions for this technology in the future.
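As a rough illustration of the idea (a hypothetical Python sketch under my own assumed names, not the actual KEfED implementation), a protocol can declare its independent and dependent variables up front; each measurement is then keyed by the values of the independent variables, and summaries such as means fall out of the same structure:

```python
from dataclasses import dataclass, field

@dataclass
class Protocol:
    """A toy KEfED-style protocol: declared variables define the data structure."""
    name: str
    independent: list                  # variables the experimenter controls
    dependent: list                    # variables that are measured
    measurements: list = field(default_factory=list)

    def record(self, controlled: dict, measured: dict):
        """Store one observation; keys must match the declared variables."""
        assert set(controlled) == set(self.independent)
        assert set(measured) <= set(self.dependent)
        self.measurements.append((controlled, measured))

    def values(self, dep_var, **conditions):
        """All measured values of dep_var under the given experimental conditions."""
        return [m[dep_var]
                for c, m in self.measurements
                if dep_var in m and all(c[k] == v for k, v in conditions.items())]

# A toy tract-tracing-style protocol (illustrative names and numbers only)
p = Protocol("tracer-injection",
             independent=["injection_site", "survival_days"],
             dependent=["label_density"])
p.record({"injection_site": "V1", "survival_days": 7},  {"label_density": 0.82})
p.record({"injection_site": "V1", "survival_days": 14}, {"label_density": 0.91})
vals = p.values("label_density", injection_site="V1")   # every data point for V1
mean = sum(vals) / len(vals)                            # a derived summary value
```

The point of the sketch is that once the protocol fixes the dependency structure, raw data points and derived quantities (means, and by extension significance and correlation relations) are all queries over one representation rather than separate artifacts.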

Sunday, August 25, 2013

The SciKnowMine Project: Bridging BioNLP and Biocuration

Biological Natural Language Processing ('BioNLP') holds great promise to support and accelerate biocuration (organizing published biomedical knowledge into online resources such as databases) but has not yet generated viable open technology for use within the community. This is an area of active research and is the subject of shared evaluations such as 'BioCreative 4'. As the closing meeting of an NSF-funded infrastructure project (called 'SciKnowMine', #0849977), we held a workshop to (A) present an implementation of a system for document triage that we are currently deploying to the Mouse Genome Informatics (MGI) system, and (B) present and develop a strategic plan for open-source, community-driven tools that bridge between curators committed to improving the quality of their informatics resources and computer science specialists developing novel NLP technology. The meeting was well attended by experts from both communities. In keeping with this blog's aim of examining the issues inherent in developing scientific breakthroughs by explicitly describing the paradigms that different disciplines inhabit, the workshop was designed around the theme of finding connecting points between these two interdependent paradigms.

The workshop page is here: 

And my introduction and talk (which goes into some detail about the way we use paradigms) is here:

and here: 

Monday, May 20, 2013

ISI AI Seminar: Introducing paradigms as a viable structural guide for biomedical knowledge engineering

I gave a slightly more technical version of the talk presented at Google for my colleagues at ISI to describe the latest work going on in my group.

Following Thomas Kuhn's seminal 1962 book, in which he introduced the notion of scientific paradigms, I describe a computational methodology that gives that concept a concrete formulation. I present this approach partly as a methodology for framing and scoping the knowledge representation and analysis work necessary to build tools that serve a specific community. However, the approach also has technical implications for semantic web representations, the use of workflows and reasoning, and the way we derive content from existing scientific artefacts. We explore this viewpoint in the context of a well-defined domain problem (biomarker studies of neurodegenerative diseases) with the strategic intent of developing a practical, scoped view of biomarker data that could serve as the basis of corollary work within AI computer science groups.

Tuesday, May 7, 2013

Google Tech Talk: Organizing the world’s scientific knowledge to make it universally accessible and powerful: building the breakthrough machine.

I gave a talk at Google Los Angeles on 04/30/2013 about my work. Here is a video link and a short abstract about the presentation.

YouTube Link:

Abstract: Not all information is created equal. Accurate, innovative scientific knowledge generally has an enormous impact on humanity. It is the source of our ability to make predictions about our environment. It is the source of new technology (with all its attendant consequences, both positive and negative). It is also a continuous source of wonder and fascination. In general, the value and power of scientific knowledge is not reflected in the scale and structure of the information infrastructure used to house, store and share this knowledge. Many scientists use spreadsheets as their most sophisticated data management tool and only publish their data as PDF files in the literature. In this high-level talk, we describe a powerful new knowledge engineering framework for describing scientific observations within a broader strategic model of the scientific process. We describe general open-source tools for scientists to model and manage their data in an attempt to accelerate discovery. Using examples focussed on the high-value challenge problem of finding a cure for Parkinson's Disease, we present a high-level strategic approach that is both in keeping with Google's vision and values and could also provide a viable new research direction that would benefit from Google's massively scalable technology. Ultimately, we present an informatics research initiative for the 21st century: 'Building a Breakthrough Machine'.

Thursday, April 11, 2013

Scientific Paradigms: Finding the Tears in the Curtain

In his excellent book, 'Where Good Ideas Come From', Steven Johnson uses the phrase ‘the adjacent possible’ to explain why some ideas are just too innovative, too off-the-wall, too ahead-of-their-time to be successful. The classic example of this is Babbage's Analytical Engine. This was the earliest design for an algorithmic computer, but because it was conceived in Victorian times, before the existence of electronics, it never stood a chance of actually being built and used. It was brilliant, innovative and remarkable, but it was also impractical and could never work. It lay too far beyond what was possible at the time it was created. We had to wait for the advent of electronics before practical computing machinery became possible. Steven Johnson reminds us that we can't look too far afield for discovery. We have to look just over the brow of the next hill (not behind the looming mountain in the distance). 

The word 'paradigm' was introduced by Thomas Kuhn to denote 'coherent traditions of scientific work’ made up of laws, theories, applications and instrumentation that reflect a way of thinking about a specific domain of knowledge. Kuhn describes paradigms as largely static and stable, gradually expanding the boundaries of knowledge at their edges. But when scientific explanations and predictions don't match our observations of what is happening in the real world, a wonderful schism occurs. Theories break down. Scientists tear their hair out in frustration. Nothing seems to make sense until, finally, the domain's theory, explanation and practices have to change. It's this occasional disconnect between interpretation and observation that churns the creative process of scientific work and sometimes triggers ‘paradigm shifts’. Under normal processes within a paradigm, the area of investigation available to us is incremental and predictable. We see the adjacent possible with no mystery. Under the disrupted conditions of a paradigm shift, we don't know where we might end up. The boundaries of the adjacent possible expand in an abrupt, disruptive, non-linear and unpredictable way.  

Understanding and harnessing the underlying dynamics of this sudden fracturing and restructuring of a body of knowledge under a paradigm shift would have to lie at the heart of the inner workings of a breakthrough machine. This suggests that the central construct of our representation of scientific knowledge should perhaps be the paradigm itself. This is not presently the case. The prevailing approach among bioinformatics researchers is to define knowledge through large-scale logical schemas (called ‘ontologies’, a word derived from the name of the philosophical study of existence itself) that are intended to define universals rather than scoped, domain-specific assertions limited to describing a locally defined phenomenon.

I feel that we should adjust our knowledge representation to focus on paradigms. Like an expert scientist in a given field, our technology must analyze our existing knowledge so that we can ask important questions that can be tested experimentally. To be able to do this, we have to focus on the details that are directly in front of us: a cancer specialist does not take into account remote astrophysical knowledge of distant galaxies when attempting to find binding sites for her drugs; a geologist attempting to predict when an earthquake will occur probably does not use information about weather patterns in his calculations (although you never know, he might). It is important that our knowledge engineering and management technology represents the boundaries that frame the way we ask questions effectively, and we don’t currently have a good methodology for this.

Thus, an interesting thread to pursue in scientific knowledge engineering is simply to ask "How should we represent and process paradigms within informatics systems?". When paradigms duel for supremacy in important fields, epic battles are fought and great careers are either made or destroyed. We might also ask "How do we know when experimental evidence stands between two battered and bruised paradigms and declares one of them the winner?". In particular, probably the most important and interesting research question we should think about is: "How can we recognize when paradigms fail to provide a good model of reality, tempting us with the scent of a possible underlying breakthrough to be made?"

These are the tiny tears in the curtain that we need to latch onto and pull on with all our might to reveal the truth that lurks hidden beneath. This is where the magic happens and I feel that Kuhn's brilliant notion of scientific paradigms and paradigm shifts could provide us with a powerful unifying concept to provide the underlying blueprint of a breakthrough machine. 

Monday, March 25, 2013

Building a Breakthrough Machine: a Historical Introduction

If we go back one thousand years, to 1013, and ask the question, “Why is the world different now from how it was then?” then we might cite military campaigns, political revolutions, evolving economic models or advances in technology. We might highlight the influence of key individuals: explorers, inventors, thinkers, industrialists and scientists. The common factor that underlies the world’s progression from medieval life in 1013 to life in the modern world is discovery.

Scientific breakthroughs are a disproportionately impactful subtype of discovery. Our understanding of phenomena like chemistry, molecules, DNA, vaccines, neurons, electricity, electromagnetism, the movement of planets, quantum mechanics, global warming or even cryptography was founded on ‘eureka’ moments. Scientists synthesize how they think phenomena work into concrete experimental or theoretical investigations that give new predictive insight into what is actually going on. Developing breakthroughs of this kind requires a combination of creativity and careful, rigorous scientific work that is the hallmark of exceptionally talented practitioners. As yet, the only effective way to develop scientific discoveries is to find and train exceptional people, to provide them with laboratories and analysis tools, to foster their intercommunication and competition, and to try to pick out significant findings as they occur.

On Dec 23rd 1999, the Economist magazine published a review of the most important inventions of the last thousand years. Notably, they chose Gutenberg’s printing press as the most influential invention over that period. This machine allowed the generation, reproduction and dissemination of knowledge on an industrial scale. It toppled governments. It educated the masses. It revolutionized trade. Similarly, the invention and rapid development of electronic information technology over the last 70 years provides us with an astonishing array of computational tools that have transformed the way we work, live, socialize, think and play. This technology allows us to use information easily and powerfully in ways that were unimaginable only a few short years ago.

Although informatics tools certainly facilitate science, the act of discovery itself still remains somewhat ephemeral and mysterious. The underlying synthesis of knowledge required to execute such discoveries remains hidden in the minds of a small number of experts. Perhaps we can develop knowledge engineering methodologies and techniques to understand, reproduce and automate the information-driven processes of scientific understanding and discovery. Perhaps we can build machines that make advanced scientific reasoning as easy for people as reading and writing became in the years after Gutenberg’s printing press.

In this blog, I will ask the question, “What will it take to develop a breakthrough machine?” and attempt to come up with some answers based on my group’s efforts as well as reviewing other work being done in the field. This blog is also intended to act as a challenge and an invitation to the community to discuss, argue and contribute to the discussion.