Title: Collaborative Research: Interactive and intelligent searching of biological images by query and network navigation with learning capabilities
Supported by: US National Science Foundation Award Number 0808632.
Problem: A fundamental and hard question in biology is identification: given a biological sample,
we want to identify its closest match among a set of known samples. In this proposal, we focus on
nematodes, which are a very important group of animals. Despite their importance, nematode
research has not reached the desired level of maturity. A key problem is that nematodes are
particularly difficult to identify: the average identification currently takes approximately 2 days, and
a researcher needs 3-5 years of effort to obtain “fluency”. The major limiting factors are: (a) the lack
of tools and automation, (b) the need for book-based image comparison, and (c) the need for
expertise in order to use the existing resources. The few existing databases and tools are more geared
towards seasoned researchers and are of limited use to novices or high school students. We identify
three distinct but related scenarios pertaining to the problem of identification:
Scenario 1 - searching: Given a new digitized "image", how can I find its closest match among a set of "images"?
Scenario 2 - mining: Given a set of "images", can I identify structure and patterns?
Scenario 3 - browsing: How can I help the user navigate the data set effectively to find the
sample that best matches the "image" in his/her microscope?
Previous work: There is a vast literature in image searching by example (an area where the PIs
have major contributions), which could partly answer scenarios 1 and 2. However, the unique
features of our target images make the adoption of previous techniques nontrivial. First, we use the
term "image" to denote a set of images that scan the body of the object of interest (think MRI
images). Second, our images are typically low-contrast and mostly monochrome. Third, the least
pronounced features are often the most important for the identification of a species. Fourth,
similar-looking features occur in unrelated species and can easily cause mistakes.
Vision: Our vision is to make nematode identification a simple process of point-and-click. We want
to make it easy for anyone to query the data and identify species. Our goals are to: (a) make research easier,
(b) make biology popular by making it accessible, and (c) harness and integrate both the data and the
experience of experts, students, and amateurs into a searchable interactive high-quality nematode
Intellectual Merit: We propose to develop a computer-assisted interactive navigation process,
which intelligently assists and learns from the user. We propose to develop novel techniques, customize
existing ones, and integrate these capabilities into a comprehensive framework. We propose to
use nematodes, which are particularly challenging to work with as mentioned above. However, our
work can extend to many other biological databases. The main research challenges are the extraction
of features and similarity functions for nematodes, and the mining, clustering and anomaly detection
for image and non-image data.
The novelty of our approach is twofold. First, we step away from a tree-based classification approach
in order to make the identification process more robust to human errors. We propose to evaluate and
use methods based on richly connected networks that represent the similarity of samples, and on
multidimensional feature spaces to identify discriminating features. Second, our approach enables us
to operate in a human-centric way in a world of imperfections and subjectivity: it is designed to
handle vague "looks like this" decisions and at the same time leverage humans' superior visual
cognition compared to machines.
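As an illustration only, the network-based navigation idea can be sketched as follows. The sample names, the two-dimensional feature vectors, the Euclidean distance, the k-nearest-neighbor linking rule, and the simulated user are all assumptions made for this sketch; the proposal does not prescribe any of them.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_similarity_network(features, k=2):
    """Link every sample to its k most similar samples (undirected links)."""
    network = {name: set() for name in features}
    for name, vec in features.items():
        others = sorted(
            (other for other in features if other != name),
            key=lambda other: distance(vec, features[other]),
        )
        for other in others[:k]:
            network[name].add(other)
            network[other].add(name)
    return network

def navigate(network, start, looks_closer):
    """Walk the network for as long as the user prefers some neighbor.

    looks_closer(current, neighbors) stands in for the user's visual
    judgment. It returns the neighbor that looks more like the sample
    under the microscope, or None if the current sample looks best.
    """
    path = [start]
    current = start
    while True:
        choice = looks_closer(current, sorted(network[current]))
        # Stop when the user is satisfied or starts going in circles.
        if choice is None or choice in path:
            return path
        current = choice
        path.append(current)

def simulated_user(features, query):
    """Hypothetical user who always picks the neighbor closest to the query."""
    def looks_closer(current, neighbors):
        if not neighbors:
            return None
        best = min(neighbors, key=lambda n: distance(features[n], query))
        if distance(features[best], query) < distance(features[current], query):
            return best
        return None
    return looks_closer

# Hypothetical feature vectors for five known samples.
features = {
    "sample_A": (0.0, 0.0),
    "sample_B": (1.0, 0.0),
    "sample_C": (2.0, 0.0),
    "sample_D": (3.0, 0.0),
    "sample_E": (4.0, 0.0),
}
network = build_similarity_network(features, k=2)
path = navigate(network, "sample_A", simulated_user(features, (3.9, 0.0)))
print(path)
```

Because each sample keeps several links rather than a single parent branch, a poor choice at one step need not be final: the walk can still reach the right region of the network through another neighbor, which a strict decision tree would not allow.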
Broader Impact: The proposed work can revolutionize nematode research by reducing the
identification time by an order of magnitude (from days to hours). Nematodes have a direct and
significant effect on humans, animals, and agriculture. For example, four species of nematode
parasites infect over 2 billion people worldwide, and one type of nematode causes one-third of the
total estimated worldwide annual yield losses to all soybean pathogens. This project brings together a
highly interdisciplinary team including one HBCU and one Hispanic Serving Institution. This will
provide a unique setting for the training of graduate and undergraduate students in the sciences and