Project Summary

Title: Collaborative Research: Interactive and intelligent searching of biological images by query and network navigation with learning capabilities

Supported by: US National Science Foundation Award Number 0808632.

Problem: A fundamental and difficult question in biology is identification: given a biological sample, we want to identify its closest match among a set of known samples. In this proposal, we focus on nematodes, a very important group of animals. Despite their importance, research on nematodes has not reached the desired level of maturity. A key problem is that nematodes are particularly difficult to identify: an identification currently takes approximately 2 days on average, and a researcher needs 3-5 years of effort to attain "fluency". The major limiting factors are: (a) the lack of tools and automation, (b) the need for book-based image comparison, and (c) the need for expertise in order to use the existing resources. The few existing databases and tools are geared toward seasoned researchers and are of limited use to novices or high school students. We identify three distinct but related scenarios pertaining to the problem of identification:

Scenario 1 - searching: How can I find the "image" in a set that is closest to a newly digitized "image"?
Scenario 2 - mining: Given a set of "images", can I identify structure and patterns?
Scenario 3 - browsing: How can I help the user navigate the data set effectively to find the sample that best matches the "image" under his/her microscope?

Previous work: There is a vast literature on image searching by example (an area to which the PIs have made major contributions), which could partly address Scenarios 1 and 2. However, the unique features of our target images make adopting previous techniques non-trivial. First, we use the term "image" to denote a set of images that scan the body of the object of interest (think of MRI images). Second, our images are typically low-contrast and mostly monochrome. Third, the least pronounced features are often the most important for identifying a species. Fourth, similar-looking features occur in unrelated species and can easily cause mistakes.
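
To make the first point concrete, the following minimal sketch treats a specimen "image" as a stack of slices, each summarized by a feature vector, and answers Scenario 1 by returning the known specimen at the smallest set-to-set distance. The slice features, the names (`closest_match`, `specimen_distance`, `species_A`), and the symmetric Chamfer-style distance are all assumptions chosen for illustration, not the project's similarity function. Designing features and similarity functions that cope with low contrast and subtle discriminating features is the open research question.

```python
import math
from collections.abc import Sequence

# Hypothetical representation (an assumption, not the project's design):
# a specimen "image" is a stack of slices, and each slice is summarized
# by a fixed-length feature vector.
Slice = Sequence[float]
Specimen = Sequence[Slice]


def euclidean(a: Slice, b: Slice) -> float:
    """Euclidean distance between two slice feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_distance(a: Specimen, b: Specimen) -> float:
    """Average, over the slices of a, of the distance to the closest slice of b."""
    return sum(min(euclidean(s, t) for t in b) for s in a) / len(a)


def specimen_distance(a: Specimen, b: Specimen) -> float:
    """Symmetric Chamfer-style distance between two slice stacks.

    Slices are matched by content rather than by position, so the two
    stacks may differ in length and in slice order.
    """
    return 0.5 * (directed_distance(a, b) + directed_distance(b, a))


def closest_match(query: Specimen, database: dict[str, Specimen]) -> str:
    """Scenario 1 (brute force): name of the known specimen nearest the query."""
    return min(database, key=lambda name: specimen_distance(query, database[name]))


if __name__ == "__main__":
    # Toy two-dimensional slice features for two hypothetical species.
    known = {
        "species_A": [[0.0, 0.0], [1.0, 1.0]],
        "species_B": [[5.0, 5.0], [6.0, 6.0]],
    }
    print(closest_match([[0.9, 1.1]], known))  # species_A
```

The sketch scans every known specimen. A realistic database would likely need an index or a learned metric, which this sketch does not attempt.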

Vision: Our vision is to make nematode identification a simple point-and-click process, so that anyone can easily query the data and identify species. Our goals are to: (a) make research easier, (b) make biology popular by making it accessible, and (c) harness and integrate both the data and the experience of experts, students, and amateurs into a searchable, interactive, high-quality nematode database.

Intellectual Merit: We propose to develop a computer-assisted interactive navigation process that intelligently assists, and learns from, the user. We propose to develop novel techniques, customize existing ones, and integrate these capabilities into a comprehensive framework. We focus on nematodes, which, as noted above, are particularly challenging to work with. However, our work can extend to many other biological databases. The main research challenges are the extraction of features and the design of similarity functions for nematodes, as well as mining, clustering, and anomaly detection for image and non-image data.

The novelty of our approach is twofold. First, we step away from a tree-based classification approach in order to make the identification process more robust to human errors. We propose to evaluate and use richly connected networks that represent the similarity of samples, together with a multidimensional feature space, to identify discriminating features. Second, our approach lets us operate in a human-centric way in a world of imperfections and subjectivity: it is designed to handle vague "looks like this" decisions while leveraging humans' visual cognition, which is superior to that of machines.
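
The contrast with a tree-based key can be illustrated with a small, hypothetical sketch. Each sample is linked to its k most similar samples, and navigation repeatedly moves to whichever neighbor is judged closest to the specimen under the microscope. The k-nearest-neighbor graph and the greedy walk are generic techniques assumed for illustration, not the method the project will adopt. The `dissimilarity` function is a numeric stand-in for the user's "looks like this" judgment, and the toy features are invented. Because each sample has several neighbors, there are usually several routes to the right answer, whereas a tree key offers exactly one.

```python
import math
from collections.abc import Callable

# Hypothetical feature vectors keyed by sample name (illustration only).
Features = dict[str, list[float]]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(features: Features, k: int) -> dict[str, set[str]]:
    """Link every sample to its k nearest samples; edges are made undirected."""
    graph: dict[str, set[str]] = {name: set() for name in features}
    for name, vec in features.items():
        others = sorted(
            (n for n in features if n != name),
            key=lambda n: euclidean(vec, features[n]),
        )
        for other in others[:k]:
            graph[name].add(other)
            graph[other].add(name)
    return graph


def navigate(
    graph: dict[str, set[str]],
    start: str,
    dissimilarity: Callable[[str], float],
) -> str:
    """Greedy walk: move to the neighbor that looks most like the specimen.

    `dissimilarity` simulates the user's judgment (an assumption). The walk
    moves only on strict improvement, so it always terminates.
    """
    current = start
    while graph[current]:
        best = min(graph[current], key=dissimilarity)
        if dissimilarity(best) >= dissimilarity(current):
            break  # no neighbor looks more like the specimen
        current = best
    return current


if __name__ == "__main__":
    # Toy one-dimensional features for five hypothetical samples.
    feats = {"a": [0.0], "b": [1.0], "c": [2.0], "d": [3.0], "e": [4.0]}
    g = knn_graph(feats, k=2)
    specimen = [3.9]  # features of the unknown specimen (hypothetical)

    def looks_like(name: str) -> float:
        return euclidean(feats[name], specimen)

    print(navigate(g, "a", looks_like))  # e
```

Greedy navigation of this kind can stop at a local optimum. That is one reason keeping a human in the loop, rather than relying on a fixed rule, may be valuable.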

Broader Impact: The proposed work can revolutionize nematode research by reducing identification time by an order of magnitude (from days to hours). Nematodes have direct and significant effects on humans, animals, and agriculture. For example, four species of nematode parasites infect over 2 billion people worldwide, and one type of nematode causes one-third of the total estimated annual worldwide yield losses to all soybean pathogens. This project involves a highly interdisciplinary team that includes a Historically Black College or University (HBCU) and a Hispanic-Serving Institution. This will provide a unique setting for training graduate and undergraduate students in the sciences and engineering.