Blood Cell Detection using Cloud Computing

The Google approach to detecting thousands of cells.

Design

This programming project implements a cloud computing program using Hadoop. Specifically, I implemented a biomedical image analysis system that detects cells in a large number of images read from an image list. The system reuses a detection algorithm previously developed in Matlab and runs it within the Hadoop MapReduce framework. Although the runtime has not improved, because the cluster is configured as a single node, the system successfully runs cell detection on Hadoop. I tested the system on a single-node machine, and the results are satisfactory for detection purposes. Throughout this project, I learned a great deal about the Hadoop API and its applications to my research.

Data

Over 200 data sequences.
Each sequence contains at least 200 image frames.
Each frame is a 1000 × 1000 pixel digital image.
Total data available for processing: 32 GB.

The figure shows a sample image frame.

Design Decisions

To work around the incompatibility between programming languages, the Matlab code must be compiled into a standalone .exe.
The map step runs the detection operation and is likely to take most of the running time.
The reduce step is executed as an identity reducer.
Ideally, images in the same video sequence will be merged into a SequenceFile that the Hadoop framework can partition.
The detection results are output as one .txt file.
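The map/identity-reduce flow described above can be sketched in plain Java. This is a minimal, Hadoop-free illustration under stated assumptions, not the project's CellDetection.java: the `Detector` interface, the `ExternalDetector` class, and the convention that the compiled Matlab .exe takes an image name as its only argument and prints the cell list to stdout are all hypothetical. In a real Hadoop 0.19 job, `map` would live in an implementation of the `org.apache.hadoop.mapred.Mapper` interface and the reducer could be `IdentityReducer`; here both are ordinary methods so the data flow can be run and checked without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CellDetectionFlow {

    /** Detection for one image; returns that image's cell list as one line of text. */
    public interface Detector {
        String detect(String imageName) throws IOException, InterruptedException;
    }

    /** Hypothetical wrapper: runs the compiled Matlab executable with the image
     *  name as its only argument and returns what it prints to stdout. */
    public static final class ExternalDetector implements Detector {
        private final String exePath;

        public ExternalDetector(String exePath) {
            this.exePath = exePath;
        }

        @Override
        public String detect(String imageName) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(exePath, imageName)
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to this JVM's stderr
                    .start();
            StringBuilder out = new StringBuilder();
            // Read all of stdout before waitFor() so the child cannot block on a full pipe.
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(line.trim());
                }
            }
            int code = p.waitFor();
            if (code != 0) {
                throw new IOException("detector exited with " + code + " on " + imageName);
            }
            return out.toString();
        }
    }

    /** Map step: one input line (an image name) becomes one (imageName, cells) pair. */
    public static Map.Entry<String, String> map(String imageName, Detector detector)
            throws IOException, InterruptedException {
        return new AbstractMap.SimpleImmutableEntry<>(imageName, detector.detect(imageName));
    }

    /** Identity reduce: every value is emitted unchanged under its key.
     *  The TreeMap only imitates the key-sorted order in which a reducer sees its input. */
    public static List<String> identityReduce(List<Map.Entry<String, String>> mapped) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : grouped.entrySet()) {
            for (String cells : g.getValue()) {
                lines.add(g.getKey() + ", " + cells);
            }
        }
        return lines;
    }

    /** Maps every image name, then reduces, yielding the lines of the output .txt file. */
    public static List<String> run(List<String> imageNames, Detector detector)
            throws IOException, InterruptedException {
        List<Map.Entry<String, String>> mapped = new ArrayList<>();
        for (String name : imageNames) {
            mapped.add(map(name, detector));
        }
        return identityReduce(mapped);
    }

    public static void main(String[] args) throws Exception {
        // Stub detector so the flow can run without the Matlab executable.
        Detector stub = name -> "01 10 20 35 0 8 5";
        for (String line : run(Arrays.asList("frame_002.png", "frame_001.png"), stub)) {
            System.out.println(line);
        }
    }
}
```

Running `main` with the stub prints two lines sorted by image name, frame_001.png first. Passing `new ExternalDetector(...)` with the real executable path instead would launch the compiled detector once per image, which is consistent with the map step dominating the running time.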

Design Module

Implementation

Limitations

Configured as a single-node cluster.
Assumes the application knows where the image files are located.
Parallelization may not be optimized until the system runs on a multi-node cluster.
Input is a text file of image names (ImageName TextFile) instead of a SequenceFile (SequenceFileInputFormat).
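Because the input is a text file of image names rather than a SequenceFile, each input line only names an image, and the application has to resolve that name against a location it already knows. A minimal sketch of that resolution step, assuming one image name per line and a single image directory; `ImageListInput` and `imageDir` are hypothetical names, not part of the project code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImageListInput {

    /** Reads a plain-text image list (one image name per line) and resolves each
     *  name against imageDir, the location the application is assumed to know.
     *  Blank lines are skipped; file existence is not checked here. */
    public static List<Path> resolve(Path listFile, Path imageDir) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                paths.add(imageDir.resolve(name));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        Path imageDir = Files.createTempDirectory("images");
        Path listFile = Files.createTempFile("imagelist", ".txt");
        Files.write(listFile, Arrays.asList("seq01_0001.png", "", "seq01_0002.png"),
                StandardCharsets.UTF_8);
        for (Path p : resolve(listFile, imageDir)) {
            // Flag names whose image file is not where the application expects it.
            System.out.println(p + (Files.exists(p) ? "" : "  (missing)"));
        }
    }
}
```

A SequenceFile input would instead carry the image bytes inside the input records, which would remove the need for this directory assumption.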

Download

Project Code

Input File

Output File

Instructions for running

$ mkdir -p rich
$ javac -classpath hadoop-0.19.0-core.jar -d rich CellDetection.java
$ jar -cvf celldetection.jar -C rich .
$ bin/hadoop jar celldetection.jar org.myorg.CellDetection input output

Results

Execution Screen Shot

Output Explanations

Each image detection result is presented as a line in the output file.
Output line format: ImageName, Cell 01 Info ~@~ Cell 02 Info ~@~ Cell 03 Info ~@~ ...
Cell Info contains: cell ID, x-coordinate, y-coordinate, size, off-plane angle, long axis, and short axis
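A minimal parser for one output line, assuming the image name ends at the first comma and that the seven Cell Info fields are separated by whitespace and/or commas (the exact field separator is not documented here). `OutputLineParser`, `Cell`, and `ImageResult` are hypothetical names; the cell ID is kept as a string so zero-padded IDs such as 01 survive.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputLineParser {

    /** One detected cell, fields in the order listed for "Cell Info". */
    public static final class Cell {
        public final String id;
        public final double x, y, size, offPlaneAngle, longAxis, shortAxis;

        Cell(String id, double x, double y, double size,
             double offPlaneAngle, double longAxis, double shortAxis) {
            this.id = id;
            this.x = x;
            this.y = y;
            this.size = size;
            this.offPlaneAngle = offPlaneAngle;
            this.longAxis = longAxis;
            this.shortAxis = shortAxis;
        }
    }

    /** One output line: the image name and the cells detected in it. */
    public static final class ImageResult {
        public final String imageName;
        public final List<Cell> cells;

        ImageResult(String imageName, List<Cell> cells) {
            this.imageName = imageName;
            this.cells = cells;
        }
    }

    /** Parses "ImageName, cell ~@~ cell ~@~ ..." into an ImageResult. */
    public static ImageResult parse(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' after image name: " + line);
        }
        String imageName = line.substring(0, comma).trim();
        List<Cell> cells = new ArrayList<>();
        for (String chunk : line.substring(comma + 1).split("~@~")) {
            String cell = chunk.trim();
            if (cell.isEmpty()) {
                continue; // image with no cells, or a stray separator
            }
            // Whitespace and/or commas between fields, since the separator is an assumption.
            String[] f = cell.split("[\\s,]+");
            if (f.length != 7) {
                throw new IllegalArgumentException("expected 7 fields, got " + f.length + ": " + cell);
            }
            cells.add(new Cell(f[0],
                    Double.parseDouble(f[1]), Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                    Double.parseDouble(f[4]), Double.parseDouble(f[5]), Double.parseDouble(f[6])));
        }
        return new ImageResult(imageName, cells);
    }
}
```

For example, `parse("img_0001.png, 01 120.5 340 85 10 12 7")` yields one cell whose `x` is 120.5.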

Visualization

Cell information from the output is visualized on the input image: RED indicates cells detected by appearance, and GREEN indicates cells detected by motion.
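An overlay of this kind can be produced with the standard `java.awt` imaging classes. A minimal sketch, assuming `longAxis` and `shortAxis` are full axis lengths in pixels; it draws axis-aligned ellipses because an off-plane angle does not give an in-plane orientation. The output format does not record which detector found a cell, so `byMotion` is a hypothetical flag the caller must supply, and the coordinates in `main` are placeholders, not results.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class CellOverlay {

    /** Copies a (possibly grayscale) frame into an RGB image so colored marks stay colored. */
    public static BufferedImage toRgb(BufferedImage frame) {
        BufferedImage rgb = new BufferedImage(frame.getWidth(), frame.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(frame, 0, 0, null);
        g.dispose();
        return rgb;
    }

    /** Outlines one cell as an axis-aligned ellipse centered at (x, y):
     *  red for detection by appearance, green for detection by motion. */
    public static void drawCell(BufferedImage img, double x, double y,
                                double longAxis, double shortAxis, boolean byMotion) {
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(byMotion ? Color.GREEN : Color.RED);
            g.setStroke(new BasicStroke(2f));
            g.draw(new Ellipse2D.Double(x - longAxis / 2, y - shortAxis / 2, longAxis, shortAxis));
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CellOverlay <input image> <output png>");
            return;
        }
        BufferedImage frame = ImageIO.read(new File(args[0]));
        if (frame == null) {
            throw new IOException("unsupported image format: " + args[0]);
        }
        BufferedImage img = toRgb(frame);
        drawCell(img, 120, 80, 40, 24, false); // placeholder values, not real results
        drawCell(img, 300, 260, 30, 30, true);
        ImageIO.write(img, "png", new File(args[1]));
    }
}
```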

Q & A

Q: What have I learned?

Q: What's the most challenging obstacle?

Q: Are there any interesting results that I found?

Q: What useful resources did I discover while working on this project?

Cloudera video training

Hadoop Wiki

Hadoop 0.19.0 API

Q: Why is this project significant?

Q: How much work did I put in ?

Q: Why did the implementation not completely follow the design ?

Q: What could be improved in the future?