Mike Caprio edited this page Jan 7, 2018 · 54 revisions

Welcome To The Hack The Deep Challenge Wiki!

Proposed challenges are listed below. Each of these challenges has a museum stakeholder listed as the owner, and will list a primary goal of either: expanding public access, improving AMNH research, or enhancing education.

We will narrow this list down to a maximum of 10 challenges.

Selected Challenges

  1. Historic Personages: Gather more historical information about prominent museum personages (scientists, researchers, explorers) from the past, and create metadata for the number of species they described, where they collected, and where their material ended up. Maybe we could also scan and OCR all our accessions, and from that pull together data on the collections collected or donated by a given person? We don't really have all that. (n solutions) {Chris Johnson} ?? ---DALIO FOUNDATION, MAKE IT OCEANS CENTRIC; FOR BROWN SCHOLARS AND HIGH SCHOOLERS---

  2. Map The Collections: Georeference all collection specimens and visualize where they came from and when they were collected. Data from the 1900s can be compared to recent data to show changes. There are about 600,000 records migrated into KE EMu - one of the things that has always interested me is an ecological picture of a point in time; we could map our collections over time - what was collected (by order), where, and when. (n solutions) {Chris Johnson} ??. Subchallenge - [RC-Pangea 2.0]: The RC-Pangea project created at Hack The Dinos is close to what I want, so we should extend it further. It uses the Paleobiology Database at http://paleobiodb.org/navigator. We could also use the GPlates database. This would be a continuation of the project for marine fossils (not just dinosaurs), as the PBDB is primarily marine fossils. When it is created, we can "advertise" the app on the PBDB "web apps" page that shows applications. (n solutions) {Melanie Hopkins} ??

  3. Virtual Fossil Fragmenter: The idea is to identify fragmented pieces of shells, bivalves, and trilobites. The theory is that we could take a 3D model of an organism, programmatically "shatter" it into pieces, and randomly rotate the shattered pieces to generate a training set of images that could be used to identify a fragment. May require some research to determine feasibility. (n solutions) {Melanie Hopkins} ??

  4. Iron Out the Kinks: Automate the transition from a series of parallel image planes through a marine microbe to a 3D computer model of its structure. 2 specific tacks: (1) Pre-processing -- Specimens for the Museum's transmission electron microscope (TEM) are cut very thinly, and distort in the process. They need to be aligned with one another, and then adjusted to compensate for nonlinear compression, skew, and other distortions. (2) Post-processing -- Once the images are combined into a model, the outlines of smooth structures are invariably highly erratic, and need to be "cleaned up". Think of a string of beads dropped on the floor. We need an automated way to pull the string taut, and have the beads line up, without making them completely straight. (2n solutions) {Aaron Heiss} ??

  5. The Eye of Maria: Understand what effect the eye of a hurricane would have on any object drifting or sailing in the ocean. Use data from drifters released in the tropical Atlantic during summer 2016 and 2017. For a more complete dataset, participants can use http://www.aoml.noaa.gov/envids/gld/FtpInterpolatedInstructions.php. Hurricane pathways would be easy to find through http://www.nhc.noaa.gov. (n solutions) {David Lindo} ?? ---GREAT---

  6. The Great Pacific Garbage Patch: Map, animate, and quantify geographical sources and travel time scales of plastic found in the GPGP. We have data of plastic locations and ocean model data of ocean currents to backtrack from sinks to sources. (n solutions) {David Lindo} ?? ---CONSERVATION--- PlasticAdrift.org https://github.com/adriftICL and COPEPOD at NOAA

  7. ?

  8. ?

  9. ?

  10. ?
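
For the post-processing tack in challenge 4 (the "string of beads" picture), one possible starting point is Laplacian smoothing, where each point on an outline is nudged part of the way toward the midpoint of its two neighbors. The sketch below is only an illustration under stated assumptions: the outline is an open, ordered list of 2D points, the endpoints are held fixed, and the names `smooth_outline`, `strength`, `iterations`, and `roughness` are hypothetical, not part of any existing Museum pipeline.

```python
import math


def smooth_outline(points, strength=0.5, iterations=10):
    """Laplacian smoothing of an open outline given as ordered (x, y) points.

    Each interior point moves a fraction `strength` of the way toward the
    midpoint of its two neighbors; the two endpoints are held fixed.
    """
    pts = [(float(x), float(y)) for x, y in points]
    if len(pts) < 3:
        return pts  # nothing to smooth
    for _ in range(iterations):
        new_pts = [pts[0]]
        for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
            mid_x = (prev[0] + nxt[0]) / 2
            mid_y = (prev[1] + nxt[1]) / 2
            new_pts.append((cur[0] + strength * (mid_x - cur[0]),
                            cur[1] + strength * (mid_y - cur[1])))
        new_pts.append(pts[-1])
        pts = new_pts
    return pts


def roughness(points):
    """Sum of distances between each interior point and its neighbors' midpoint."""
    total = 0.0
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        total += math.hypot(cur[0] - (prev[0] + nxt[0]) / 2,
                            cur[1] - (prev[1] + nxt[1]) / 2)
    return total


# Made-up zigzag outline standing in for an erratic traced contour.
zigzag = [(i, 1 if i % 2 == 0 else -1) for i in range(10)]
gentle = smooth_outline(zigzag, strength=0.5, iterations=3)
print(roughness(zigzag), roughness(gentle))
```

A small number of iterations removes the jitter while keeping the overall curve; a very large number would pull the outline toward a straight line between the endpoints, which is exactly what the challenge says to avoid, so the iteration count would need tuning on real data.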


Strong Candidates

  • Trees in the Genetic Forest: Identify the parts of an evolutionary tree that don't fit, the results of poor -- or poorly assembled -- data. Could potentially be turned into a "game" along the lines of Open-Phylo as described on BioMed Central. (This will be a pattern-recognition exercise, where the patterns are all Newick files.) (n solutions) {Aaron Heiss} ??

  • [Jellyfish Inspector]: Machine learning to help with classifying cnidarians. Reclassify images from literature. Possibly use crowdsourcing to help add "objectivity".

Library API: Get list of journals for specific nematocyst / cnidarian related articles.

Other online data sources to get additional images: Identify other publications that have nematocyst articles that can be collected.

  1. Must identify the capsule within the image
  2. Must orient the capsule to the same direction
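
The second requirement above (orienting every capsule the same way) could be approached with second-order image moments, which give the angle of a shape's long axis. The sketch below assumes the first requirement has already produced the capsule as a list of (x, y) pixel coordinates; the function names and the sample blob are made up for illustration and are not an existing tool.

```python
import math


def centroid(pixels):
    """Mean (x, y) position of a list of pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def principal_angle(pixels):
    """Angle in radians of the major axis, from second-order central moments."""
    cx, cy = centroid(pixels)
    n = len(pixels)
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


def align_to_x_axis(pixels):
    """Rotate pixels about their centroid so the major axis lies along x."""
    cx, cy = centroid(pixels)
    theta = principal_angle(pixels)
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * cos_t - (y - cy) * sin_t,
             (x - cx) * sin_t + (y - cy) * cos_t) for x, y in pixels]


# Made-up elongated blob tilted at roughly 45 degrees, standing in for a capsule mask.
capsule = [(i, i) for i in range(20)] + [(i, i + 1) for i in range(20)]
angle_deg = math.degrees(principal_angle(capsule))
aligned = align_to_x_axis(capsule)
print(round(angle_deg, 1))
```

Moments only recover the axis, not which end is which, so two capsules can still come out 180 degrees apart; telling the ends apart would need an additional cue from the image.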

Candidates Being Reviewed

  • [Fossil Analyzer]: Use computer vision to count the segments on trilobites. We have the AMNH trilobite database (where do these images come from? Do we have open data rights to them?). Ideally we want to detect large spines and link them to a particular identified segment. (Can the library API produce more trilobite images? From digital publications?) The goal is just to collect the data - then look at segmentation patterns through time, geography, et cetera. Possibly create an API for trilobite data? (n solutions) {Melanie Hopkins} ??

  • [Geometric morphometrics]: A popular and powerful tool for describing shapes, such as bone or shell morphology, is the analysis of configurations of point coordinates that represent the shape. Tools for semi-automating the collection of landmarks are being developed. However, bones and shells often have ornamentation that does not contribute to the description of the overall shape, but does add noise to the dataset. Maybe there is a way to smooth out bumpy or ridged surfaces of 3D surface reconstructions of bones and shells so that automated collection of points on those surfaces produces more accurate descriptions of the overall shape. This could work in conjunction with MeshLab, and possibly with the DinoJerks 2.0 challenge. (n solutions) {Melanie Hopkins} ??

  • [DinoJerks 2.0]: Is it possible to build upon the work of the DinoJerks solution to make it easier or more efficient to select areas on scans? This is akin to the Brain Builder challenge of recognizing the shape of a fossil embedded in rock. (n solutions) {Melanie Hopkins} ??

  • [Hotspots of the Ocean Exhibit]: Map the pathways and hotspots of visitors to the new ocean exhibit. The map could be a heat map showing retention times at each hotspot. Could potentially use the Bluetooth beacon system. Connecting to amnh-guest, tweeting, or instagramming might also be considered as a proxy. (n solutions) {David Lindo} ??

  • [Painting the Ocean]: Design a software tool that uses images to identify the path of a particle or, even better, the path of an oil patch in the ocean. We could use images or videos from my lab (latest tweets to #ESClindo), satellite images from the BP oil spill publicly available through GOMRI, or data from a real dye release experiment in Florida (I would have to contact a colleague down in Miami for the latter). (n solutions) {David Lindo} ?? ---CONSERVATION---

  • [Visualizing invisible ocean eddies]: Generate 4D visualizations of the tracks of the centroids of eddies. For example, images similar to the ones attached. We have ocean current data from the MIT general circulation model that we generated for one of my projects, and an algorithm to detect the center of rotating bodies of water. I have high research interest in this one because I have two projects on eddies recently funded by NOAA: one on eddies around the Hawaiian Archipelago and another in the Gulf of Mexico. I also have a proposal pending on eddies around Cuba. As background, hurricanes in the atmosphere are easy to identify and track, but identifying ocean eddies is not as trivial. These features are very important for the health of ocean ecosystems because they trap nutrients, prey, and pollutants. They are also important for climate because they feed and trap phytoplankton and CO2. (n solutions) {David Lindo} ??

  • [Mining for biological traits]: Let's say I want to collect a bunch of data about the absence or presence of certain characters across a wide range of trilobites. I would then want to query a body of literature, looking for keywords and associating those with species names. The keywords might show up in the description for the species, or they might show up elsewhere in a description of the genus that the species belongs to (if all species in that genus shared the feature and it was thus a notable or defining character of the genus). The algorithm would then have to both locate the terms and associate them properly with different taxon names. The reference libraries would be a glossary of terms and a taxonomy (the hierarchical structure would be known). {Melanie Hopkins}
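
The trait-mining idea above can be sketched in a few lines: search each description for glossary terms, then let a species inherit any terms found in its genus description. Everything in the sketch is a hypothetical stand-in: the glossary, the taxon names, the description text, and the function names are invented for illustration, and a real solution would pull descriptions from the literature.

```python
import re

# Hypothetical reference libraries: a small glossary and a species -> genus taxonomy.
GLOSSARY = ["genal spine", "pygidium", "eye ridge"]
TAXONOMY = {
    "Examplus alpha": "Examplus",
    "Examplus beta": "Examplus",
    "Otherus gamma": "Otherus",
}

# Hypothetical description text keyed by the taxon (species or genus) it describes.
DESCRIPTIONS = {
    "Examplus": "All species bear a long genal spine.",
    "Examplus alpha": "Pygidium small; eye ridge faint.",
    "Otherus gamma": "Large pygidium.",
}


def find_terms(text, glossary):
    """Return the glossary terms that appear in text as whole words (case-insensitive)."""
    found = set()
    for term in glossary:
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            found.add(term)
    return found


def traits_by_species(descriptions, taxonomy, glossary):
    """Map each species to terms found in its own or its genus's description."""
    result = {}
    for species, genus in taxonomy.items():
        terms = find_terms(descriptions.get(species, ""), glossary)
        terms |= find_terms(descriptions.get(genus, ""), glossary)
        result[species] = terms
    return result


traits = traits_by_species(DESCRIPTIONS, TAXONOMY, GLOSSARY)
for species in sorted(traits):
    print(species, sorted(traits[species]))
```

Matching a keyword only shows that the term is mentioned, not that the character is present (a description might say a spine is absent), so a real solution would also need to read the surrounding wording.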


Challenges Being Tabled

  • [Culture Collection Catalog]: Create an application/database to track the health and living conditions of the live organisms in the Museum's microbial cultures. (n solutions) {Aaron Heiss} ?? MAY BE DROPPING THIS IN FAVOR OF USING ANOTHER SOLUTION FOR SPREADSHEET TRACKING.