
Welcome To The Hack The Deep Challenge Wiki!


Make sure you have accepted your GitHub invitation and are on a team in the HackTheDeep organization! Your repositories MUST all be created in the Hack the Deep org.

If you are not already on it, sign up for the Slack and join channels related to your challenge / interests.


The Hack The Deep challenge is a 24-hour solution-building event produced by AMNH's BridgeUP: STEM, an educational initiative at the intersection of computer science and science that runs an after-school program for high school girls and underrepresented youth.

The challenge starts on Saturday, November 21st at 3:00 p.m. and runs until Sunday, November 22nd at 6:00 p.m. Your team will have about 24 hours to build something that solves some (or all!) of a particular problem statement. You may even collaborate with several teams on different aspects of the same solution: Hack The Deep is all about cooperation, not competition. The emphasis is on building working prototypes, not clickable demos; even if your project is incomplete, having a solid foundation to build upon will be a huge step forward for ocean science and paleontology and can benefit museums around the world.

We hope you’ll choose to spend the night with us at the museum -- we’ll have cots and a dedicated sleeping area (just bring your sleeping bag and pillow) if you need your rest -- but if not, you will be able to exit the museum Saturday night and return at 10:00 a.m. on Sunday.

There will also be family-friendly coding and data science talks and activity stations on Sunday from Noon to 3:00 p.m. Demos will begin at 3:00 p.m. and are open to the public! We hope you’ll want to invite friends, family, and coworkers to see what you’ve built. All Sunday activities are free with museum admission.

Good luck!

Selected Challenges

  1. Historic Personages: Surface more historical information about prominent museum personages (scientists, researchers, explorers) from the past, and create metadata for the number of species they described, where they collected, and where their material ended up. We could also scan and OCR all our accession records and, from those, pull together data on the collections collected or donated by a given person; we don't really have that data today. (n solutions) {Chris Johnson} ?? ---DALIO FOUNDATION, MAKE IT OCEANS-CENTRIC; FOR BROWN SCHOLARS AND HIGH SCHOOLERS---

  2. Map The Collections: Georeference all collection specimens and visualize where and when they were collected. Data from the 1900s can be compared to recent data to show changes over time. There are about 600,000 records migrated into KE EMu; one of the things that has always interested me is an ecological picture of a point in time, so we could map our collections over time: what was collected (by order), where, and when (see the mapping sketch after this list). (n solutions) {Chris Johnson} ??. Subchallenge - [RC-Pangea 2.0]: The RC-Pangea project created at Hack The Dinos is close to what I want, so we should extend it further. It uses the Paleobiology Database at http://paleobiodb.org/navigator; we could also use the GPlates database. This would be a continuation of the project for marine fossils (not just dinosaurs), since the PBDB is primarily marine fossils. Once it is created, we can "advertise" the app on the PBDB "web apps" page that lists applications. (n solutions) {Melanie Hopkins} ??

  3. Virtual Fossil Fragmenter: The idea is to identify fragmented pieces of shells, bivalves, and trilobites. The theory is that we could take a 3D model of an organism, programmatically "shatter" it into pieces, and randomly rotate the shattered pieces to generate a training set of images from which a model could learn to identify real fragments. May require some research to determine feasibility. (n solutions) {Melanie Hopkins} ??

  4. Iron Out the Kinks: Automate the transition from a series of parallel image planes through a marine microbe to a 3D computer model of its structure. Two specific tacks: (1) Pre-processing -- Specimens for the Museum's transmission electron microscope (TEM) are cut very thinly and distort in the process. The slices need to be aligned with one another, then adjusted to compensate for nonlinear compression, skew, and other distortions. (2) Post-processing -- Once the images are combined into a model, the outlines of smooth structures are invariably highly erratic and need to be "cleaned up". Think of a string of beads dropped on the floor: we need an automated way to pull the string taut and have the beads line up, without making them completely straight (see the outline-smoothing sketch after this list). (2n solutions) {Aaron Heiss} ??

  5. The Eye of Maria: Understand the effect of a hurricane's eye on any object drifting or sailing in the ocean. Use data from drifters released in the tropical Atlantic during summer 2016 and 2017; for a more complete dataset, teams can use http://www.aoml.noaa.gov/envids/gld/FtpInterpolatedInstructions.php. Hurricane pathways are easy to find through http://www.nhc.noaa.gov. (n solutions) {David Lindo} ?? ---GREAT---

  6. The Great Pacific Garbage Patch: Map, animate, and quantify the geographical sources and travel time scales of plastic found in the GPGP. We have data on plastic locations and ocean-current fields from an ocean model to backtrack from sinks to sources (see the backtracking sketch after this list). (n solutions) {David Lindo} ?? ---CONSERVATION--- PlasticAdrift.org, https://github.com/adriftICL, and COPEPOD at NOAA

  7. Trilobite Vision: Use computer vision to count the segments of trilobites or identify other characteristics. We have the AMNH trilobite database (where do these images come from? Do we have open data rights to them?). Ideally we want to detect large spines and link them to a particular identified segment. (Can the library API produce more trilobite images? From digital publications?) The goal is just to collect the data, then look at segmentation patterns through time, geography, et cetera. Possibly create an API for trilobite data? (n solutions) {Melanie Hopkins} ??

  8. Jellyfish Inspector: Use machine learning to help classify cnidarians, and reclassify images from the literature. Possibly use crowdsourcing to help add "objectivity". Use the Library API to get a list of journals with nematocyst / cnidarian-related articles, and other online data sources to get additional images; identify other publications with nematocyst articles whose images can be collected. Two requirements: 1) identify the capsule within the image; 2) orient the capsule to the same direction (see the orientation sketch after this list).

  9. ?

  10. ?
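
For the Map The Collections challenge (item 2 above), here is a minimal sketch of one possible starting point: plotting georeferenced specimen records colored by collection decade. The file name and the lat, lon, and year column names are hypothetical placeholders for whatever an EMu export actually provides.

```python
# Minimal sketch: plot georeferenced specimen records colored by collection decade.
# Assumes a hypothetical CSV export from EMu with columns: lat, lon, year.
import pandas as pd
import matplotlib.pyplot as plt

records = pd.read_csv("emu_export.csv")                 # hypothetical export file
records = records.dropna(subset=["lat", "lon", "year"])
records["decade"] = (records["year"] // 10) * 10

fig, ax = plt.subplots(figsize=(12, 6))
points = ax.scatter(records["lon"], records["lat"],
                    c=records["decade"], cmap="viridis", s=4, alpha=0.5)
fig.colorbar(points, ax=ax, label="Collection decade")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Specimen collection localities by decade")
plt.show()
```

Overlaying coastlines (for example with cartopy or folium) and filtering by taxonomic order would be natural next steps.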
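
For the post-processing half of Iron Out the Kinks (item 4 above), here is a minimal sketch of the "pull the string taut" idea: Gaussian smoothing of the vertices of a closed outline, which damps the erratic wiggles without flattening the overall shape. The noisy circle below is synthetic test data, not real TEM output.

```python
# Minimal sketch: smooth an erratic closed outline ("string of beads") without
# making it completely straight, using a circular Gaussian filter on the vertices.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_outline(points, sigma=3.0):
    """points: (N, 2) array of x, y vertices of a closed outline."""
    smoothed = np.empty_like(points, dtype=float)
    # mode="wrap" treats the outline as closed, so the two ends stay continuous.
    smoothed[:, 0] = gaussian_filter1d(points[:, 0], sigma=sigma, mode="wrap")
    smoothed[:, 1] = gaussian_filter1d(points[:, 1], sigma=sigma, mode="wrap")
    return smoothed

# Example: a jittered circle smoothed back toward a clean curve.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
noisy = np.column_stack([np.cos(theta), np.sin(theta)])
noisy += np.random.normal(scale=0.05, size=noisy.shape)
clean = smooth_outline(noisy, sigma=4.0)
```

A larger sigma pulls the string tighter; the same idea extends to 3D contours by filtering a z column as well.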
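
For The Great Pacific Garbage Patch challenge (item 6 above), here is a minimal sketch of backtracking a plastic particle from a sink toward its source by stepping backward in time through a gridded current field. The uniform velocity fields below are placeholders; the real fields would come from the ocean model data mentioned in the challenge, and a serious version would use real units and a higher-order integrator.

```python
# Minimal sketch: backtrack a particle through a gridded current field by
# stepping backward in time with simple Euler steps.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid (degrees) and velocity fields u, v in degrees/day.
lons = np.linspace(-180, -120, 61)
lats = np.linspace(10, 50, 41)
u = 0.10 * np.ones((lats.size, lons.size))    # placeholder zonal flow (eastward)
v = -0.05 * np.ones((lats.size, lons.size))   # placeholder meridional flow (southward)

u_interp = RegularGridInterpolator((lats, lons), u, bounds_error=False, fill_value=0.0)
v_interp = RegularGridInterpolator((lats, lons), v, bounds_error=False, fill_value=0.0)

def backtrack(lat, lon, days, dt=1.0):
    """Step a particle backward in time; returns the list of (lat, lon) positions."""
    track = [(lat, lon)]
    for _ in range(int(days / dt)):
        lat -= dt * v_interp([[lat, lon]])[0]
        lon -= dt * u_interp([[lat, lon]])[0]
        track.append((lat, lon))
    return track

# Example: backtrack a particle found in the garbage patch for one year.
path = backtrack(lat=32.0, lon=-145.0, days=365)
```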
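
For the Jellyfish Inspector challenge (item 8 above), here is a minimal sketch of requirement 2, orienting each capsule to the same direction: use PCA on the foreground pixels to find the capsule's long axis, then rotate the image so that axis always points the same way. The simple threshold stands in for requirement 1 (real capsule segmentation), and the rotation sign may need adjusting for image coordinate conventions.

```python
# Minimal sketch: rotate a segmented capsule image so its long axis has a
# consistent orientation, using PCA on the foreground pixel coordinates.
import numpy as np
from scipy import ndimage

def orient_capsule(image, threshold=0.5):
    """image: 2D grayscale array in [0, 1]. Returns the rotated image."""
    mask = image > threshold                    # crude stand-in for segmentation
    ys, xs = np.nonzero(mask)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)
    # The first right-singular vector is the capsule's long (principal) axis.
    _, _, vt = np.linalg.svd(coords, full_matrices=False)
    angle = np.degrees(np.arctan2(vt[0, 1], vt[0, 0]))
    # Rotate so the principal axis maps to a fixed direction (the sign convention
    # for image coordinates may need a flip, but the orientation is consistent).
    return ndimage.rotate(image, angle, reshape=True)
```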


Candidates Being Reviewed

  • [Geometric morphometrics]: A popular and powerful way to describe shapes, such as bone or shell morphology, is to analyze configurations of point coordinates that represent the shape. Tools for semi-automating the collection of these landmarks are being developed. However, bones and shells often have ornamentation that does not contribute to the description of the overall shape but does add noise to the dataset. Maybe there is a way to smooth out bumpy or ridged surfaces of 3D surface reconstructions of bones and shells so that automated collection of points on those surfaces produces more accurate descriptions of the overall shape (see the mesh-smoothing sketch at the end of this list). Could be something that works in conjunction with MeshLab, or with the DinoJerks 2.0 challenge. (n solutions) {Melanie Hopkins} ??

  • [DinoJerks 2.0]: Is it possible to build upon the work of the DinoJerks solution to make it easier or more efficient to select areas on scans? This is akin to the Brain Builder challenge of recognizing the shape of a fossil embedded in rock. (n solutions) {Melanie Hopkins} ??

  • [Hotspots of the Ocean Exhibit]: Map the pathways and hotspots of visitors in the new ocean exhibit, for example as a heat map showing retention time at each hotspot. Could potentially use the Bluetooth beacon system; connecting to amnh-guest, tweeting, or Instagramming might also be considered as proxies. (n solutions) {David Lindo} ??

  • [Painting the Ocean]: Design software that identifies the path of a particle or, even better, the path of an oil patch in the ocean using images. We could use images or videos from my lab (latest tweets to #ESClindo), satellite images from the BP oil spill publicly available through GOMRI, or data from a real dye-release experiment in Florida (I would have to contact a colleague down in Miami for the latter). (n solutions) {David Lindo} ?? ---CONSERVATION---

  • [Visualizing invisible ocean eddies]: Generate 4D visualizations of the tracks of eddy centroids, for example images similar to the ones attached (see the vorticity sketch at the end of this list). We have ocean-current data from the MIT general circulation model, generated for one of my projects, and an algorithm to detect the center of rotating bodies of water. I have a strong research interest in this one: two of my projects on eddies were recently funded by NOAA, one around the Hawaiian Archipelago and one in the Gulf of Mexico, and I also have a proposal pending on eddies around Cuba. As background, hurricanes in the atmosphere are easy to identify and track, but identification is not that trivial for ocean eddies. These features are very important for the health of ocean ecosystems because they trap nutrients, prey, and pollutants; they are also important for climate because they feed and trap phytoplankton and CO2. (n solutions) {David Lindo} ??

  • [Mining for biological traits]: Let's say I want to collect a bunch of data about the absence or presence of certain characters across a wide range of trilobites. I would then want to query a body of literature, looking for keywords, and associate those with species names. The keywords might show up in the description of the species, or they might show up elsewhere, in a description of the genus the species belongs to (if all species in that genus share the feature and it is thus a notable or defining character of the genus). The algorithm would have to both locate the terms and associate them properly with different taxon names. The reference libraries would be a glossary of terms and a taxonomy (the hierarchical structure would be known); see the text-matching sketch below. {Melanie Hopkins}
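
For the [Geometric morphometrics] candidate above, here is a minimal sketch of one way to damp ornamentation on a 3D surface reconstruction: iterative Laplacian smoothing, which moves each vertex toward the average of its mesh neighbors. The vertex and face arrays are assumed to come from whatever mesh loader the team uses (for example a MeshLab export); this is a rough stand-in, not a full morphometrics pipeline.

```python
# Minimal sketch: Laplacian smoothing of a triangle mesh to damp small-scale
# ornamentation while preserving the overall shape.
import numpy as np

def laplacian_smooth(vertices, faces, iterations=10, lam=0.5):
    """vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices."""
    verts = vertices.astype(float).copy()
    # Build neighbor lists from face connectivity.
    neighbors = [set() for _ in range(len(verts))]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    neighbors = [np.array(sorted(n)) for n in neighbors]

    for _ in range(iterations):
        centroids = np.array([verts[n].mean(axis=0) if len(n) else verts[i]
                              for i, n in enumerate(neighbors)])
        verts += lam * (centroids - verts)   # move part-way toward the neighbor average
    return verts

# Example: smooth a small synthetic mesh (a tetrahedron) as a sanity check.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
smoothed = laplacian_smooth(verts, faces, iterations=5)
```

Note that plain Laplacian smoothing slowly shrinks the mesh over many iterations; Taubin smoothing or MeshLab's built-in filters avoid that if it becomes a problem.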
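
For the [Visualizing invisible ocean eddies] candidate, here is a minimal sketch of flagging candidate eddy centers in a single velocity snapshot by computing relative vorticity and keeping the strongest values. The synthetic Gaussian eddy below is a placeholder; the real input would be the MITgcm fields described above, and a real detector would also need to track centers through time to produce the 4D visualization.

```python
# Minimal sketch: compute relative vorticity from a 2D velocity snapshot and
# flag grid points with strong rotation as candidate eddy centers.
import numpy as np

def relative_vorticity(u, v, dx, dy):
    """u, v: (ny, nx) velocity components; dx, dy: grid spacing in meters."""
    dvdx = np.gradient(v, dx, axis=1)
    dudy = np.gradient(u, dy, axis=0)
    return dvdx - dudy

def eddy_centers(vorticity, threshold):
    """Return (row, col) indices where |vorticity| exceeds the threshold."""
    return np.argwhere(np.abs(vorticity) > threshold)

# Example with a synthetic Gaussian eddy in the middle of the domain.
ny, nx = 100, 100
dx = dy = 1000.0                                  # 1 km grid spacing
y, x = np.mgrid[0:ny, 0:nx] * dx
x0, y0, radius = 50e3, 50e3, 10e3
psi = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * radius ** 2))
u = -np.gradient(psi, dy, axis=0)                 # u = -dpsi/dy
v = np.gradient(psi, dx, axis=1)                  # v =  dpsi/dx
zeta = relative_vorticity(u, v, dx, dy)
centers = eddy_centers(zeta, threshold=0.8 * np.abs(zeta).max())
```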
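
For the [Mining for biological traits] candidate, here is a minimal sketch of the matching step: scanning text for glossary terms near taxon names and propagating genus-level traits down to the species in that genus. The glossary, taxonomy, and sample text are hypothetical stand-ins for the real reference libraries, and the "passage after the name" heuristic is deliberately crude; a real version would need proper sentence or section parsing.

```python
# Minimal sketch: find glossary terms near taxon names in free text and
# propagate genus-level traits down to species.
import re
from collections import defaultdict

# Hypothetical reference libraries (stand-ins for the real glossary and taxonomy).
glossary = {"pygidial spine", "macropleural spine", "eye ridge"}
taxonomy = {"Elrathia kingii": "Elrathia", "Olenellus gilberti": "Olenellus"}

def find_traits(text):
    """Return {taxon name: set of glossary terms found near that name}."""
    traits = defaultdict(set)
    taxa = set(taxonomy) | set(taxonomy.values())    # species and genus names
    for taxon in taxa:
        # Crude heuristic: look at the 500 characters following the taxon name.
        match = re.search(re.escape(taxon) + r"(.{0,500})", text, flags=re.S)
        if not match:
            continue
        passage = match.group(1).lower()
        traits[taxon].update(term for term in glossary if term in passage)
    # Propagate genus-level traits down to the species in that genus.
    for species, genus in taxonomy.items():
        traits[species].update(traits.get(genus, set()))
    return {taxon: found for taxon, found in traits.items() if found}

sample = ("Elrathia: all species bear a short pygidial spine. "
          "Elrathia kingii: cephalon smooth, eye ridge faint.")
print(find_traits(sample))
```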