

@7yl4r
Forked from MehrazRumman/GSoCFinalWork.md
Last active July 29, 2025 13:15

Contributor: Mehraz Hossain Rumman (he/him)
Organization: IOOS
Project Title: PyOBIS for stakeholders
Mentor: Tylar Wayne Cole Murray, PhD. (ey/em|he/him)
GSoC Project Page: https://summerofcode.withgoogle.com/programs/2025/projects/Af6KBBl2

[Figure: Final SDM map]


Introduction

Through collaboration between NOAA IOOS and the Google Summer of Code, the PyOBIS Python client has been enhanced and its utility has been demonstrated through real-world marine biodiversity analysis.

A caching mechanism that improves performance and efficiency has been added to PyOBIS, along with documentation and tests for the new feature. To support broader usage, generalized functions for fetching and preparing OBIS data have also been published.
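
The published helpers are not reproduced in this report. As a hedged sketch of what "preparing" occurrence data usually involves, the pandas function below merges OBIS and GBIF records (both use Darwin Core column names such as `decimalLatitude` and `eventDate`), drops records without names or coordinates, clips them to a study area, and removes records that appear in both sources. `prepare_occurrences`, the rounding tolerance, and the approximate Florida Keys bounding box are hypothetical choices for illustration, not the project's actual API.

```python
import pandas as pd

# Approximate Florida Keys bounding box (an assumption for illustration):
# (min_lon, min_lat, max_lon, max_lat)
FLORIDA_KEYS_BBOX = (-82.2, 24.4, -80.2, 25.4)


def prepare_occurrences(obis: pd.DataFrame, gbif: pd.DataFrame,
                        bbox=FLORIDA_KEYS_BBOX, decimals: int = 4) -> pd.DataFrame:
    """Combine OBIS and GBIF occurrence tables and apply basic cleaning.

    Both inputs are assumed to use Darwin Core column names.
    """
    cols = ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"]
    combined = pd.concat(
        [obis[cols].assign(source="OBIS"), gbif[cols].assign(source="GBIF")],
        ignore_index=True,
    )
    # Drop records that lack a name or usable coordinates
    combined = combined.dropna(
        subset=["scientificName", "decimalLatitude", "decimalLongitude"]
    )
    # Keep only points inside the study area
    min_lon, min_lat, max_lon, max_lat = bbox
    inside = (combined["decimalLongitude"].between(min_lon, max_lon)
              & combined["decimalLatitude"].between(min_lat, max_lat))
    combined = combined[inside].copy()
    # Parse dates; unparseable values (e.g. date ranges) become NaT
    combined["eventDate"] = pd.to_datetime(combined["eventDate"], errors="coerce")
    # Treat same species + rounded location + date as one record, since a
    # dataset can be published to both OBIS and GBIF (first occurrence wins)
    combined["_lat"] = combined["decimalLatitude"].round(decimals)
    combined["_lon"] = combined["decimalLongitude"].round(decimals)
    combined = combined.drop_duplicates(
        subset=["scientificName", "_lat", "_lon", "eventDate"]
    )
    return combined.drop(columns=["_lat", "_lon"]).reset_index(drop=True)
```

Rounding to 4 decimal places (roughly 11 m of latitude) is only a simple duplicate heuristic; matching on dataset or occurrence identifiers is stricter when those fields are available.
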

As a practical application, a Jupyter Notebook focused on seagrass habitat analysis has been published. The notebook covers data cleaning, integration of species and environmental data, machine learning-based habitat prediction, and generation of Species Distribution Model (SDM) maps. This work was presented in a keynote at Hacking Limnology 2025 and in a hands-on workshop on marine data analysis using the developed tools and notebook.

These contributions to both the OBIS software ecosystem and the scientific community combine performance improvements with accessible, reusable workflows for marine biodiversity research.


  • PyOBIS Caching Feature

    Implemented efficient caching in PyOBIS using requests-cache to reduce redundant API calls and improve performance. This feature supports persistent storage (SQLite, Redis), expiration control, and seamless integration with the existing codebase.

    Pull requests:
    • #179: Caching feature
    • #182: Cache location for different OS
    • #188: Update requirements.txt

    Why requests-cache?

    • Better suited to caching HTTP requests than lru_cache or joblib.Memory
    • Handles cache expiration, storage backends, and filtering by HTTP status code
    • Requires minimal changes to the existing code structure
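
requests-cache handles these concerns internally, and PyOBIS's actual cache configuration is not reproduced here. To make the points above concrete, here is a minimal stdlib-only sketch of the same idea: a URL-keyed response cache persisted in SQLite, with an expiry window and a whitelist of cacheable status codes. The class `SQLiteResponseCache` and the injected `fetch` callable are hypothetical, chosen so the logic can be exercised without network access; this is not how requests-cache is implemented.

```python
import sqlite3
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetcher: takes a URL and returns (status_code, body_text).
Fetcher = Callable[[str], Tuple[int, str]]


class SQLiteResponseCache:
    """URL-keyed HTTP response cache persisted in SQLite (illustrative sketch)."""

    def __init__(self, path: str = ":memory:", expire_after: float = 3600.0,
                 allowable_codes: Tuple[int, ...] = (200,)):
        # ":memory:" keeps the demo self-contained; a file path makes it persistent
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(url TEXT PRIMARY KEY, status INTEGER, body TEXT, created REAL)"
        )
        self.expire_after = expire_after
        self.allowable_codes = allowable_codes

    def get(self, url: str, fetch: Fetcher,
            now: Optional[float] = None) -> Tuple[int, str, bool]:
        """Return (status, body, from_cache); call `fetch` only on a miss or an expired entry."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT status, body, created FROM responses WHERE url = ?", (url,)
        ).fetchone()
        if row is not None and now - row[2] < self.expire_after:
            return row[0], row[1], True  # fresh hit: no request is made
        status, body = fetch(url)
        if status in self.allowable_codes:
            # Only allowed (successful) responses are stored, so errors are retried
            self.conn.execute(
                "INSERT OR REPLACE INTO responses VALUES (?, ?, ?, ?)",
                (url, status, body, now),
            )
            self.conn.commit()
        return status, body, False
```

With requests-cache itself, roughly the same behaviour comes from `requests_cache.CachedSession("obis_cache", backend="sqlite", expire_after=timedelta(hours=1), allowable_codes=(200,))`, and each response's `from_cache` attribute reports whether it was served locally. These values are illustrative, not PyOBIS defaults.
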
  • Testing & Documentation

    • Added unit tests for caching logic
    • Updated README.md and CONTRIBUTING.md with usage details
    • Improved cross-platform cache path handling
    • Refined dev setup (requirements-dev.txt, pre-commit, cache toggling)
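
The exact logic of PR #182 is not shown in this report. A common approach to per-OS cache locations, sketched here under that assumption, is to use `%LOCALAPPDATA%` on Windows, `~/Library/Caches` on macOS, and `$XDG_CACHE_HOME` (falling back to `~/.cache`) elsewhere. `default_cache_dir` is a hypothetical helper; its `platform`, `env`, and `home` parameters exist only so it can be unit-tested without touching the real environment.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_cache_dir(app_name: str = "pyobis",
                      platform: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None,
                      home: Optional[Path] = None) -> Path:
    """Pick a conventional per-user cache directory for the given OS (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    if platform.startswith("win"):
        # Windows: per-user, non-roaming application data
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif platform == "darwin":
        # macOS: the user's Caches folder
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix: honour XDG_CACHE_HOME if set and non-empty
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return base / app_name
```

The `platformdirs` package provides `user_cache_dir()` for the same purpose, which is usually preferable to hand-rolled path logic.
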
  • Seagrass Data Analysis (Notebook)

    A Jupyter Notebook that analyzes seagrass occurrence data from OBIS and GBIF for the Florida Keys has been published. The workflow includes:

    • Data fetching and cleaning (OBIS + GBIF)
    • Integration with environmental variables (salinity, temperature)
    • Anomaly detection using One-Class SVM
    • Temporal trend analysis
    • Generation of Species Distribution Model (SDM) maps
    • Visualization with Folium and Matplotlib

    This analysis helps identify normal and abnormal habitat conditions, aiding ecological research and decision-making.
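
The anomaly-detection step can be sketched as a One-Class SVM fitted to the environmental conditions (for example salinity and temperature) at known seagrass occurrences; new conditions are then labelled +1 (inside the learned envelope, "normal") or -1 ("abnormal"). This is a minimal scikit-learn sketch, not the notebook's code: the function names, the standardization step, and `nu=0.05` are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_habitat_envelope(env_at_occurrences: np.ndarray, nu: float = 0.05):
    """Fit a One-Class SVM to conditions where the species was recorded.

    env_at_occurrences has shape (n_samples, n_features), e.g. columns
    [salinity, temperature]. Features are standardized first because the
    RBF kernel is distance-based and the variables have different units.
    """
    model = make_pipeline(
        StandardScaler(),
        OneClassSVM(kernel="rbf", gamma="scale", nu=nu),
    )
    model.fit(env_at_occurrences)
    return model


def label_conditions(model, env: np.ndarray) -> np.ndarray:
    """Return +1 for conditions inside the learned envelope ("normal"), -1 otherwise."""
    return model.predict(env)
```

`nu` is an upper bound on the fraction of training points treated as outliers, so it directly controls how strict the "normal" envelope is.
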

  • Outreach & Community Engagement

    • Keynote Speaker

Mehraz Hossain Rumman was invited as a keynote speaker at Hacking Limnology 2025, where he presented his Google Summer of Code journey, his contributions to PyOBIS, and applications of open marine data in ecological modeling. The session concluded with an interactive Q&A covering PyOBIS, machine learning, species distribution modeling (SDM), and the use of OBIS data in real-world marine science.

A hands-on workshop on Species Distribution Modeling (SDM) and machine learning using OBIS data was held at Hacking Limnology 2025. The session covered data cleaning, fetching marine and environmental data, and building predictive SDM maps using a machine learning model.
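
As a rough illustration of building a predictive SDM map with a machine learning model, the sketch below uses a presence/background setup: environmental values at occurrence sites are labelled 1, values at randomly sampled background sites are labelled 0, and the classifier's predicted presence probability is evaluated over a gridded stack of environmental layers. The random forest, the helper names, and the `(rows, cols, n_features)` grid layout are assumptions for illustration; the workshop and notebook may use a different model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_sdm(env_presence: np.ndarray, env_background: np.ndarray, seed: int = 0):
    """Presence/background SDM: learn to separate occurrence sites from background sites."""
    X = np.vstack([env_presence, env_background])
    y = np.concatenate([np.ones(len(env_presence)), np.zeros(len(env_background))])
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X, y)
    return model


def suitability_map(model, env_grid: np.ndarray) -> np.ndarray:
    """Predict suitability for a (rows, cols, n_features) stack of environmental layers."""
    rows, cols, n_features = env_grid.shape
    # Flatten the grid to one row per cell, then take P(presence) (class 1)
    prob = model.predict_proba(env_grid.reshape(-1, n_features))[:, 1]
    return prob.reshape(rows, cols)
```

The returned 2-D array can be drawn with Matplotlib's `imshow` or overlaid on a Folium map. Note that background-based probabilities express relative suitability, not true probability of occurrence.
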


Final Status

  • The PyOBIS caching feature was successfully implemented and merged upstream.
  • A complete seagrass habitat analysis was conducted using machine learning techniques and Species Distribution Modeling (SDM), documented in a Jupyter Notebook.
  • Delivered a keynote presentation and led a hands-on workshop on SDM and marine data analysis at Hacking Limnology 2025.
  • All work has been documented, publicly shared, and is reproducible.

What's Next

  • Develop additional species analysis notebooks, focusing particularly on the Florida Keys.
  • Consider modularizing notebook utilities for broader reuse across OBIS-related projects.

Challenges & Learnings

  • Gained practical experience in contributing to an existing open-source library.
  • Learned how to integrate HTTP-level caching and design reusable APIs.
  • Faced and solved challenges related to cleaning and aligning multi-source ecological data.
  • Improved skills in scientific communication through both writing and live presentation formats.

Contact & Acknowledgments


Huge thanks to my mentor Tylar Murray and the IOOS community for their continuous support and guidance.

Feel free to contact me by email.
GitHub: MehrazRumman
LinkedIn: Mehraz Hossain Rumman
All work is publicly available and open for collaboration.

