@MehrazRumman
Last active July 29, 2025 13:13

Contributor : Mehraz Hossain Rumman (he/him)
Organization : IOOS
Project Title : PyOBIS for stakeholders
Mentor : Tylar Wayne Cole Murray, PhD. (ey/em|he/him)
GSoC Project Page: https://summerofcode.withgoogle.com/programs/2025/projects/Af6KBBl2


Project Goals

This Google Summer of Code project aimed to enhance the PyOBIS Python client and demonstrate its utility through real-world marine biodiversity analysis. I implemented a caching mechanism in PyOBIS to improve performance and efficiency, along with updating documentation and tests to support the new feature. To support broader usage, I also developed generalized functions for fetching and preparing OBIS data.

As a practical application, I created a Jupyter Notebook focused on seagrass habitat analysis. The notebook covers data cleaning, integration of species and environmental data, machine learning-based habitat prediction, and generation of Species Distribution Model (SDM) maps. I presented this work as a keynote speaker at Hacking Limnology 2025 and also led a hands-on workshop on marine data analysis using the developed tools and notebook.

This project contributes to both the OBIS software ecosystem and the scientific community by combining performance improvements with accessible, reusable workflows for marine biodiversity research.


What I Accomplished

  • PyOBIS Caching Feature

    Implemented efficient caching in PyOBIS using requests-cache to reduce redundant API calls and improve performance. This feature supports persistent storage (SQLite, Redis), expiration control, and seamless integration with the existing codebase.

    PR      Title
    #179    Caching feature
    #182    Cache location for different OS
    #188    Update requirements.txt

    Why requests-cache?

    • Better suited for HTTP requests than lru_cache or joblib.Memory
    • Handles cache expiration, storage, and status codes
    • Minimal changes to the code structure
  • Testing & Documentation

    • Added unit tests for caching logic
    • Updated README.md and CONTRIBUTING.md with usage details
    • Improved cross-platform cache path handling
    • Refined dev setup (requirements-dev.txt, pre-commit, cache toggling)

    All PRs merged successfully by @7yl4r
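
Cross-platform cache path handling can be sketched with the standard library alone. The function below is a hypothetical illustration of the usual per-OS conventions, not the implementation from PR #182; the name `default_cache_dir` and its parameters are assumptions.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(app_name="pyobis", platform=None, env=None):
    """Return a per-user cache directory following common OS conventions.

    `platform` and `env` default to the running system but can be
    overridden, which makes the function easy to test.
    """
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path(env.get("HOME") or Path.home())

    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%, falling back to ~/AppData/Local
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        # macOS: ~/Library/Caches
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix: $XDG_CACHE_HOME, falling back to ~/.cache
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return base / app_name
```

Passing `platform` and `env` explicitly lets unit tests exercise every branch on a single machine.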

  • Seagrass Data Analysis (Notebook)

    I developed a Jupyter Notebook to analyze seagrass occurrence data in the Florida Keys using OBIS and GBIF sources. The workflow included:

    • Data fetching and cleaning (OBIS + GBIF)
    • Integration with environmental variables (salinity, temperature)
    • Anomaly detection using One-Class SVM
    • Temporal trend analysis
    • Generation of Species Distribution Model (SDM) maps
    • Visualization with Folium and Matplotlib

    This analysis helps identify normal and abnormal habitat conditions, aiding ecological research and decision-making.
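
The anomaly-detection step can be illustrated with a small scikit-learn sketch. It uses synthetic salinity and temperature values, not the notebook's data, and the hyperparameters (`nu`, `gamma`, the scaling step) are assumptions, not the notebook's settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for environmental values at occurrence sites (not real data).
rng = np.random.default_rng(0)
salinity = rng.normal(35.0, 0.5, size=200)     # practical salinity units
temperature = rng.normal(26.0, 1.0, size=200)  # degrees Celsius
X_train = np.column_stack([salinity, temperature])

# Scale the features, then learn a boundary around "normal" conditions.
# nu bounds the fraction of training points treated as outliers.
model = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
model.fit(X_train)

# Score new sites: +1 means consistent with the training conditions,
# -1 means anomalous.
candidates = np.array([
    [35.1, 26.2],  # close to the training distribution
    [20.0, 10.0],  # far from anything seen in training
])
labels = model.predict(candidates)
```

One-Class SVM suits presence-only data such as occurrence records because it needs only examples of conditions where the species was observed, with no absence labels.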

    [Figure: Seagrass map]

  • Outreach & Community Engagement

    • Keynote Speaker

      I was invited as a keynote speaker at Hacking Limnology 2025, where I talked about my Google Summer of Code journey, my contributions to PyOBIS, and applications of open marine data in ecological modeling. The session concluded with an interactive Q&A, where I answered questions about PyOBIS, machine learning, Species Distribution Modeling (SDM), and the use of OBIS data in real-world marine science.

    • Workshop Instructor

      I led a hands-on workshop on Species Distribution Modeling (SDM) and machine learning using OBIS data. The session covered data cleaning, fetching marine and environmental data, and building predictive SDM maps using a machine learning model.


Final Status

  • The PyOBIS caching feature was successfully implemented and merged upstream.
  • A complete seagrass habitat analysis was conducted using machine learning techniques and Species Distribution Modeling (SDM), documented in a Notebook.
  • Delivered a keynote presentation and led a hands-on workshop on SDM and marine data analysis at Hacking Limnology 2025.
  • All work has been documented, publicly shared, and is reproducible.

What's Next

  • Develop additional species analysis notebooks, focusing particularly on the Florida Keys.
  • Consider modularizing notebook utilities for broader reuse across OBIS-related projects.

Challenges & Learnings

  • Gained practical experience in contributing to an existing open-source library.
  • Learned how to integrate HTTP-level caching and design reusable APIs.
  • Faced and solved challenges related to cleaning and aligning multi-source ecological data.
  • Improved skills in scientific communication through both writing and live presentation formats.

Contact & Acknowledgments

Note

Huge thanks to my mentor Tylar Murray and the IOOS community for their continuous support and guidance.

Feel free to contact me at Email
Find me on GitHub : MehrazRumman
LinkedIn : Mehraz Hossain Rumman
All work is publicly available and open for collaboration.

