Contributor : Mehraz Hossain Rumman (he/him)
Organization : IOOS
Project Title : PyOBIS for stakeholders
Mentor : Tylar Wayne Cole Murray, PhD. (ey/em|he/him)
GSoC Project Page: https://summerofcode.withgoogle.com/programs/2025/projects/Af6KBBl2
This Google Summer of Code project aimed to enhance the PyOBIS Python client and demonstrate its utility through real-world marine biodiversity analysis. I implemented a caching mechanism in PyOBIS to improve performance and efficiency, along with updating documentation and tests to support the new feature. To support broader usage, I also developed generalized functions for fetching and preparing OBIS data.
As a practical application, I created a Jupyter Notebook focused on seagrass habitat analysis. The notebook covers data cleaning, integration of species and environmental data, machine learning-based habitat prediction, and generation of Species Distribution Model (SDM) maps. I presented this work as a keynote speaker at Hacking Limnology 2025 and also led a hands-on workshop on marine data analysis using the developed tools and notebook.
This project contributes to both the OBIS software ecosystem and the scientific community by combining performance improvements with accessible, reusable workflows for marine biodiversity research.
Implemented efficient caching in PyOBIS using requests-cache to reduce redundant API calls and improve performance. This feature supports persistent storage (SQLite, Redis), expiration control, and seamless integration with the existing codebase.
Pull requests:
- #179: Caching feature
- #182: Cache location for different OS
- #188: Update requirements.txt
Why requests-cache?
- Better suited for HTTP requests than lru_cache or joblib.Memory
- Handles cache expiration, storage, and status codes
- Minimal changes to the code structure
- Added unit tests for caching logic
- Updated README.md and CONTRIBUTING.md with usage details
- Improved cross-platform cache path handling
- Refined dev setup (requirements-dev.txt, pre-commit, cache toggling)
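Cross-platform cache path handling usually comes down to picking the conventional per-user cache folder for each operating system. The sketch below shows one common way to do that with the standard library only. The function name default_cache_dir, the app name "pyobis", and the exact folders chosen are assumptions for illustration, not the paths PyOBIS actually uses.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(app_name="pyobis", platform=None, env=None, home=None):
    """Return a per-user cache directory following common OS conventions."""
    # The optional arguments exist so the logic can be exercised for any OS.
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%, falling back to the usual profile location.
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif platform == "darwin":
        # macOS: ~/Library/Caches
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix: $XDG_CACHE_HOME, falling back to ~/.cache
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return base / app_name
```

Keeping the platform, environment, and home directory as parameters makes the function easy to unit test without mocking the operating system.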
All PRs were merged by @7yl4r.
I developed a Jupyter Notebook to analyze seagrass occurrence data in the Florida Keys using OBIS and GBIF sources. The workflow included:
- Data fetching and cleaning (OBIS + GBIF)
- Integration with environmental variables (salinity, temperature)
- Anomaly detection using One-Class SVM
- Temporal trend analysis
- Generation of Species Distribution Model (SDM) maps
- Visualization with Folium and Matplotlib
This analysis helps identify normal and abnormal habitat conditions, aiding ecological research and decision-making.
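The anomaly detection step can be sketched as follows. This is a simplified illustration, not the notebook's code: the data are synthetic stand-ins for salinity and temperature at occurrence sites, and the function name flag_unusual_conditions, the RBF kernel, and nu=0.05 are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def flag_unusual_conditions(features, nu=0.05):
    """Fit a One-Class SVM and return +1 (typical) or -1 (unusual) per row."""
    # Scaling first keeps salinity and temperature on comparable ranges.
    model = make_pipeline(
        StandardScaler(),
        OneClassSVM(kernel="rbf", gamma="scale", nu=nu),
    )
    return model.fit(features).predict(features)


# Synthetic (salinity, temperature in deg C) values, NOT real observations.
rng = np.random.default_rng(0)
typical = rng.normal(loc=[35.0, 27.0], scale=[1.0, 1.5], size=(200, 2))
extreme = np.array([[20.0, 35.0], [45.0, 15.0]])
features = np.vstack([typical, extreme])

labels = flag_unusual_conditions(features)
print(f"{(labels == -1).sum()} of {len(labels)} rows flagged as unusual")
```

The nu parameter roughly controls the share of rows allowed to fall outside the learned boundary, so it is worth tuning against known habitat conditions rather than leaving it at an arbitrary value.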
I was invited as a keynote speaker at Hacking Limnology 2025, where I talked about my Google Summer of Code journey, my contributions to PyOBIS, and applications of open marine data in ecological modeling. The session concluded with an interactive Q&A discussion, where I addressed questions about PyOBIS, machine learning, Species Distribution Modeling (SDM), and the use of OBIS data in real-world marine science.
- Event page : Hacking Limnology 2025 – Day 1
- Presentation video : Watch on Google Drive
I led a hands-on workshop on Species Distribution Modeling (SDM) and machine learning using OBIS data. The session covered data cleaning, fetching marine and environmental data, and building predictive SDM maps using a machine learning model.
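The data cleaning step can be illustrated with a short pandas sketch that combines OBIS and GBIF records and removes rows that cannot be mapped. It assumes both inputs are DataFrames that already use the Darwin Core column names listed in COLUMNS; the function name clean_occurrences and the specific filtering rules are illustrative assumptions, not the workshop's exact code.

```python
import pandas as pd

# Darwin Core terms assumed to be present in both the OBIS and GBIF tables.
COLUMNS = ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"]


def clean_occurrences(obis_df, gbif_df):
    """Combine OBIS and GBIF records and drop unusable or duplicated rows."""
    combined = pd.concat(
        [
            obis_df[COLUMNS].assign(source="OBIS"),
            gbif_df[COLUMNS].assign(source="GBIF"),
        ],
        ignore_index=True,
    )
    # Unparseable dates become NaT and are dropped with the missing values.
    combined["eventDate"] = pd.to_datetime(combined["eventDate"], errors="coerce")
    combined = combined.dropna(
        subset=["decimalLatitude", "decimalLongitude", "eventDate"]
    )
    # Discard coordinates outside the valid latitude/longitude ranges.
    valid = combined["decimalLatitude"].between(-90, 90) & combined[
        "decimalLongitude"
    ].between(-180, 180)
    combined = combined[valid]
    # A record may be published to both networks; keep the first copy only.
    return combined.drop_duplicates(subset=COLUMNS).reset_index(drop=True)
```

Tagging each row with its source before deduplicating makes it possible to check afterwards how much the two networks overlap.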
- The PyOBIS caching feature was successfully implemented and merged upstream.
- A complete seagrass habitat analysis was conducted using machine learning techniques and Species Distribution Modeling (SDM), documented in a Notebook.
- Delivered a keynote presentation and led a hands-on workshop on SDM and marine data analysis at Hacking Limnology 2025.
- All work has been documented, publicly shared, and is reproducible.
- Develop additional species analysis notebooks, focusing particularly on the Florida Keys.
- Consider modularizing notebook utilities for broader reuse across OBIS-related projects.
- Gained practical experience in contributing to an existing open-source library.
- Learned how to integrate HTTP-level caching and design reusable APIs.
- Faced and solved challenges related to cleaning and aligning multi-source ecological data.
- Improved skills in scientific communication through both writing and live presentation formats.
Note
Huge thanks to my mentor Tylar Murray and the IOOS community for their continuous support and guidance.
Feel free to contact me at Email
Find me on GitHub : MehrazRumman
LinkedIn : Mehraz Hossain Rumman
All work is publicly available and open for collaboration.