Mars Weather Data ETL Pipeline

This project is a Java application that fetches real-time Martian weather data from NASA's InSight API and stores it in MongoDB. Stored records can then be retrieved and analyzed for weather parameters such as temperature, pressure, and wind speed. The system also includes a visualization dashboard and supports ETL automation with Airflow.

Key Features

  • Retrieves Mars weather data via the NASA InSight API
  • Parses JSON responses and stores them in MongoDB
  • Supports retrieval of stored weather documents for offline analysis
  • Uses environment variables to manage sensitive credentials securely
  • Includes optional relational DB integration (not fully implemented)
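The parse-and-store step listed above can be pictured as flattening one sol's readings into a document-shaped map. The sketch below is a minimal illustration, not the project's actual code. It assumes the JSON has already been parsed into nested maps (for example by Jackson), that each sol entry carries `AT`, `PRE`, and `HWS` blocks with an `av` average plus `First_UTC` and `Last_UTC` timestamps, and that the output field names (`avgTemperature` and so on) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens one parsed sol entry into a document-shaped map.
final class SolDocumentSketch {

    // Returns the "av" (average) value of a sensor block such as "AT",
    // or null when the block or the value is missing.
    static Double average(Map<String, Object> sol, String sensorKey) {
        Object block = sol.get(sensorKey);
        if (!(block instanceof Map)) {
            return null;
        }
        Object av = ((Map<?, ?>) block).get("av");
        return (av instanceof Number) ? ((Number) av).doubleValue() : null;
    }

    // solKey is the sol number as it appears in the response, e.g. "675".
    // Callers should pass only real sol keys, not metadata keys.
    static Map<String, Object> toDocument(String solKey, Map<String, Object> sol) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("sol", Integer.parseInt(solKey));
        doc.put("avgTemperature", average(sol, "AT"));   // assumed key: AT
        doc.put("avgPressure", average(sol, "PRE"));     // assumed key: PRE
        doc.put("avgWindSpeed", average(sol, "HWS"));    // assumed key: HWS
        doc.put("firstUtc", sol.get("First_UTC"));
        doc.put("lastUtc", sol.get("Last_UTC"));
        return doc;
    }
}
```

With the MongoDB Java Driver, a map like this could be wrapped in an `org.bson.Document` before insertion. Missing sensors are kept as explicit nulls so that gaps in the data stay visible during later analysis.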

Prerequisites

  • Java 11 or later
  • Maven for project management
  • MongoDB as the NoSQL database
  • Internet access for API calls
  • NASA API Key and MongoDB credentials
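As a rough illustration of how the API key and MongoDB credentials might be supplied, the sketch below reads them from environment variables and fails fast when one is missing. The variable names `NASA_API_KEY` and `MONGODB_URI` and the class name are assumptions, not taken from the project. The project lists dotenv-java for this purpose; the sketch uses plain `System.getenv` only so that it has no dependencies.

```java
import java.util.Map;

// Hypothetical configuration holder; the variable names are illustrative.
final class EnvConfigSketch {
    final String nasaApiKey;
    final String mongoUri;

    private EnvConfigSketch(String nasaApiKey, String mongoUri) {
        this.nasaApiKey = nasaApiKey;
        this.mongoUri = mongoUri;
    }

    // Looks up one variable and rejects missing or blank values early,
    // so a misconfigured run stops before any network call is made.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    // Takes the environment as a map so the lookup can be tested without
    // touching the real process environment.
    static EnvConfigSketch from(Map<String, String> env) {
        return new EnvConfigSketch(require(env, "NASA_API_KEY"), require(env, "MONGODB_URI"));
    }

    // Reads from the real process environment.
    static EnvConfigSketch fromSystem() {
        return from(System.getenv());
    }
}
```

Keeping the lookup in one place means credentials never need to appear in source files or in the repository.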

Core Components

  • MongoDBClient: Manages connections and data operations for MongoDB
  • Main: Controls application flow, including API fetching and DB interactions
  • GetAPI: Handles HTTP requests to the NASA InSight API
  • DataModel: Represents the weather data structure (WIP)
  • DateUtil: Utility for date formatting (WIP)
  • DataRepository: Placeholder for potential relational DB storage
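To make the role of GetAPI more concrete, here is a minimal sketch of how a request to the InSight weather endpoint could be built and sent. It is not the project's implementation. It uses the JDK's built-in `java.net.http` client (available from Java 11) in place of OkHttp to stay dependency-free. The endpoint URL and its `feedtype` and `ver` parameters follow NASA's published InSight weather API as far as I know and should be checked against the current API documentation. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical stand-in for GetAPI, using the JDK HTTP client instead of OkHttp.
final class InsightRequestSketch {
    // Assumed endpoint; verify against NASA's API documentation.
    static final String BASE_URL = "https://api.nasa.gov/insight_weather/";

    // Builds the request URI, URL-encoding the key so special characters are safe.
    static URI buildUri(String apiKey) {
        String encodedKey = URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return URI.create(BASE_URL + "?api_key=" + encodedKey + "&feedtype=json&ver=1.0");
    }

    // Builds a GET request with a timeout so a stalled call cannot hang the run.
    static HttpRequest buildRequest(String apiKey) {
        return HttpRequest.newBuilder(buildUri(apiKey))
                .timeout(Duration.ofSeconds(30))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body; non-200 responses are errors.
    static String fetch(String apiKey) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(apiKey), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }
}
```

Separating URI construction from the network call keeps the request logic testable offline; only `fetch` needs internet access.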

Third-Party Libraries

  • MongoDB Java Driver for DB integration
  • OkHttp for API communication
  • Jackson for JSON parsing
  • dotenv-java for environment variable management

Visualization Dashboard

A separate Dash-based web application visualizes the stored data from MySQL (or processed outputs). The dashboard provides interactive graphs and visual trends:

  • Tracks pressure, temperature, and wind speed variations
  • Explores relationships like temperature vs pressure
  • Displays wind direction distributions using rose plots
  • Styled for readability with a dark theme

Airflow ETL Pipeline

  • A daily Airflow DAG triggers a JAR task to fetch data from NASA
  • Docker Compose sets up the Airflow environment with PostgreSQL
  • Airflow services include webserver, scheduler, and an init container
  • Supports automation of data fetching and loading for analytics
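For a scheduler to know whether a JAR run succeeded, the Java process has to report its outcome. A common way is the exit status: if the DAG launches the JAR through a shell command (for instance with Airflow's BashOperator, which treats a non-zero exit status as a task failure), the entry point can map exceptions to a non-zero code. The sketch below illustrates that idea under those assumptions. How the project's DAG actually invokes the JAR is not stated here, and the class and parameter names are hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper that turns an ETL run into a process exit code.
final class EtlEntryPointSketch {

    // Runs the job and returns 0 on success or 1 on any failure.
    // The job returns the number of documents it stored.
    static int run(Callable<Integer> fetchAndLoad) {
        try {
            int stored = fetchAndLoad.call();
            System.out.println("Stored " + stored + " weather documents");
            return 0;
        } catch (Exception e) {
            // Report the failure on stderr so it appears in the task log.
            System.err.println("ETL run failed: " + e.getMessage());
            return 1;
        }
    }

    // A real main method would end with something like:
    //   System.exit(run(() -> fetchFromNasaAndStoreInMongo()));
    // where fetchFromNasaAndStoreInMongo is a placeholder for the actual job.
}
```

Returning the code from `run` rather than calling `System.exit` inside it keeps the logic easy to test.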

Deployment Highlights

  • Runs Java application via Maven and JAR execution
  • Dashboard served as a Python Dash app
  • Airflow and PostgreSQL orchestrated using Docker Compose

Contribution

Contributors are welcome to enhance functionality, fix bugs, or expand visualization features. Open an issue or submit a pull request for review.

License

Licensed under the MIT License. See the LICENSE file for usage terms.

Acknowledgments

  • NASA for providing open access to Martian weather data
  • Open-source maintainers of MongoDB, Jackson, OkHttp, LangChain, and Airflow