Skip to content

Project Roadmap

This roadmap outlines the expected evolution of the Regatta Data Platform.

It is intended as a strategic direction rather than a fixed commitment, and phases may evolve as the project progresses.

The goal of the roadmap is to ensure the system evolves in a structured and scalable way, prioritising architectural stability before expanding functionality.


Phase 0 – Architecture Stabilisation (Completed)

Objective

Transform CSV-based workflows into a structured, database-driven system with operational pipelines and API access.

This phase focuses on establishing a robust technical foundation before expanding the platform.

Environment

  • Private GitHub repository
  • Local development environment
  • Docker used locally
  • No external infrastructure cost

Key Workstreams

Database Foundation

  • Design PostgreSQL schema
  • Define raw ingestion tables
  • Define canonical relational tables
  • Implement ingestion timestamping
  • Establish traceability between raw and canonical data

Pipeline Architecture

  • Refactor ingestion scripts to write directly to the database
  • Create structured ingestion pipelines
  • Introduce batch identifiers for traceability

API Layer (Initial)

  • Establish FastAPI project structure
  • Implement initial read-only endpoints
  • Support basic filtering (year, confidence, source)

Data Normalisation

This workstream runs in parallel with the architecture build.

Activities include:

  • defining normalisation rules
  • mapping raw values to canonical entities
  • assigning confidence levels
  • documenting edge cases
  • researching missing metadata

Completion Criteria

  • 2025 dataset stored in PostgreSQL
  • Raw → canonical traceability operational
  • API capable of querying structured data

Achieved Outcomes

  • Fully operational ETL pipelines
  • Canonical PostgreSQL database in use
  • FastAPI backend implemented
  • Structured logging system in place
  • CLI-based pipeline execution

Phase 1 – Private Pilot (Current phase)

Objective

Create a functional internal prototype usable by Raul and David.

Environment

  • Low-cost VPS (e.g. Hetzner)
  • Docker deployment
  • Private access only

Key Goals

Infrastructure

  • Deploy Docker stack to VPS
  • Configure SSH access and firewall
  • Implement automated database backups

Dataset Completion

  • Complete ingestion of the 2025 dataset
  • Perform data quality review
  • Operationalise confidence scoring

Semi-Automated Discovery

  • Maintain regatta calendar
  • Introduce URL-based ingestion workflow
  • Allow predefined ingestion workflows to be executed by junior contributors

API & Product Layer

  • Improve endpoint design and performance
  • Introduce pagination and filtering
  • Enhance API documentation

Frontend Development

  • Build initial React-based interface
  • Enable data exploration (regattas, boats)
  • Connect frontend to API endpoints

Additional Focus Areas

  • integration of frontend application
  • improving API usability for exploration
  • refining data quality and consistency

Completion Criteria

  • 2025 dataset trusted
  • 2026 ingestion requires minimal manual effort
  • Junior ingestion workflow stable
  • System usable without developer intervention

Phase 2 – Controlled Release

Objective

Prepare the system for broader controlled access.

Potential Infrastructure Evolution

  • Evaluate migration to managed cloud infrastructure (e.g. AWS)
  • Introduce monitoring and operational tooling

Key Areas of Work

Security & Access Control

  • User authentication
  • Role-based access control
  • Audit logging

Operational Stability

  • Error monitoring
  • Performance optimisation
  • Backup and recovery validation

Phase 3 – Scalable Production

This phase is not an immediate focus but represents the potential long-term evolution of the platform.

Possible areas of expansion include:

  • multi-user access
  • service-level reliability
  • legal and compliance considerations
  • expanded dataset coverage

AI-Assisted Development (Cross-Phase Initiative)

AI tools are integrated throughout the project as a development capability layer.

Current and future exploration areas include:

  • AI-assisted code generation
  • architecture-aware refactoring
  • automated test scaffolding
  • anomaly detection in datasets
  • rule suggestion for normalisation workflows
  • conversational interaction with the dataset

AI is treated as:

  • a productivity multiplier
  • a structured development tool
  • a capability to be explored incrementally

All AI-generated outputs remain human-reviewed and architecturally supervised.


Ongoing Maintenance (From Pilot Phase)

Once the pilot phase is active, recurring operational tasks will include:

  • monitoring ingestion failures
  • reviewing low-confidence mappings
  • adjusting schema when required
  • dependency updates
  • verifying backups
  • reviewing system performance as the dataset grows
  • monitoring API performance and usage
  • maintaining frontend-backend integration

Comments