ADR-001: Adopt a Database-First Architecture Using PostgreSQL

Status: Accepted
Date: 2026-03-06

Context

The early exploration phase of the project relied on CSV files generated by scraping scripts. This approach allowed quick experimentation and made it easy to demonstrate to stakeholders that structured data could be extracted from regatta result pages.

However, as the project evolved from experimentation toward a structured data platform, several limitations of a file-based workflow became clear:

  • Difficult traceability between raw data and processed data
  • Increasing complexity managing multiple CSV files
  • Limited ability to enforce relational structure and constraints
  • Poor scalability as the dataset grows
  • Difficult integration with APIs and future web applications

The project required a more robust and scalable system of record capable of supporting ingestion pipelines, entity normalisation, and application-layer access.

Decision

The system adopts a database-first architecture, using PostgreSQL as the central system of record.

In this model:

  • All raw ingestion data is stored in the database
  • Normalised canonical entities are stored in relational tables
  • CSV files are used only as temporary artefacts for data review or mapping workflows
  • All application logic and APIs operate on top of the database
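The raw-versus-canonical split above can be sketched as two tables plus a mapping step. This is a minimal illustration, not the project's actual schema: all table and column names (`raw_results`, `race_results`, `payload`, and so on) are assumptions, and the sail-racing fields are invented for the example.

```python
# Hypothetical two-layer schema: raw ingestion rows land unmodified as JSONB,
# and canonical entities are derived from them into a relational table that
# keeps a foreign key back to its source row for traceability.
RAW_DDL = """
CREATE TABLE IF NOT EXISTS raw_results (
    id          BIGSERIAL PRIMARY KEY,
    source_url  TEXT NOT NULL,
    scraped_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    payload     JSONB NOT NULL          -- untouched scraper output
);
"""

CANONICAL_DDL = """
CREATE TABLE IF NOT EXISTS race_results (
    id          BIGSERIAL PRIMARY KEY,
    raw_id      BIGINT NOT NULL REFERENCES raw_results(id),  -- traceability link
    regatta     TEXT NOT NULL,
    sail_number TEXT NOT NULL,
    position    INTEGER NOT NULL CHECK (position > 0)
);
"""

def to_canonical(raw_id: int, payload: dict) -> dict:
    """Map one raw payload to a canonical row, preserving the raw_id link."""
    return {
        "raw_id": raw_id,
        "regatta": payload["regatta"].strip(),
        "sail_number": payload["sail_number"].upper(),
        "position": int(payload["position"]),
    }
```

The point of the `raw_id` column is that every canonical row can be traced back to the exact scraped payload it came from, which is the traceability property a pile of CSV files cannot enforce.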

PostgreSQL was selected because:

  • It is a mature and widely used relational database
  • It supports complex relational modelling
  • It provides excellent JSON support for raw ingestion data
  • It integrates easily with modern Python frameworks such as FastAPI
  • It supports future scalability and deployment options
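The JSON support mentioned above is what keeps the raw layer queryable before any normalisation happens. The sketch below shows the pattern using PostgreSQL's `->>` operator and `jsonb_exists()` function; the table and column names are the same illustrative ones as above, and a real run would need a live connection via a driver such as psycopg.

```python
# Illustrative query pattern: filter raw JSONB payloads by the presence of a
# key and project its value, without the row ever having been normalised.
# Table/column names are assumptions carried over from the sketch schema.
FIND_BY_KEY = (
    "SELECT id, payload ->> %(key)s AS value "
    "FROM raw_results "
    "WHERE jsonb_exists(payload, %(key)s)"
)

def fetch_raw_values(conn, key: str) -> list:
    """Return (id, value) pairs for raw rows whose payload contains `key`."""
    with conn.cursor() as cur:
        cur.execute(FIND_BY_KEY, {"key": key})
        return cur.fetchall()
```

Passing the key as a bound parameter (rather than interpolating it into the SQL string) keeps the query safe against injection from scraper-derived input.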

Consequences

Adopting a database-first architecture provides several benefits:

  • Clear separation between raw data and canonical data
  • Improved traceability and reproducibility of transformations
  • Strong relational structure for modelling entities and relationships
  • Simplified integration with APIs and future applications
  • Better scalability as the dataset grows

However, it also introduces some trade-offs:

  • Increased initial complexity compared to simple CSV workflows
  • Need for database schema design and migrations
  • Additional infrastructure requirements for deployment

Overall, the database-centric architecture provides a solid foundation for building a scalable data platform and supports the long-term evolution of the project.
