How to populate EU projects (begin/end dates, acronyms, awarded PMs/EUR)
This describes how to get EU project data (from EU Funding & Tenders org-details) into the app database with project long name, acronym, begin/end dates, and awarded person-months or EUR.
Data flow
- Fetch – eu_playwright org-details (with --fetch-details) writes org_details_<slug>_results.json with per-project: title, acronym, url, project_id, start_date, end_date, awarded_eur, person_months, eu_contribution_eur, funding_programme, participants, etc.
- Enrich dates (optional) – If some projects lack start_date/end_date in the JSON, you can save project HTML pages and run scripts/eu_portal/update_org_details_dates_from_saved_pages.py to parse dates from HTML and update the JSON.
- Import into DB – Run scripts/eu_portal/seed_eu_projects_from_org_details.py to create/update Project rows from the JSON (name, planned begin/end, optional WP+Task with awarded PMs).
1. Fetch EU project data (eu_playwright)
From the repo root:
# Install Playwright and Chromium if not already
uv sync --group dev
uv run playwright install chromium
# Fetch org-details (e.g. by PIC) and open each project page for details
PYTHONPATH=. uv run python -m eu_playwright.cli org-details --fetch-details
Output: tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json.
Options:
- --url "https://ec.europa.eu/.../org-details/<PIC>" – different org.
- --out-dir tmp/downloaded_data/eu_playwright – output directory.
- --no-headless – show browser.
- --connect http://localhost:9222 – attach to existing Chrome (see run-eura-hanketietopalvelu).
With --fetch-details, each project’s detail page is parsed for acronym, start_date, end_date, status, participants, eu_contribution_eur, funding_programme, etc. Dates in the JSON are in EU portal format (e.g. "01 November 2019", "31 October 2022").
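To make the date format concrete, here is a minimal sketch of parsing the EU portal strings into Python dates. The function name parse_eu_date is hypothetical and not taken from the repo; the sketch also assumes English month names, since %B depends on the active locale.

```python
from datetime import date, datetime
from typing import Optional


def parse_eu_date(text: Optional[str]) -> Optional[date]:
    """Parse an EU portal date such as "01 November 2019".

    Hypothetical helper: returns None for empty or malformed input
    instead of raising, so callers can treat the date as missing.
    """
    if not text or not text.strip():
        return None
    try:
        # %d = day of month, %B = full month name, %Y = four-digit year
        return datetime.strptime(text.strip(), "%d %B %Y").date()
    except ValueError:
        return None


print(parse_eu_date("01 November 2019"))
print(parse_eu_date("31 October 2022"))
```

Returning None rather than raising keeps a single malformed date from aborting a whole run; whether the actual scripts behave this way is not stated here.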
2. Enrich dates from saved HTML (optional)
If some projects in the JSON have empty start_date/end_date, you can save the project HTML pages (eu_playwright with --save-pages) and then run:
uv run python scripts/eu_portal/update_org_details_dates_from_saved_pages.py
uv run python scripts/eu_portal/update_org_details_dates_from_saved_pages.py --json tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json --saved-pages-dir ./saved-pages-dir
This parses “Start date” / “End date” from the HTML and updates the JSON in place.
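To check whether enrichment is needed (or whether it worked), a small sketch like the following can list the projects that still lack dates. It assumes the JSON keeps its project list under result.projects with string-valued start_date and end_date; the function names are hypothetical.

```python
import json
from pathlib import Path


def projects_missing_dates(data: dict) -> list:
    """Return projects whose start_date or end_date is missing or blank."""
    projects = data.get("result", {}).get("projects", [])
    return [
        p
        for p in projects
        if not (p.get("start_date") or "").strip()
        or not (p.get("end_date") or "").strip()
    ]


def report(json_path: str) -> None:
    """Print acronym (or title) and URL of each project without full dates."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    for p in projects_missing_dates(data):
        print(p.get("acronym") or p.get("title"), p.get("url"))
```

Call report() with the path to your org_details_<slug>_results.json before and after running the enrichment script to see which projects were filled in.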
3. Import EU projects into the database
Prerequisites:
- Database (Podman PostgreSQL 18+); DATABASE_URL set (e.g. from .env.dev).
- Top org present (e.g. from scripts/seed_example_projects.py or scripts/seed_from_ods.py).
- Migrations applied.
Run:
uv run python scripts/eu_portal/seed_eu_projects_from_org_details.py
Default JSON path: tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json.
Options:
- --json / -j – path to org_details_<slug>_results.json.
- --top-org – short name of the project owner org (default: your top org short name).
- --name-style – how to set project name: title (long name only), acronym (acronym only), or acronym_title (e.g. "TEAMS – Teaching Entrepreneurship…") (default: title).
- --dry-run – only print what would be created/updated; do not write to DB.
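The three --name-style values can be illustrated with a short sketch. This is not the script's actual code: the function name, the fallback when a field is empty, and the sample values "ACRO" and "Long project title" are all assumptions; only the style names and the " – " separator come from the option description.

```python
def project_name(title: str, acronym: str, name_style: str = "title") -> str:
    """Build a project name for one of the three documented styles.

    Hypothetical helper: falling back to the other field when one is
    empty is an assumption, not documented behaviour.
    """
    title = (title or "").strip()
    acronym = (acronym or "").strip()
    if name_style == "title":
        return title or acronym
    if name_style == "acronym":
        return acronym or title
    if name_style == "acronym_title":
        if acronym and title:
            return f"{acronym} – {title}"
        return acronym or title
    raise ValueError(f"unknown name style: {name_style!r}")


for style in ("title", "acronym", "acronym_title"):
    print(style, "->", project_name("Long project title", "ACRO", style))
```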
What the script does:
- Reads result.projects from the JSON.
- For each project:
  - Name: Set from title and/or acronym according to --name-style (must be unique; duplicates get a suffix).
  - Planned begin/end: Parsed from start_date and end_date (EU format "DD Month YYYY").
  - Duration: Set from begin/end if both present.
  - Project owner org: Set to the given top org (e.g. EXAMPLE).
  - Funding programme: If funding_programme is present in the JSON, the script tries to match an existing FundingProgramme (by label/code under an "EU" group) or skips linking if none found.
  - Awarded PMs: If person_months is present and parseable, the script creates one Work Package (WP 1) and one Task (task_id 1) with awarded_pms set; otherwise it still creates Project only (no WPs/tasks).
- Idempotent: projects are matched by name; existing projects are updated (dates, duration). Re-running with the same JSON updates existing rows.
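The "present and parseable" rule for person_months can be sketched as below. The accepted input forms (numbers, numeric strings, comma as decimal separator) and both function names are assumptions for illustration; the row labels only mirror the description above and are not the app's ORM.

```python
import math
from typing import Optional


def parse_person_months(value) -> Optional[float]:
    """Return person-months as a float, or None if missing or unparseable."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        pm = float(value)
    else:
        try:
            # Assumption: tolerate a comma decimal separator such as "12,5"
            pm = float(str(value).strip().replace(",", "."))
        except ValueError:
            return None
    return pm if math.isfinite(pm) and pm >= 0 else None


def planned_rows(project: dict) -> list:
    """List the rows one JSON project would yield under the rules above."""
    rows = ["Project"]
    pm = parse_person_months(project.get("person_months"))
    if pm is not None:
        rows += ["WorkPackage 1", f"Task 1 (awarded_pms={pm})"]
    return rows


print(planned_rows({"person_months": "12.5"}))
print(planned_rows({"person_months": ""}))
```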
Limitations:
- Awarded EUR from the JSON (awarded_eur, eu_contribution_eur) is not stored in the app schema (no project-level EUR field); it remains in the JSON and can be shown from a future “EU metadata” view or stored in a custom field if added later.
- Acronym is only used in the project name (by --name-style); there is no separate acronym column on Project.
- EU project_id is not stored; matching on re-import is by project name only.
Summary
| Step | Command / file |
|---|---|
| Fetch EU data | PYTHONPATH=. uv run python -m eu_playwright.cli org-details --fetch-details |
| JSON output | tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json |
| Enrich dates (optional) | uv run python scripts/eu_portal/update_org_details_dates_from_saved_pages.py |
| Import into DB | uv run python scripts/eu_portal/seed_eu_projects_from_org_details.py |
This gives you EU projects in the app with long name and/or acronym, begin/end dates, and (when available) awarded person-months on a single task.