How to continue org-details data fetching (eu_playwright)
This howto describes how to resume fetching org-details from the EU Funding & Tenders portal when you already have partial results (e.g. 3 projects) and want to fetch the remaining projects without re-fetching existing ones.
See also: docs/plans/2026-02-02_012400-org-details-run-analysis.md.
Prerequisites
- Repo root: dependencies installed once with uv sync --group dev, followed by uv run playwright install chromium.
- Existing file: tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json with at least some projects (resume loads these and skips them by URL).
- Chrome or Chromium with remote debugging when using --connect (recommended for long runs).
Resume with an existing browser (recommended)
Attaching to Chrome avoids launching a browser in the background and makes long runs (38 projects) more reliable.
1. Start Chrome with remote debugging
Use a non-default --user-data-dir (required for remote debugging).
# Linux
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
# or
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
2. Run org-details with resume and fetch-details
From the repo root:
uv run python -m eu_playwright.cli org-details --fetch-details --connect http://localhost:9222 \
-o tmp/downloaded_data/eu_playwright
- Resume (default): loads org_details_<slug>_results.json from the out-dir and only fetches project details for URLs not already in the file.
- --fetch-details: for each project card, opens the project details page and parses acronym, dates, status, participants, EU contribution, programme, etc.
- --connect: attaches to the Chrome instance above; the script opens the org-details page, clicks through the project cards, then fetches each new project’s details.
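The resume behaviour described above (load the existing file, skip URLs already present) can be sketched roughly as follows. This is an illustration only, not the CLI's actual code: the JSON layout (a top-level "projects" list whose entries carry a "url" key) and the function names load_known_urls and urls_to_fetch are assumptions.

```python
import json
from pathlib import Path


def load_known_urls(results_path):
    """Return the set of project URLs already stored in the results file."""
    path = Path(results_path)
    if not path.exists():
        return set()  # nothing fetched yet, so nothing to skip
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumed layout: {"projects": [{"url": "..."}, ...]}
    return {p["url"] for p in data.get("projects", []) if p.get("url")}


def urls_to_fetch(card_urls, known_urls):
    """Keep card order; drop URLs that were already fetched and duplicates."""
    seen = set(known_urls)
    pending = []
    for url in card_urls:
        if url not in seen:
            seen.add(url)
            pending.append(url)
    return pending
```

With 3 projects already in the file and 38 card URLs collected, urls_to_fetch would return the 35 URLs that still need a details fetch.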
Optional:
- --debug – log URL, card counts, etc. after loading.
- --save-pages DIR – save each project page HTML under DIR/<session_id>/ for offline parsing.
- --max-projects N – cap the number of project cards to process. Default 0 = load all projects (no limit).
- --log-file FILE – write the log to FILE (default: <out-dir>/org_details_<session_id>.log).
- --sleep-between-queries SEC – seconds to wait between EU tender portal requests (card clicks and project-detail fetches). Default 60; all projects are still downloaded, each request is only delayed. Use 0 to disable.
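To make the effect of --sleep-between-queries concrete, here is a minimal sketch of a throttled fetch loop. It is not taken from the CLI: fetch_all and its parameters are hypothetical names, and the sleep function is injectable only so the loop can be exercised without actually waiting.

```python
import time


def fetch_all(urls, fetch, sleep_between=60.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        # No pause before the first request; 0 disables the delay entirely.
        if index > 0 and sleep_between > 0:
            sleep(sleep_between)
        results.append(fetch(url))
    return results
```

Under this sketch every URL is still fetched; the delay only spaces the requests out, so N requests add (N - 1) pauses to the run.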
3. Let the run finish
The script will:
- Load existing projects from org_details_<slug>_results.json.
- Open the org-details URL and click through project cards to collect URLs (up to 38).
- For each URL not already in the file, open the project details page, parse it, and append to the result.
- Write the JSON file after each new project so an interrupt (Ctrl+C) does not lose progress.
Allow 10–30 minutes for a full run, depending on network and page load; with a non-zero --sleep-between-queries, expect the configured delay per request to add to this.
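The "write the JSON file after each new project" step is what makes an interrupted run resumable. One way to do this safely is sketched below, under assumptions: the source does not say whether the CLI writes atomically, the "projects" key is an assumed layout, and save_results and fetch_and_save are hypothetical names. The sketch writes to a temporary file in the same directory and then swaps it into place with os.replace, so a Ctrl+C during the write leaves the previous file intact.

```python
import json
import os
import tempfile


def save_results(path, data):
    """Write data as JSON to path via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        # Clean up the partial temp file, then let the error propagate.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def fetch_and_save(path, data, pending_urls, fetch_details):
    """Append each newly fetched project and persist immediately."""
    for url in pending_urls:
        data["projects"].append(fetch_details(url))
        save_results(path, data)
```

If the loop is interrupted while fetching the third pending project, the file on disk already contains the first two, and the next resumed run skips them.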
Output
- JSON: tmp/downloaded_data/eu_playwright/org_details_<slug>_results.json (updated after each new project).
- Log: tmp/downloaded_data/eu_playwright/org_details_<session_id>.log.
Headless run (no browser attach)
If you prefer not to attach to a browser:
uv run python -m eu_playwright.cli org-details --fetch-details \
-o tmp/downloaded_data/eu_playwright
Resume works the same. In some environments (e.g. CI or no display), headless may hit timeouts or EPIPE; use --connect for long runs when possible.
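For orientation, the difference between the two modes can be sketched with Playwright's Python API: connect_over_cdp attaches to an already running Chrome, while launch starts a new headless browser. How eu_playwright.cli actually wires this up is not shown in this howto, so the helper name open_browser and its signature are assumptions.

```python
def open_browser(playwright, connect_url=None):
    """Attach to an existing Chrome if a URL is given, else launch headless."""
    if connect_url:
        # Corresponds to --connect http://localhost:9222
        return playwright.chromium.connect_over_cdp(connect_url)
    # Corresponds to running without --connect
    return playwright.chromium.launch(headless=True)


# Usage sketch (requires the playwright package and, for the attach case,
# a Chrome started with --remote-debugging-port=9222):
#
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser = open_browser(p, "http://localhost:9222")
```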
Troubleshooting
- “Could not attach to browser”
  Start Chrome with --remote-debugging-port=9222 and --user-data-dir=... first, then run the CLI with --connect http://localhost:9222.
- EPIPE or timeout
  Often caused by a headless browser in a restricted environment. Run locally with --connect and an existing Chrome window.
- Only 3 projects in file after run
  Either the run was interrupted before more projects were fetched, or the click loop did not see all cards (page not fully loaded). Re-run with --resume and --connect; the existing 3 projects are skipped and the remaining URLs are fetched.
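When the CLI cannot attach, it can help to check first whether Chrome is listening on the debugging port at all. The sketch below queries Chrome's /json/version DevTools endpoint using only the standard library; the helper name devtools_version is hypothetical and the check is independent of the eu_playwright CLI.

```python
import http.client
import json
import urllib.error
import urllib.request


def devtools_version(base_url="http://localhost:9222", timeout=3.0):
    """Return Chrome's /json/version payload, or None if nothing answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: not attachable.
        return None
```

A None result suggests Chrome was not started with --remote-debugging-port=9222 or is listening on another port; a dictionary (which normally includes a "Browser" entry) means the endpoint is reachable and --connect should be able to attach.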
See also
- eu_playwright/README.md – CLI overview.
- docs/plans/2026-02-02_012400-org-details-run-analysis.md – analysis of a previous run and recommendations.