signalex

VMS Regulatory Intelligence Platform

Automated scraping, classification, and daily email digest of regulatory signals for vitamins, minerals, and supplements (VMS), sourced from global health authorities.

Architecture

main.py
  └── scheduler/jobs.py          APScheduler — two recurring jobs
        ├── scrape_and_classify  every 6 h
        │     ├── scrapers/tga.py      TGA (Australia) — ARTG + safety alerts
        │     ├── scrapers/fda.py      FDA (USA) — RSS feed + NDI docket
        │     ├── classifier/claude.py  Claude API classification
        │     └── storage/signals.py   TinyDB persistence
        └── send_digest          daily at 07:00 UTC
              └── digest/email_sender.py  Jinja2 render → SMTP send

Data flow

Health authority website
        │
        ▼
  [Scraper]  ── fetch_raw() ──►  RawSignal
                                  { source_id, authority, url,
                                    title, body_text, scraped_at }
        │
        ▼
  [Claude API]  classify()  ──►  ClassifiedSignal
                                  { ingredient_names, event_type,
                                    country, severity, summary,
                                    confidence, ... }
        │
        ▼
  [SignalStore]  save_batch()  ──►  signals.json  (TinyDB)
        │
        ▼  (daily cron)
  [DigestSender]  send()  ──►  HTML + text email  ──►  recipients
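
The record shapes in the diagram can be sketched as plain dataclasses. This is a minimal illustration, not the project's code: the field types are assumptions, the fields elided by `...` in the diagram are left out, and `classify_stub` is a hypothetical stand-in for the Claude API call. The file map describes `ClassifiedSignal` as a Pydantic model; a dataclass is used here only to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RawSignal:
    """Unprocessed item as returned by a scraper's fetch_raw()."""
    source_id: str
    authority: str
    url: str
    title: str
    body_text: str
    scraped_at: datetime


@dataclass(frozen=True)
class ClassifiedSignal:
    """Classifier output; only the fields named in the diagram are shown."""
    ingredient_names: list[str]
    event_type: str
    country: str
    severity: str
    summary: str
    confidence: float


def classify_stub(raw: RawSignal) -> ClassifiedSignal:
    """Hypothetical placeholder for classify(); no API call is made."""
    return ClassifiedSignal(
        ingredient_names=[],
        event_type="other",
        country="",
        severity="unknown",  # assumed placeholder value
        summary=raw.title,
        confidence=0.0,
    )


raw = RawSignal(
    source_id="example-1",
    authority="TGA",
    url="https://example.org/alert/1",
    title="Example safety alert",
    body_text="Placeholder body text.",
    scraped_at=datetime.now(timezone.utc),
)
classified = classify_stub(raw)
```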

File map

Path	Purpose
`main.py`	Entry point; CLI flags `--scrape-now`, `--digest-now`
`config.py`	All settings; secrets via env vars
`scrapers/base.py`	Abstract `BaseScraper`; retry, dedup, `RawSignal` type
`scrapers/tga.py`	TGA ARTG listings + safety alerts
`scrapers/fda.py`	FDA dietary supplements RSS + NDI docket
`classifier/claude.py`	Claude API wrapper; `ClassifiedSignal` Pydantic model
`storage/signals.py`	TinyDB insert/query; swappable for Postgres
`digest/email_sender.py`	Group signals, render Jinja2, send SMTP
`digest/templates/`	`digest.html` + `digest.txt` email templates
`scheduler/jobs.py`	APScheduler job definitions

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Playwright browsers (needed for JS-rendered TGA pages)
playwright install chromium

cp .env.example .env
# Edit .env with your API keys and SMTP credentials

Running

# Start the scheduler (runs indefinitely)
python main.py

# One-off scrape and classify
python main.py --scrape-now

# One-off digest send
python main.py --digest-now
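
The two flags could be handled with `argparse` roughly as follows. This is a sketch of one plausible layout rather than the contents of `main.py`; the helper names `parse_args` and `select_mode`, and the precedence when both flags are passed, are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the CLI flags listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--scrape-now", action="store_true",
                        help="run one scrape-and-classify pass and exit")
    parser.add_argument("--digest-now", action="store_true",
                        help="send one digest and exit")
    return parser.parse_args(argv)


def select_mode(args: argparse.Namespace) -> str:
    """Pick the action to run; with no flag, the scheduler starts."""
    # Assumption: if both flags are given, the scrape takes precedence.
    if args.scrape_now:
        return "scrape"
    if args.digest_now:
        return "digest"
    return "scheduler"


mode = select_mode(parse_args(["--digest-now"]))
```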

Adding a new health authority

1. Create scrapers/<authority>.py inheriting from BaseScraper.
2. Implement fetch_raw() returning list[RawSignal].
3. Add an entry to SCRAPER_CONFIG in config.py.
4. Register the scraper in scheduler/jobs.py under scrape_and_classify().
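
A new scraper might look like the sketch below. The `BaseScraper` and `RawSignal` definitions here are simplified stand-ins so the example runs on its own (the real ones live in scrapers/base.py and also cover retry and dedup); `ExampleAuthorityScraper`, `_fetch_items`, and the `source_id` format are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RawSignal:
    source_id: str
    authority: str
    url: str
    title: str
    body_text: str
    scraped_at: datetime


class BaseScraper(ABC):
    """Stand-in for scrapers/base.py; retry and dedup are omitted."""

    @abstractmethod
    def fetch_raw(self) -> list[RawSignal]:
        ...


class ExampleAuthorityScraper(BaseScraper):
    """Hypothetical scraper for a made-up authority."""

    authority = "EXAMPLE"

    def _fetch_items(self) -> list[dict]:
        # Placeholder for the real HTTP or RSS fetch; returns canned data.
        return [{"id": "42", "link": "https://example.org/notice/42",
                 "title": "Example notice", "text": "Placeholder text."}]

    def fetch_raw(self) -> list[RawSignal]:
        now = datetime.now(timezone.utc)
        return [
            RawSignal(
                source_id=f"{self.authority}-{item['id']}",
                authority=self.authority,
                url=item["link"],
                title=item["title"],
                body_text=item["text"],
                scraped_at=now,
            )
            for item in self._fetch_items()
        ]


signals = ExampleAuthorityScraper().fetch_raw()
```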

Classification event types

`event_type`	Meaning
`new_listing`	New product registered with the authority
`approval`	Ingredient or health claim formally approved
`ban`	Ingredient or product prohibited
`warning`	Safety advisory issued
`label_change`	Mandatory labelling update
`adverse_event`	Reported adverse event (e.g. from CAERS)
`other`	Anything not fitting the above
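
The values in the table map naturally onto an enum. The sketch below is illustrative only: the `EventType` class and `parse_event_type` helper are hypothetical names, and falling back to `other` for unrecognised strings is an assumption about how unexpected classifier output might be handled.

```python
from enum import Enum


class EventType(str, Enum):
    NEW_LISTING = "new_listing"
    APPROVAL = "approval"
    BAN = "ban"
    WARNING = "warning"
    LABEL_CHANGE = "label_change"
    ADVERSE_EVENT = "adverse_event"
    OTHER = "other"


def parse_event_type(value: str) -> EventType:
    """Map a raw string to an EventType, falling back to OTHER."""
    try:
        return EventType(value.strip().lower())
    except ValueError:
        # Assumption: anything unrecognised is treated as "other".
        return EventType.OTHER


parsed = parse_event_type(" Label_Change ")
```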

Extending storage

SignalStore uses TinyDB by default (no infrastructure required). To migrate to Postgres:

1. Replace TinyDB calls in storage/signals.py with SQLAlchemy.
2. Change DB_PATH in config.py so that it reads a connection string from an environment variable.
3. Add a migration tool (Alembic) and define the schema.
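
One way to keep such a swap contained is to code against a small interface that any backend satisfies. The sketch below is an assumption about how that could be arranged, not a description of storage/signals.py: `SignalBackend`, `InMemoryBackend`, the `all()` method, and skipping records whose `source_id` is already stored are all hypothetical. Only `save_batch()` is named in the data flow above.

```python
from typing import Protocol


class SignalBackend(Protocol):
    """Hypothetical interface a TinyDB or Postgres backend could share."""

    def save_batch(self, signals: list[dict]) -> int: ...
    def all(self) -> list[dict]: ...


class InMemoryBackend:
    """Toy backend used only to show the interface in action."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def save_batch(self, signals: list[dict]) -> int:
        """Store unseen signals and return how many were added."""
        added = 0
        for signal in signals:
            # Assumption: source_id identifies a signal uniquely.
            if signal["source_id"] not in self._rows:
                self._rows[signal["source_id"]] = dict(signal)
                added += 1
        return added

    def all(self) -> list[dict]:
        return list(self._rows.values())


backend: SignalBackend = InMemoryBackend()
added = backend.save_batch([
    {"source_id": "a", "title": "first"},
    {"source_id": "a", "title": "duplicate"},
    {"source_id": "b", "title": "second"},
])
```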

Migrations

One-off database migrations live in migrations/. Run them in order on a fresh clone before starting the API or scheduler.

Script	Purpose
`migrations/backfill_vms_domain.py`	Set `domain='vms'` on all untagged signals

python migrations/backfill_vms_domain.py

# Preview without writing:
python migrations/backfill_vms_domain.py --dry-run
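
The core of a backfill with a dry-run mode can be sketched as below. This operates on a plain list of dicts and is not the actual script: the `backfill_domain` function is hypothetical, and treating a missing or empty `domain` as "untagged" is an assumption.

```python
def backfill_domain(signals: list[dict], dry_run: bool = False) -> int:
    """Set domain='vms' on untagged records; return how many qualify."""
    # Assumption: "untagged" means the domain key is missing or empty.
    untagged = [s for s in signals if not s.get("domain")]
    if not dry_run:
        for signal in untagged:
            signal["domain"] = "vms"
    return len(untagged)


rows = [{"id": 1}, {"id": 2, "domain": "food"}, {"id": 3, "domain": None}]
preview_count = backfill_domain(rows, dry_run=True)  # nothing is written
```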

Environment variables

Variable	Required	Description
`FOOD_AI_ENRICHMENT_ENABLED`	No	Default: `false`. When false, the Food pipeline does not require an AI provider key.
`AI_PROVIDER`	No	`none`, `anthropic`, or `openai`. Required only when Food AI enrichment is enabled.
`ANTHROPIC_API_KEY`	No	Claude API key. Required only when `FOOD_AI_ENRICHMENT_ENABLED=true` and `AI_PROVIDER=anthropic`.
`OPENAI_API_KEY`	No	OpenAI API key. Required only when `FOOD_AI_ENRICHMENT_ENABLED=true` and `AI_PROVIDER=openai`.
`SMTP_HOST`	No	Default: `smtp.gmail.com`
`SMTP_PORT`	No	Default: `587`
`SMTP_USER`	Yes	SMTP login username
`SMTP_PASSWORD`	Yes	SMTP login password (use an app password for Gmail)
`EMAIL_FROM`	No	Default: the value of `SMTP_USER`
`EMAIL_RECIPIENTS`	Yes	Comma-separated list of digest recipients
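
The defaults and conditional requirements in the table could be enforced at startup along these lines. This is a sketch, not the contents of config.py: `load_settings` and the returned dictionary keys are hypothetical, and treating an unset `AI_PROVIDER` as `none` is an assumption, since the table does not state its default.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from env, applying the documented defaults."""

    def required(name: str) -> str:
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"Missing required environment variable: {name}")
        return value

    smtp_user = required("SMTP_USER")
    recipients = [r.strip() for r in required("EMAIL_RECIPIENTS").split(",")]
    settings = {
        "smtp_host": env.get("SMTP_HOST", "smtp.gmail.com"),
        "smtp_port": int(env.get("SMTP_PORT", "587")),
        "smtp_user": smtp_user,
        "smtp_password": required("SMTP_PASSWORD"),
        "email_from": env.get("EMAIL_FROM") or smtp_user,
        "email_recipients": [r for r in recipients if r],
    }

    enabled = env.get("FOOD_AI_ENRICHMENT_ENABLED", "false").strip().lower() == "true"
    # Assumption: an unset AI_PROVIDER behaves like "none".
    provider = env.get("AI_PROVIDER", "none").strip().lower()
    key_names = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}
    if enabled:
        if provider not in key_names:
            raise RuntimeError("AI_PROVIDER must be 'anthropic' or 'openai' "
                               "when enrichment is enabled")
        settings["ai_api_key"] = required(key_names[provider])
    settings["ai_enrichment"] = enabled
    settings["ai_provider"] = provider
    return settings


example_env = {
    "SMTP_USER": "bot@example.org",
    "SMTP_PASSWORD": "app-password",
    "EMAIL_RECIPIENTS": "a@example.org, b@example.org",
}
settings = load_settings(example_env)
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to exercise without touching the real environment.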