World Wide Law
Open-source collection scripts for open legal data from 110+ countries.
Every country publishes its laws, court decisions, and regulations online -- but in different formats, behind different APIs, with different access rules. World Wide Law is building the open infrastructure to collect, normalize, and make all of it searchable.
All sources in this repository are open data -- publicly available legal information from official government portals, APIs, and bulk download endpoints. We always prefer API and bulk access over web extraction.
Live Dashboard & API
- Dashboard: legaldatahunter.com -- track coverage, explore sources, submit feedback
- Search API: Available at legaldatahunter.com -- search across 16M+ indexed legal documents
What's Here
This repository contains 960+ collection scripts across 110+ countries that download and normalize open legal data from government portals worldwide. Each script follows a standard interface so that any developer can run, test, or improve it. Some sources are marked as blocked (CAPTCHA, IP restrictions, etc.) -- their scripts are included so developers can review and potentially contribute fixes.
sources/
    FR/LegifranceCodes/        # French consolidated legal codes (API)
    DE/GesetzeImInternet/      # German federal laws (bulk XML)
    IT/NormattivaLegislation/  # Italian legislation (API)
    ES/BOE/                    # Spanish official gazette (API)
    ...                        # (110+ countries)
Quick Start
# Clone the repo
git clone https://github.com/worldwidelaw/legal-sources.git
cd legal-sources
# Install dependencies
pip install -r requirements.txt
# Check project status
python runner.py status
# Test a specific source
python runner.py sample FR/LegifranceCodes
# See what needs work
python runner.py next
How It Works
Per-Source Structure
Every source lives in sources/{COUNTRY_CODE}/{SourceName}/ and contains:
| File | Purpose |
|---|---|
| bootstrap.py | Collection script -- implements fetch_all(), fetch_updates(), normalize() |
| config.yaml | Source metadata, access method, rate limits, schema |
| sample/ | 10+ sample documents for validation |
| README.md | Documentation about the data source |
| .env.template | Required API keys or credentials (if any) |
| retrieve.py | Reference resolver (e.g., "article 1240 code civil" -> document) |
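As a rough illustration of the interface named in the table, a bootstrap.py might be shaped like the sketch below. Only the three method names fetch_all(), fetch_updates(), and normalize() come from this README. The class name ExampleSource, the in-memory RAW_RECORDS list, the raw field names, and the `since` parameter are hypothetical stand-ins for a real portal, and the sketch does not reflect the actual base class in common/base_scraper.py.

```python
from datetime import date
from typing import Iterator

# Hypothetical raw records standing in for responses from a real portal.
RAW_RECORDS = [
    {"ref": "A-1", "heading": "Civil Code, art. 1", "body": "Text of article 1.",
     "published": "2020-01-15", "link": "https://example.org/a-1"},
    {"ref": "A-2", "heading": "Civil Code, art. 2", "body": "Text of article 2.",
     "published": "2023-06-01", "link": "https://example.org/a-2"},
]


class ExampleSource:
    """Illustrative source; not the project's real base class."""

    source_id = "XX/ExampleSource"  # hypothetical source identifier

    def fetch_all(self) -> Iterator[dict]:
        """Yield every raw record (full collection)."""
        yield from RAW_RECORDS

    def fetch_updates(self, since: date) -> Iterator[dict]:
        """Yield only raw records published on or after `since`."""
        for raw in RAW_RECORDS:
            if date.fromisoformat(raw["published"]) >= since:
                yield raw

    def normalize(self, raw: dict) -> dict:
        """Map a raw record onto the common output schema."""
        return {
            "_id": raw["ref"],
            "_source": self.source_id,
            "_type": "legislation",
            "title": raw["heading"],
            "text": raw["body"],
            "date": raw["published"],
            "url": raw["link"],
        }


source = ExampleSource()
all_docs = [source.normalize(r) for r in source.fetch_all()]
recent_docs = [source.normalize(r) for r in source.fetch_updates(date(2023, 1, 1))]
```

Keeping normalize() separate from the fetch methods means the same mapping can serve both a full collection run and an incremental update.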
Two Data Models
Legislation (mutable): Laws get amended. Same ID, new content. Strategy: upsert with version tracking.
Case law (immutable): Court decisions don't change after publication. Strategy: append-only with dedup.
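The two strategies can be sketched as follows. This is a minimal in-memory illustration, not the project's storage layer: the function names, the use of a SHA-256 content hash to detect amendments, and the dict/list stores are all assumptions made for the example.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint the text so unchanged content can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_legislation(store: dict, doc: dict) -> bool:
    """Insert or update a law; return True if the store changed."""
    digest = content_hash(doc["text"])
    current = store.get(doc["_id"])
    if current is not None and current["hash"] == digest:
        return False  # same ID, same content: nothing to do
    version = 1 if current is None else current["version"] + 1
    store[doc["_id"]] = {"doc": doc, "hash": digest, "version": version}
    return True


def append_case_law(log: list, seen_ids: set, doc: dict) -> bool:
    """Append a decision once; return False for a duplicate _id."""
    if doc["_id"] in seen_ids:
        return False
    seen_ids.add(doc["_id"])
    log.append(doc)
    return True


laws = {}
upsert_legislation(laws, {"_id": "L1", "text": "original wording"})
upsert_legislation(laws, {"_id": "L1", "text": "original wording"})  # unchanged, skipped
upsert_legislation(laws, {"_id": "L1", "text": "amended wording"})   # becomes version 2

decisions, seen = [], set()
append_case_law(decisions, seen, {"_id": "C1", "text": "ruling"})
append_case_law(decisions, seen, {"_id": "C1", "text": "ruling"})  # duplicate, skipped
```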
Standard Output Schema
Every script normalizes documents to a common schema:
- _id -- Unique identifier
- _source -- Source identifier (e.g., FR/LegifranceCodes)
- _type -- legislation or case_law
- title -- Document title
- text -- Full text content
- date -- Publication or decision date
- url -- Link to the original source
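A check against this schema could look something like the sketch below. The field names and the two _type values come from the list above. Everything else is an assumption made for illustration: that all fields are non-empty strings, that dates are ISO 8601 strings, and the validate_document name itself, which is not taken from common/validators.py.

```python
from datetime import date

REQUIRED_FIELDS = ("_id", "_source", "_type", "title", "text", "date", "url")
ALLOWED_TYPES = ("legislation", "case_law")


def validate_document(doc: dict) -> list:
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    if doc.get("_type") not in ALLOWED_TYPES:
        errors.append("_type must be 'legislation' or 'case_law'")
    # Assumption: dates are stored as ISO 8601 strings such as 2020-01-15.
    try:
        date.fromisoformat(doc.get("date", ""))
    except (TypeError, ValueError):
        errors.append("date is not a valid ISO 8601 date")
    return errors


good = {
    "_id": "A-1",
    "_source": "FR/LegifranceCodes",
    "_type": "legislation",
    "title": "Example title",
    "text": "Example text.",
    "date": "2020-01-15",
    "url": "https://example.org/a-1",
}
bad = dict(good, _type="statute", date="15/01/2020")

good_errors = validate_document(good)
bad_errors = validate_document(bad)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a sample document at once.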
Architecture
    legal-sources/
      manifest.yaml              # Master inventory: all sources + status
      runner.py                  # CLI: run, test, and manage collection scripts
      common/                    # Shared libraries
        base_scraper.py          # Base class all scripts inherit from
        http_client.py           # HTTP client with retries + caching
        rate_limiter.py          # Token bucket rate limiter
        storage.py               # JSONL storage with deduplication
        validators.py            # Schema validation
      templates/                 # Templates for new sources
        scraper_template.py      # Boilerplate for bootstrap.py
        config_template.yaml     # Boilerplate for config.yaml
        retrieve_template.py     # Boilerplate for retrieve.py
      sources/                   # One directory per data source
        {CC}/{Source}/           # (see per-source structure above)
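For readers unfamiliar with the token bucket technique mentioned for rate_limiter.py, here is a minimal sketch of the general idea. It is not the code in common/rate_limiter.py: the TokenBucket class, its parameters, and the injectable clock are assumptions chosen to keep the example self-contained and testable.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the demo needs no sleeping
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, up to capacity."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False without blocking otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Demo with a fake clock (hypothetical values: 2 requests/second, burst of 2).
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # third call finds the bucket empty
fake_now[0] += 0.5                                # half a second refills one token
after_wait = bucket.try_acquire()
```

A caller that gets False back would typically wait briefly and retry before sending the next request to a government portal.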
Coverage
| Region | Countries | Sources |
|---|---|---|
| EU Member States | AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK | 130+ |
| EFTA / EEA | CH, NO, IS, LI | 10+ |
| Council of Europe | UK, TR, UA, GE, AM, AZ, MD | 20+ |
| Western Balkans | RS, BA, ME, AL, MK, XK | 15+ |
| Latin America | AR, BR, CL, CO, MX, PE | 25+ |
| Asia-Pacific | AU, JP, KR, NZ, SG, TW, IN | 30+ |
| Middle East & Africa | EG, MA, ZA, NG, KE, TN | 20+ |
| Other | US, CA, and more | 15+ |
Track live progress on the dashboard.
Contributing
We welcome contributions from developers, legal researchers, and especially governments who want their open legal data included.
Who can contribute?
| You are... | How you can help |
|---|---|
| Developer | Build or fix collection scripts, add retrieve scripts, improve tooling |
| Government official / jurisdiction lead | Tell us about your country's legal data portals -- no coding needed |
| Lawyer / legal researcher | Validate data quality, improve legal reference resolution, flag coverage gaps |
| Anyone | Report data quality issues, broken sources, or coverage gaps |
Submit a data source (no coding required):
- Open a "New Source" issue and tell us about your country's legal data portal
Fix or improve a collection script:
- See CONTRIBUTING.md for the full guide
Report a problem:
- Data quality issue -- missing or incorrect data
- Bug report -- broken script
Good first issues: Browse label:good-first-issue for approachable starting points.
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). You are free to use, modify, and distribute this software, provided that any modified versions made available over a network also make their source code available under the same license.
Commercial Licensing: If you wish to use this software without the AGPL-3.0 obligations (e.g., in a proprietary product or SaaS), commercial licenses are available. Contact zacharie@goodlegal.fr for details.