
legal-data-hunter

Validation Failed

Search 18M+ legal documents worldwide — case law, legislation, and doctrine across 110+ countries.

Registry
Stars: 91
Forks: 11
Updated: Apr 5, 2026
Validated: Apr 7, 2026

Validation Error: Timeout after 45s

Quick Install

npx -y legal-data-hunter

World Wide Law

Open-source collection scripts for open legal data from 110+ countries.

Every country publishes its laws, court decisions, and regulations online -- but in different formats, behind different APIs, with different access rules. World Wide Law is building the open infrastructure to collect, normalize, and make all of it searchable.

All sources in this repository are open data -- publicly available legal information from official government portals, APIs, and bulk download endpoints. We always prefer API and bulk access over web extraction.

Live Dashboard & API

What's Here

This repository contains 960+ collection scripts across 110+ countries that download and normalize open legal data from government portals worldwide. Each script follows a standard interface so that any developer can run, test, or improve it. Some sources are marked as blocked (CAPTCHA, IP restrictions, etc.) -- their scripts are included so developers can review and potentially contribute fixes.

sources/
  FR/LegifranceCodes/     # French consolidated legal codes (API)
  DE/GesetzeImInternet/   # German federal laws (bulk XML)
  IT/NormattivaLegislation/ # Italian legislation (API)
  ES/BOE/                 # Spanish official gazette (API)
  ... (110+ countries)

Quick Start

# Clone the repo
git clone https://github.com/worldwidelaw/legal-sources.git
cd legal-sources

# Install dependencies
pip install -r requirements.txt

# Check project status
python runner.py status

# Test a specific source
python runner.py sample FR/LegifranceCodes

# See what needs work
python runner.py next

How It Works

Per-Source Structure

Every source lives in sources/{COUNTRY_CODE}/{SourceName}/ and contains:

| File | Purpose |
| --- | --- |
| bootstrap.py | Collection script -- implements fetch_all(), fetch_updates(), normalize() |
| config.yaml | Source metadata, access method, rate limits, schema |
| sample/ | 10+ sample documents for validation |
| README.md | Documentation about the data source |
| .env.template | Required API keys or credentials (if any) |
| retrieve.py | Reference resolver (e.g., "article 1240 code civil" -> document) |
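
As a rough illustration of the three-method interface, here is a minimal sketch of what a bootstrap.py script might look like. The class name ExampleSource, the XX/ExampleSource identifier, the raw record fields (id, heading, body, modified, link), and the since parameter are all hypothetical. Real scripts inherit from the base class in common/base_scraper.py, whose actual signatures are not reproduced here.

```python
from typing import Any, Dict, Iterator, List


class ExampleSource:
    """Hypothetical stand-in for a bootstrap.py script (not the project's base class)."""

    source_id = "XX/ExampleSource"  # hypothetical source identifier

    def __init__(self, raw_records: List[Dict[str, Any]]) -> None:
        # A real script would call an API or read a bulk download instead
        # of receiving records in memory.
        self._raw_records = raw_records

    def fetch_all(self) -> Iterator[Dict[str, Any]]:
        """Yield every raw record available from the source."""
        yield from self._raw_records

    def fetch_updates(self, since: str) -> Iterator[Dict[str, Any]]:
        """Yield raw records modified on or after an ISO date string."""
        for record in self._raw_records:
            if record["modified"] >= since:
                yield record

    def normalize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        """Map a raw record onto the common output fields."""
        return {
            "_id": f"{self.source_id}/{raw['id']}",
            "_source": self.source_id,
            "_type": "legislation",
            "title": raw["heading"].strip(),
            "text": raw["body"].strip(),
            "date": raw["modified"],
            "url": raw["link"],
        }


# Made-up records, used only to exercise the sketch.
raw = [
    {"id": "a1", "heading": " Act One ", "body": "Text of act one.",
     "modified": "2024-01-10", "link": "https://example.org/a1"},
    {"id": "a2", "heading": "Act Two", "body": "Text of act two.",
     "modified": "2024-03-05", "link": "https://example.org/a2"},
]
source = ExampleSource(raw)
docs = [source.normalize(r) for r in source.fetch_all()]
updated = [source.normalize(r) for r in source.fetch_updates("2024-03-01")]
```

Keeping fetching and normalizing in separate methods means a full crawl and an incremental update can share the same mapping code.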

Two Data Models

Legislation (mutable): Laws get amended. Same ID, new content. Strategy: upsert with version tracking.

Case law (immutable): Court decisions don't change after publication. Strategy: append-only with dedup.
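
The difference between the two strategies can be sketched in a few lines. This is an in-memory illustration only, not the project's storage layer: the class names LegislationStore and CaseLawStore are hypothetical, and detecting amendments by hashing the text and deduplicating case law by _id are assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Set


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LegislationStore:
    """Upsert keyed by _id; every distinct version of the text is kept."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[dict]] = {}

    def upsert(self, doc: dict) -> bool:
        """Store doc as a new version if its text changed. Return True if stored."""
        history = self.versions.setdefault(doc["_id"], [])
        digest = content_hash(doc["text"])
        if history and history[-1]["hash"] == digest:
            return False  # same content as the latest version, nothing to do
        history.append({"version": len(history) + 1, "hash": digest, "doc": doc})
        return True

    def current(self, doc_id: str) -> dict:
        """Return the latest stored version of a document."""
        return self.versions[doc_id][-1]["doc"]


class CaseLawStore:
    """Append-only; a document whose _id was already seen is ignored."""

    def __init__(self) -> None:
        self.docs: List[dict] = []
        self._seen: Set[str] = set()

    def append(self, doc: dict) -> bool:
        """Append doc unless its _id is already stored. Return True if appended."""
        if doc["_id"] in self._seen:
            return False
        self._seen.add(doc["_id"])
        self.docs.append(doc)
        return True


laws = LegislationStore()
laws.upsert({"_id": "law-1", "text": "Original wording."})
laws.upsert({"_id": "law-1", "text": "Original wording."})  # unchanged, skipped
laws.upsert({"_id": "law-1", "text": "Amended wording."})   # stored as version 2

cases = CaseLawStore()
cases.append({"_id": "case-1", "text": "Judgment text."})
cases.append({"_id": "case-1", "text": "Judgment text."})   # duplicate, skipped
```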

Standard Output Schema

Every script normalizes documents to a common schema:

  • _id -- Unique identifier
  • _source -- Source identifier (e.g., FR/LegifranceCodes)
  • _type -- legislation or case_law
  • title -- Document title
  • text -- Full text content
  • date -- Publication or decision date
  • url -- Link to the original source
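
A check against this schema could look roughly like the sketch below. The function name validate_document is hypothetical, and the rule that every field must be a non-empty string is an assumption for illustration; the project's own checks live in common/validators.py and may differ. No date format is enforced, since the schema above does not specify one.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("_id", "_source", "_type", "title", "text", "date", "url")
ALLOWED_TYPES = {"legislation", "case_law"}


def validate_document(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in doc; an empty list means it passed."""
    errors: List[str] = []
    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        # Assumption: every field must be present as a non-empty string.
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    if "_type" in doc and doc["_type"] not in ALLOWED_TYPES:
        errors.append(f"invalid _type: {doc['_type']!r}")
    return errors


good = {
    "_id": "FR/LegifranceCodes/example-1",  # made-up identifier
    "_source": "FR/LegifranceCodes",
    "_type": "legislation",
    "title": "Example title",
    "text": "Example text.",
    "date": "2024-01-01",
    "url": "https://example.org/doc/1",
}
bad = {k: v for k, v in good.items() if k != "text"}
bad["_type"] = "statute"

good_errors = validate_document(good)
bad_errors = validate_document(bad)
```

Returning a list of messages instead of raising on the first problem lets a sample run report every issue in a document at once.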

Architecture

legal-sources/
  manifest.yaml          # Master inventory: all sources + status
  runner.py              # CLI: run, test, and manage collection scripts
  common/                # Shared libraries
    base_scraper.py        Base class all scripts inherit from
    http_client.py         HTTP client with retries + caching
    rate_limiter.py        Token bucket rate limiter
    storage.py             JSONL storage with deduplication
    validators.py          Schema validation
  templates/             # Templates for new sources
    scraper_template.py    Boilerplate for bootstrap.py
    config_template.yaml   Boilerplate for config.yaml
    retrieve_template.py   Boilerplate for retrieve.py
  sources/               # One directory per data source
    {CC}/{Source}/          (see per-source structure above)
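
For readers unfamiliar with the token bucket technique named for rate_limiter.py, here is a minimal generic sketch. It is not the code in common/rate_limiter.py: the TokenBucket class, its parameters, and the injectable clock are assumptions chosen to keep the example deterministic.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped at capacity."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take `tokens` from the bucket if available; return whether it succeeded."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


# A fake clock makes the behaviour reproducible without sleeping.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # two tokens, so the third call fails
fake_now[0] = 0.5                                 # half a second refills one token
after_wait = bucket.try_acquire()
```

The bucket allows short bursts up to its capacity while holding the long-run request rate at the configured limit, which suits portals that publish per-second or per-minute quotas.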

Coverage

| Region | Countries | Sources |
| --- | --- | --- |
| EU Member States | AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK | 130+ |
| EFTA / EEA | CH, NO, IS, LI | 10+ |
| Council of Europe | UK, TR, UA, GE, AM, AZ, MD | 20+ |
| Western Balkans | RS, BA, ME, AL, MK, XK | 15+ |
| Latin America | AR, BR, CL, CO, MX, PE | 25+ |
| Asia-Pacific | AU, JP, KR, NZ, SG, TW, IN | 30+ |
| Middle East & Africa | EG, MA, ZA, NG, KE, TN | 20+ |
| Other | US, CA, and more | 15+ |

Track live progress on the dashboard.

Contributing

We welcome contributions from developers, legal researchers, and especially governments who want their open legal data included.

Who can contribute?

| You are... | How you can help |
| --- | --- |
| Developer | Build or fix collection scripts, add retrieve scripts, improve tooling |
| Government official / jurisdiction lead | Tell us about your country's legal data portals -- no coding needed |
| Lawyer / legal researcher | Validate data quality, improve legal reference resolution, flag coverage gaps |
| Anyone | Report data quality issues, broken sources, or coverage gaps |

Submit a data source (no coding required):

Fix or improve a collection script:

Report a problem:

Good first issues: Browse label:good-first-issue for approachable starting points.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). You are free to use, modify, and distribute this software, provided that any modified versions made available over a network also make their source code available under the same license.

Commercial Licensing: If you wish to use this software without the AGPL-3.0 obligations (e.g., in a proprietary product or SaaS), commercial licenses are available. Contact zacharie@goodlegal.fr for details.
