# Election Data Indices Documentation

This directory contains the JSON and JSON Lines index files that describe the processed election data: results, candidates, years, cities, and positions.

## Index Files

### 1. index_results.json
**Purpose**: Complete index of all election results by year, position, and state.

**Structure**:
```json
{
  "2002": {
    "president": {
      "SP": {
        "totalCities": 645,
        "cities": ["SP_SAO_PAULO", "SP_CAMPINAS", ...],
        "path": "results/2002/president/SP/"
      }
    }
  }
}
```
**Use Cases**: 
- Find all cities with results for a specific year/position/state
- Get file paths for accessing specific result data
- Count total cities per state/position

### 2. index_candidates_{position}.json
**Purpose**: Index of all candidates by position and year.

**Files**: One file per position (e.g., `index_candidates_president.json`)

**Structure**:
```json
{
  "position": "president",
  "totalYears": 1,
  "totalCandidates": 6,
  "years": {
    "2002": [
      {
        "filename": "PT_LULA.json",
        "path": "candidates/2002/president/",
        "state": "BR"
      }
    ]
  }
}
```
**Use Cases**:
- Find all candidates for a specific position
- Get file paths for candidate data
- Track candidates across years

### 3. index_years.jsonl
**Purpose**: Summary of all years covered by the dataset.

**Format**: JSON Lines (one JSON object per line)

**Structure**:
```json
{"year": 2002, "electionType": "general", "positions": ["president", "governor"], "states": ["SP", "RJ"], "totalCities": 5565}
```
**Use Cases**:
- Get overview of available years
- Understand election types per year
- Count total coverage

### 4. index_cities.jsonl
**Purpose**: Complete list of all cities covered across all years and positions.

**Format**: JSON Lines (one JSON object per line)

**Structure**:
```json
{"year": 2002, "state": "SP", "city": "SAO_PAULO", "position": "president", "electionType": "general"}
```
**Use Cases**:
- Find all cities in a specific state/year
- Track city coverage across positions
- Geographic analysis

### 5. index_positions.jsonl
**Purpose**: Complete list of all positions covered across all years.

**Format**: JSON Lines (one JSON object per line)

**Structure**:
```json
{"year": 2002, "position": "president", "dataType": "results", "electionType": "general", "description": "President of Brazil"}
```
**Use Cases**:
- Understand available positions per year
- Track position coverage across years
- Election type analysis

## Usage Examples

### Find all presidential results for the state of São Paulo (SP) in 2002:
```python
import json
with open('indices/index_results.json') as f:
    results = json.load(f)
sp_president_2002 = results['2002']['president']['SP']
print(f"Cities: {sp_president_2002['totalCities']}")
```

### Get all candidates for a position:
```python
import json
with open('indices/index_candidates_president.json') as f:
    candidates = json.load(f)
for year, year_candidates in candidates['years'].items():
    print(f"Year {year}: {len(year_candidates)} candidates")
```

### Process all years:
```python
import json
with open('indices/index_years.jsonl') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        year_data = json.loads(line)
        print(f"Year {year_data['year']}: {year_data['electionType']}")
```

## File Organization

All processed data follows this structure:
```
data/processed/
├── candidates/{year}/{position}/{state}/
├── results/{year}/{position}/{state}/
└── aggregates/{year}/{position}/
```

The indices let you locate files in this tree directly instead of walking the directories.
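The `path` values in `index_results.json` (for example `results/2002/president/SP/`) match this layout, so a lookup can join them onto the processed-data root. This is a minimal sketch, assuming index paths are relative to `data/processed/`. `DATA_ROOT`, `resolve_index_path`, and `result_dir` are hypothetical names.

```python
from pathlib import Path
from typing import Union

# Assumption: index "path" values are relative to this root.
DATA_ROOT = Path("data/processed")


def resolve_index_path(relative: str, root: Path = DATA_ROOT) -> Path:
    """Join a relative index 'path' value onto the processed-data root."""
    return root / relative


def result_dir(year: Union[int, str], position: str, state: str, root: Path = DATA_ROOT) -> Path:
    """Build results/{year}/{position}/{state}/ under the root, mirroring the tree above."""
    return root / "results" / str(year) / position / state
```

`pathlib` drops the trailing slash, so both helpers produce the same `Path` for the same directory.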