SPEC-03 — Adapters & Validation: Tooling

16/04/2026

1. Introduction

Telemachus data originates from heterogeneous sources — commercial telematics devices, research platforms, smartphones, and simulators. Adapters transform raw provider data into Telemachus-conformant parquet files. Validators verify that the output conforms to SPEC-01 and SPEC-02.

This specification consolidates RFC-0005 (Adapter Architecture), RFC-0007 (Validation Framework), and RFC-0009 (RS3 Integration). It also incorporates the industrial mapping from RFC-0002 as Appendix A.

1.1 Scope Separation

graph LR
    subgraph PUBLIC["telemachus-py (Open Source)"]
        A_OPEN["Adapters: Open datasets\nAEGIS, PVS, STRIDE, UAH"]
        VAL["Validation engine\nschema + semantic checks"]
        IO["I/O layer\nread/write parquet, JSON"]
    end

    subgraph PRIVATE["Private Pipeline"]
        A_PROP["Adapters: Proprietary\ncommercial devices"]
        PIPE["Processing pipeline\n(implementation-specific)"]
        METHODS["Processing methods\n(implementation-specific)"]
    end

    subgraph PRODUCT["Production Pipeline"]
        ORCH["Orchestration\nAPI, scheduling"]
        CONN["Client connectors"]
    end

    PUBLIC --> PRIVATE --> PRODUCT

    style PUBLIC fill:#e8f5e9,stroke:#2e7d32
    style PRIVATE fill:#e3f2fd,stroke:#1565c0
    style PRODUCT fill:#f3e5f5,stroke:#6a1b9a

2. Adapter Architecture

2.1 What an Adapter Does

An adapter converts raw data from a specific provider into a Telemachus-conformant pandas DataFrame and an accompanying manifest. The output has:

  • Correct column names (SPEC-01 §2)
  • Correct units (SPEC-01 §5)
  • A valid manifest.yaml (SPEC-02)

graph LR
    RAW["Raw Data\n(CSV, JSON, MQTT,\nParquet, DuckDB)"] --> ADAPTER["Adapter\n(parse + convert\n+ validate)"]
    ADAPTER --> DF["pandas DataFrame\nTelemachus columns\nSI units"]
    ADAPTER --> MAN["manifest.yaml\nSPEC-02 compliant"]
    DF --> PQ["Telemachus Parquet\n(zstd compressed)"]

    style RAW fill:#fff3e0,stroke:#e65100
    style ADAPTER fill:#e3f2fd,stroke:#1565c0
    style DF fill:#e8f5e9,stroke:#2e7d32
    style MAN fill:#fff9c4,stroke:#f9a825
    style PQ fill:#bbdefb,stroke:#1565c0

2.2 Adapter Interface

Every adapter exposes one required Python function, load; there is no class hierarchy. The interface is intentionally simple:

from pathlib import Path

import pandas as pd


def load(source_path: Path, **kwargs) -> pd.DataFrame:
    """
    Load raw data and return a Telemachus-conformant DataFrame.

    The returned DataFrame has columns from SPEC-01 §2
    with correct SI units. Extra provider-specific columns
    use the x_<source>_<field> convention.
    """
    ...

Adapters MAY also provide:

def manifest(source_path: Path) -> dict:
    """Return a SPEC-02 manifest dict for this dataset."""
    ...

def convert(source_path: Path, output_dir: Path) -> Path:
    """Convert raw data to Telemachus parquet + manifest.yaml."""
    ...
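
As an illustration of the interface above, here is a minimal sketch of a load function for a hypothetical provider called "acme". The raw column names (timestamp_ms, latitude, longitude, speed_kmh, battery_v) and the choice of a UTC datetime for ts are assumptions made for this example only; SPEC-01 defines the actual column types.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical raw-to-Telemachus column mapping for an imaginary "acme" provider.
RENAME = {"timestamp_ms": "ts", "latitude": "lat", "longitude": "lon"}


def load(source_path: Path, **kwargs) -> pd.DataFrame:
    """Load an 'acme' CSV and return Telemachus-style columns in SI units."""
    raw = pd.read_csv(source_path, **kwargs)
    df = raw.rename(columns=RENAME)
    # Assumption: ts is represented as a UTC datetime (see SPEC-01 for the real type).
    df["ts"] = pd.to_datetime(df["ts"], unit="ms", utc=True)
    # Unit conversion: km/h -> m/s.
    df["speed_mps"] = df.pop("speed_kmh") / 3.6
    # Provider-specific extras follow the x_<source>_<field> convention.
    return df.rename(columns={"battery_v": "x_acme_battery_v"})


# Usage: write a tiny sample file, then load it through the adapter.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "acme.csv"
    csv_path.write_text(
        "timestamp_ms,latitude,longitude,speed_kmh,battery_v\n"
        "0,48.20,16.37,36.0,12.4\n"
        "1000,48.21,16.38,72.0,12.3\n"
    )
    df = load(csv_path)

print(df.columns.tolist())
```

Keeping the adapter a plain function means it can be tested with a small fixture file like the one above, without any registry or base class.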

2.3 Module Layout

telemachus/
└── adapters/
    ├── __init__.py          # registry of available adapters
    ├── aegis.py             # AEGIS (Zenodo, Austria)
    ├── pvs.py               # PVS (Kaggle, Brazil)
    ├── stride.py            # STRIDE (Figshare, Bangladesh)
    └── uah.py               # UAH DriveSet (Spain)

Proprietary adapters live in a private pipeline module, not telemachus-py:

private-pipeline/
└── adapters/
    ├── commercial.py        # Commercial device adapters
    ├── prototype.py         # Experimental prototypes
    └── ...

2.4 Adapter Pipeline

graph TD
    subgraph FETCH["1. Fetch / Read"]
        CSV["CSV files\n(AEGIS, PVS)"]
        SENSOR["Sensor Logger\n(STRIDE)"]
        MQTT["MQTT/REST\n(IoT gateway)"]
        SIM["Simulator\n(RS3)"]
    end

    subgraph PARSE["2. Parse & Rename"]
        COL["Map raw columns\n→ Telemachus names"]
    end

    subgraph CONVERT["3. Unit Conversion"]
        UNITS["G → m/s²\ndeg/s → rad/s\nkm/h → m/s\nNMEA → decimal°"]
    end

    subgraph MERGE["4. Multi-Rate Merge"]
        ASOF["merge_asof()\nIMU @ high rate\nGPS @ low rate"]
    end

    subgraph TAG["5. Metadata Tagging"]
        FRAME["acc_frame from\nmanifest.acc_periods"]
        TRIP["trip_id from\nsegmentation"]
    end

    subgraph VALIDATE["6. Validate"]
        CHECK["tele.validate(df, 'basic')"]
    end

    FETCH --> PARSE --> CONVERT --> MERGE --> TAG --> VALIDATE

    style FETCH fill:#fff3e0,stroke:#e65100
    style PARSE fill:#fff9c4,stroke:#f9a825
    style CONVERT fill:#e8f5e9,stroke:#2e7d32
    style MERGE fill:#e3f2fd,stroke:#1565c0
    style TAG fill:#f3e5f5,stroke:#6a1b9a
    style VALIDATE fill:#c8e6c9,stroke:#2e7d32

3. Adapter Specifications

3.1 AEGIS Adapter

| Property | Value |
|---|---|
| Source | Zenodo 820576, 6 CSV files |
| Raw units | Accel: G-force, Gyro: deg/s, GPS: NMEA DDMM.MMMM |
| Conversions | × 9.80665, × π/180, NMEA→decimal |
| Multi-rate merge | Accel+Gyro (24 Hz) ← GPS (5 Hz) via merge_asof |
| Output columns | ts, lat, lon, speed_mps, altitude_gps_m, ax/ay/az_mps2, gx/gy/gz_rad_s, speed_obd_mps (opt), device_id, trip_id |
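
The three AEGIS conversions can be sketched as follows. The raw column names (acc_x_g, gyro_z_dps, lat_nmea) are hypothetical and do not come from the AEGIS files. The NMEA helper assumes unsigned DDMM.MMMM values, with the hemisphere sign applied separately.

```python
import numpy as np
import pandas as pd

G0 = 9.80665  # standard gravity: m/s² per g


def nmea_to_decimal(ddmm: pd.Series) -> pd.Series:
    """Convert unsigned NMEA DDMM.MMMM (or DDDMM.MMMM) values to decimal degrees."""
    degrees = np.floor(ddmm / 100.0)   # leading digits are whole degrees
    minutes = ddmm - degrees * 100.0   # remainder is minutes with decimals
    return degrees + minutes / 60.0


# Hypothetical raw sample: accel in g, gyro in deg/s, latitude in NMEA format.
raw = pd.DataFrame({
    "acc_x_g": [0.0, 1.0],
    "gyro_z_dps": [0.0, 180.0],
    "lat_nmea": [4812.345, 4830.0],
})

out = pd.DataFrame({
    "ax_mps2": raw["acc_x_g"] * G0,              # g -> m/s²
    "gz_rad_s": np.deg2rad(raw["gyro_z_dps"]),   # deg/s -> rad/s (× π/180)
    "lat": nmea_to_decimal(raw["lat_nmea"]),     # DDMM.MMMM -> decimal degrees
})
print(out)
```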

3.2 PVS Adapter

| Property | Value |
|---|---|
| Source | Kaggle, combined GPS+MPU CSV per trip |
| Raw units | Accel: m/s² (native), Gyro: deg/s, Magneto: µT, GPS: decimal degrees |
| Conversions | Gyro: × π/180 |
| Parameters | placement: dashboard / above_suspension / below_suspension; side: left / right |
| Output columns | ts, lat, lon, speed_mps, altitude_gps_m, hdop, n_satellites, ax/ay/az_mps2, gx/gy/gz_rad_s, mx/my/mz_uT, device_id, trip_id |

3.3 STRIDE Adapter

| Property | Value |
|---|---|
| Source | Figshare, 11 CSV files per session |
| Raw units | Accel: m/s² (TotalAcceleration), Gyro: rad/s (native), Magneto: µT, GPS: decimal degrees |
| Conversions | None (all native SI) |
| Multi-rate merge | Accel (100 Hz) ← GPS (1 Hz) ← Gyro (100 Hz) via merge_asof |
| Parameters | category: driving / anomalies / all; with_gyro: bool |
| Output columns | ts, lat, lon, speed_mps, altitude_gps_m, heading_deg, h_accuracy_m, ax/ay/az_mps2, gx/gy/gz_rad_s, mx/my/mz_uT, device_id, trip_id |
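
A multi-rate merge of this kind can be sketched with pandas.merge_asof on synthetic data. The sample values are invented. The use of direction="nearest" with a tolerance shorter than half an IMU period is an assumption of this sketch: it is one way to leave GNSS columns as NaN between GPS ticks, in line with the multi-rate convention in §4.1, and the actual adapter may proceed differently.

```python
import pandas as pd

# Synthetic streams over two seconds: IMU at 100 Hz, GPS at 1 Hz.
imu = pd.DataFrame({
    "ts": pd.date_range("2026-01-01", periods=200,
                        freq=pd.Timedelta(milliseconds=10), tz="UTC"),
    "az_mps2": 9.81,
})
gps = pd.DataFrame({
    "ts": pd.date_range("2026-01-01", periods=2,
                        freq=pd.Timedelta(seconds=1), tz="UTC"),
    "lat": [23.81, 23.82],
    "lon": [90.41, 90.42],
})

# Keep every IMU row; attach a GPS fix only to the IMU sample closest to it.
# Both frames must be sorted by "ts", which date_range guarantees here.
merged = pd.merge_asof(
    imu,
    gps,
    on="ts",
    direction="nearest",
    tolerance=pd.Timedelta(milliseconds=4),  # shorter than half the 10 ms IMU period
)

print(len(merged), merged["lat"].notna().sum())
```

With direction="backward" and no tolerance, each GPS fix would instead be carried forward onto every following IMU row. That is simpler, but it hides the true GPS rate.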

3.4 RS3 Adapter (Synthetic)

| Property | Value |
|---|---|
| Source | RoadSimulator3 CSV export |
| Raw units | All already in SI |
| Conversions | None |
| Ground truth | road_type, event, target_speed exported as x_rs3_* extra columns |
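
The ground-truth export amounts to a prefix rename. A minimal sketch, assuming the simulator CSV uses exactly the three field names listed above (the helper name tag_ground_truth is hypothetical):

```python
import pandas as pd

GROUND_TRUTH = ["road_type", "event", "target_speed"]


def tag_ground_truth(df: pd.DataFrame) -> pd.DataFrame:
    """Rename simulator ground-truth fields to x_rs3_<field> extra columns."""
    mapping = {c: f"x_rs3_{c}" for c in GROUND_TRUTH if c in df.columns}
    return df.rename(columns=mapping)


sim = pd.DataFrame({
    "speed_mps": [8.3, 8.9],
    "road_type": ["urban", "urban"],
    "event": ["none", "brake"],
    "target_speed": [8.3, 5.0],
})
tagged = tag_ground_truth(sim)
print(tagged.columns.tolist())
```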

4. Validation Framework

4.1 Validation Levels

| Level | Checks | Use Case |
|---|---|---|
| basic | Mandatory columns for declared profile present, correct types, value ranges (lat/lon bounds, speed >= 0) | Quick conformance |
| strict | All of basic + monotonic ts, AccPeriod gravity check (profiles imu/full). NaN is allowed in GNSS columns between ticks (multi-rate convention) but at least one non-NaN GPS fix MUST exist | Research-grade |
| manifest | SPEC-02 §5 rules (required fields, acc_periods consistency, sensor config) | Manifest-only check |
| full | strict + manifest + cross-validation (manifest vs parquet agreement) | Publication-ready |
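
To make the basic level concrete, here is a rough sketch of such a check. It is not the telemachus-py implementation. The Report class, the basic_checks name, and the MANDATORY list are assumptions for illustration, since the real mandatory set depends on the declared profile in SPEC-01.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class Report:
    """Hypothetical result object with errors, warnings, and an ok flag."""
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


# Assumed mandatory set for this sketch; SPEC-01 defines the real one per profile.
MANDATORY = ["ts", "lat", "lon", "speed_mps"]


def basic_checks(df: pd.DataFrame) -> Report:
    report = Report()
    missing = [c for c in MANDATORY if c not in df.columns]
    if missing:
        report.errors.append(f"missing mandatory columns: {missing}")
        return report
    for col in ("lat", "lon", "speed_mps"):
        if not pd.api.types.is_numeric_dtype(df[col]):
            report.errors.append(f"{col} is not numeric")
            return report
    # NaN values are dropped before range checks (GNSS gaps between ticks).
    if not df["lat"].dropna().between(-90, 90).all():
        report.errors.append("lat outside [-90, 90]")
    if not df["lon"].dropna().between(-180, 180).all():
        report.errors.append("lon outside [-180, 180]")
    if (df["speed_mps"].dropna() < 0).any():
        report.errors.append("negative speed_mps")
    return report


good = pd.DataFrame({"ts": [0, 1], "lat": [48.2, None], "lon": [16.4, None], "speed_mps": [3.0, None]})
bad = good.assign(lat=[95.0, None])
print(basic_checks(good).ok, basic_checks(bad).errors)
```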

4.2 Validation API

import telemachus as tele

# Validate a DataFrame
report = tele.validate(df, level="basic")
print(report.ok)        # True / False
print(report.errors)    # list of error messages
print(report.warnings)  # list of warnings

# Validate a manifest file
report = tele.validate_manifest("path/to/manifest.yaml")

# Validate a complete dataset (parquet + manifest)
report = tele.validate_dataset("path/to/dataset/", level="full")

4.3 CLI

# Validate a dataset directory
tele validate path/to/dataset/ --level full

# Validate manifest only
tele validate path/to/manifest.yaml --manifest-only

# Quick check on a parquet file
tele validate path/to/data.parquet --level basic

# Output as JSON (for CI pipelines)
tele validate path/to/dataset/ --json

4.4 Validation Rules Summary

graph TD
    subgraph SCHEMA["Schema Checks"]
        S1["Mandatory columns\npresent?"]
        S2["Column types\ncorrect?"]
        S3["Value ranges\nlat ∈ [-90,90]\nlon ∈ [-180,180]\nspeed ≥ 0"]
    end

    subgraph SEMANTIC["Semantic Checks"]
        M1["ts monotonically\nincreasing?"]
        M2["No NaN in\nmandatory fields?"]
        M3["Gyro/magneto\nall-or-nothing?"]
        M4["No excluded columns\nin file?"]
    end

    subgraph PHYSICS["Physics Checks"]
        P1["AccPeriod frame\nmatches |a| at rest?"]
        P2["Speed plausible\n< 100 m/s?"]
    end

    subgraph MANIFEST_V["Manifest Checks"]
        V1["Required fields\npresent?"]
        V2["acc_periods\nconsistent?"]
        V3["sensors.rate_hz\npositive?"]
        V4["Device inheritance\nresolvable?"]
    end

    S1 --> S2 --> S3 --> M1 --> M2 --> M3 --> M4 --> P1 --> P2
    V1 --> V2 --> V3 --> V4

    style SCHEMA fill:#e8f5e9,stroke:#2e7d32
    style SEMANTIC fill:#e3f2fd,stroke:#1565c0
    style PHYSICS fill:#fff3e0,stroke:#e65100
    style MANIFEST_V fill:#fff9c4,stroke:#f9a825
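
Two of the checks in the diagram, the timestamp ordering (M1) and the gravity check (P1), can be sketched as below. The rest-speed threshold, the tolerance, and the "compensated" frame label are assumptions of this sketch; only the "raw" label appears in this specification.

```python
import numpy as np
import pandas as pd

G0 = 9.80665  # standard gravity, m/s²


def ts_is_monotonic(df: pd.DataFrame) -> bool:
    """True if ts never decreases (duplicate timestamps still pass)."""
    return bool(df["ts"].is_monotonic_increasing)


def gravity_check(df: pd.DataFrame, frame: str, rest_speed: float = 0.5, tol: float = 1.0):
    """Compare the median |a| at rest with the value expected for the frame.

    Assumption: a "raw" frame still contains gravity (|a| close to g at rest),
    while any other frame label is treated as gravity-compensated (|a| close to 0).
    Returns None when no at-rest sample is available.
    """
    at_rest = df[df["speed_mps"] < rest_speed]  # NaN speeds compare False and are skipped
    if at_rest.empty:
        return None
    norm = np.sqrt(at_rest["ax_mps2"] ** 2 + at_rest["ay_mps2"] ** 2 + at_rest["az_mps2"] ** 2)
    expected = G0 if frame == "raw" else 0.0
    return bool(abs(norm.median() - expected) < tol)


sample = pd.DataFrame({
    "ts": pd.date_range("2026-01-01", periods=3, freq=pd.Timedelta(seconds=1), tz="UTC"),
    "speed_mps": [0.0, 0.1, 12.0],
    "ax_mps2": [0.1, 0.0, 2.0],
    "ay_mps2": [0.0, 0.1, 0.5],
    "az_mps2": [9.80, 9.79, 9.60],
})
print(ts_is_monotonic(sample), gravity_check(sample, "raw"))
```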

4.5 Exit Codes

| Code | Meaning |
|---|---|
| 0 | Validation successful |
| 1 | Validation failed (errors detected) |
| 2 | Manifest missing or corrupted |
| 3 | Schema invalid or unavailable |
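
In a CI script, these codes can be mapped to readable names before deciding how to react. The enum member names and the validate_dataset helper below are illustrative choices, not part of the CLI; the helper is defined but not called, since it requires the tele command to be installed.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes of `tele validate` as listed in the table above."""
    OK = 0
    VALIDATION_FAILED = 1
    MANIFEST_ERROR = 2
    SCHEMA_ERROR = 3


def describe(returncode: int) -> str:
    """Return the symbolic name for a return code, or UNKNOWN(n)."""
    try:
        return ExitCode(returncode).name
    except ValueError:
        return f"UNKNOWN({returncode})"


def validate_dataset(path: str) -> str:
    """Run the CLI on a dataset directory and describe the outcome."""
    result = subprocess.run(["tele", "validate", path, "--level", "full"])
    return describe(result.returncode)


print(describe(0), describe(2), describe(7))
```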

5. Dataset Generation Workflow

5.1 Converting Open Data

The standard workflow for generating a Telemachus dataset from an Open dataset:

graph TD
    DL["1. Download\nraw data from\nZenodo/Kaggle/Figshare"] --> INSPECT["2. Inspect\ncolumn names, units\nsensor rates"]
    INSPECT --> ADAPT["3. Run adapter\ntele convert aegis\n/path/to/raw --outdir out/"]
    ADAPT --> VALID["4. Validate\ntele validate out/\n--level full"]
    VALID --> REVIEW["5. Review\nmanifest.yaml\nacc_periods, sensors"]
    REVIEW --> PUBLISH["6. Publish\nZenodo DOI\nor git commit"]

    style DL fill:#fff3e0,stroke:#e65100
    style INSPECT fill:#fff9c4,stroke:#f9a825
    style ADAPT fill:#e3f2fd,stroke:#1565c0
    style VALID fill:#e8f5e9,stroke:#2e7d32
    style REVIEW fill:#f3e5f5,stroke:#6a1b9a
    style PUBLISH fill:#bbdefb,stroke:#1565c0

5.2 CLI for Conversion

# Convert an Open dataset to Telemachus parquet + manifest
tele convert aegis /path/to/aegis/csvs --outdir datasets/aegis/
tele convert pvs /path/to/pvs/trips --outdir datasets/pvs/ --placement dashboard
tele convert stride /path/to/stride/road_data --outdir datasets/stride/ --category driving

# Validate the result
tele validate datasets/aegis/ --level full

# Inspect dataset info
tele info datasets/aegis/manifest.yaml
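
When several datasets are converted in one go, the same commands can be assembled from Python. The sketch below uses only the flags shown above (--outdir, --placement, --category, --level); the helper names convert_command and run_all are hypothetical. run_all is defined but not invoked here, because it needs the tele CLI and the raw data on disk.

```python
import subprocess


def convert_command(adapter: str, raw_dir: str, outdir: str, **options: str) -> list:
    """Build the argv list for `tele convert <adapter> <raw_dir> --outdir <outdir>`."""
    cmd = ["tele", "convert", adapter, raw_dir, "--outdir", outdir]
    for name, value in options.items():
        cmd += [f"--{name}", value]
    return cmd


JOBS = [
    ("aegis", "/path/to/aegis/csvs", "datasets/aegis/", {}),
    ("pvs", "/path/to/pvs/trips", "datasets/pvs/", {"placement": "dashboard"}),
    ("stride", "/path/to/stride/road_data", "datasets/stride/", {"category": "driving"}),
]
commands = [convert_command(a, raw, out, **opts) for a, raw, out, opts in JOBS]


def run_all(cmds: list) -> None:
    """Convert each dataset, then validate its output directory at level full."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
        outdir = cmd[cmd.index("--outdir") + 1]
        subprocess.run(["tele", "validate", outdir, "--level", "full"], check=True)


for cmd in commands:
    print(" ".join(cmd))
```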

6. Open Sources Matrix

Cross-reference of available columns per Open dataset, to help users choose the right dataset for their use case.

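| Column Group | AEGIS | PVS | STRIDE | UAH |
|---|---|---|---|---|
| GPS lat, lon | 5 Hz | 1 Hz | 1 Hz | 1 Hz |
| GPS speed_mps | derived | 1 Hz | 1 Hz | 1 Hz |
| GPS heading_deg | | | 1 Hz | |
| GPS altitude_gps_m | 5 Hz | 1 Hz | 1 Hz | |
| GPS hdop | | 1 Hz | | |
| GPS h_accuracy_m | | | 1 Hz | |
| GPS n_satellites | | 1 Hz | | |
| Accel ax/ay/az_mps2 | 24 Hz | 100 Hz | 100 Hz | 10 Hz |
| Gyro gx/gy/gz_rad_s | 24 Hz | 100 Hz | 100 Hz | |
| Magneto mx/my/mz_uT | | 100 Hz | 100 Hz | |
| OBD speed_obd_mps | PID 0x0D | | | |
| Frame | raw | raw | raw | raw |
| Ground truth gyro | yes | yes | yes | |
| Country | Austria | Brazil | Bangladesh | Spain |
| License | CC-BY-4.0 | CC-BY-NC-ND-4.0 | CC-BY-4.0 | Academic |
| Republishable | yes | no (ND) | yes | case-by-case |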
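
The matrix can also be queried programmatically when choosing a dataset. The sketch below transcribes a few rows by hand, treating cells with no entry as "not available". The DATASETS dictionary and the candidates helper are illustrative and not part of telemachus-py.

```python
# Hand-transcribed subset of the matrix: rates in Hz, availability flags, republishing status.
DATASETS = {
    "AEGIS":  {"gps_hz": 5, "accel_hz": 24,  "gyro": True,  "magneto": False, "republishable": "yes"},
    "PVS":    {"gps_hz": 1, "accel_hz": 100, "gyro": True,  "magneto": True,  "republishable": "no (ND)"},
    "STRIDE": {"gps_hz": 1, "accel_hz": 100, "gyro": True,  "magneto": True,  "republishable": "yes"},
    "UAH":    {"gps_hz": 1, "accel_hz": 10,  "gyro": False, "magneto": False, "republishable": "case-by-case"},
}


def candidates(need_gyro=False, need_magneto=False, min_accel_hz=0, republishable_only=False):
    """Return the dataset names that satisfy every requested constraint."""
    names = []
    for name, info in DATASETS.items():
        if need_gyro and not info["gyro"]:
            continue
        if need_magneto and not info["magneto"]:
            continue
        if info["accel_hz"] < min_accel_hz:
            continue
        if republishable_only and info["republishable"] != "yes":
            continue
        names.append(name)
    return names


print(candidates(need_magneto=True, republishable_only=True))
print(candidates(min_accel_hz=100))
```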

Appendix A — Industrial API Mapping

Cross-reference of Telemachus columns with major industrial telematics APIs (based on their public documentation). This table guides future adapter development.

| Telemachus Column | Samsara | Webfleet (TomTom) | Geotab |
|---|---|---|---|
| ts | time | gpstime | dateTime |
| lat | latitude | lat | latitude |
| lon | longitude | lon | longitude |
| speed_mps | speed (km/h) | speed (km/h) | speed (km/h) |
| heading_deg | bearingDeg | heading | bearing |
| ignition | engineState | ignition | ignition |
| odometer_m | odometerMeters | mileage (km) | odometer |
| rpm | engineRpm | | engineRpm |

Note: These fleet management APIs typically provide enriched data (aggregated, post-processed). They rarely expose raw IMU. Adapters for these providers would produce GPS + Vehicle I/O datasets without accelerometer data. Other commercial device adapters are documented in their respective private modules.
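
A future adapter for one of these providers would mostly rename fields and convert speed. The sketch below uses the Samsara-style field names from the table above (time, latitude, longitude, speed, bearingDeg). They are taken from this appendix and have not been checked against the live API; the sample records and the function name from_fleet_records are invented.

```python
import pandas as pd

# Field mapping as listed in the appendix table (Samsara column).
RENAME = {
    "time": "ts",
    "latitude": "lat",
    "longitude": "lon",
    "bearingDeg": "heading_deg",
}


def from_fleet_records(records: list) -> pd.DataFrame:
    """Turn a list of GPS records (dicts) into Telemachus-style columns."""
    df = pd.DataFrame.from_records(records).rename(columns=RENAME)
    # Assumption: timestamps arrive as ISO 8601 strings.
    df["ts"] = pd.to_datetime(df["ts"], utc=True)
    # The table lists speed in km/h; Telemachus uses m/s.
    df["speed_mps"] = df.pop("speed") / 3.6
    return df


records = [
    {"time": "2026-04-16T08:00:00Z", "latitude": 48.85, "longitude": 2.35, "speed": 72.0, "bearingDeg": 90.0},
    {"time": "2026-04-16T08:00:05Z", "latitude": 48.86, "longitude": 2.36, "speed": 36.0, "bearingDeg": 92.0},
]
fleet = from_fleet_records(records)
print(fleet.columns.tolist())
```

As the note above points out, such a dataset would carry GPS and vehicle I/O columns only, with no accelerometer data.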


7. References

  • SPEC-01: Telemachus Record Format — column definitions
  • SPEC-02: Dataset Manifest — metadata schema
  • Superseded: RFC-0002 (Comparative APIs), RFC-0005 (Adapter Architecture), RFC-0007 (Validation Framework), RFC-0009 (RS3 Integration)

End of SPEC-03.
