Data Preparation

The example ships with everything needed to launch the inversion immediately. This page explains the data files, the travel-time CSV schema, and how to optionally build a 3-D initial model from CSEM.

Travel-time data (`src_rec_data_wus.csv`)

The travel-time table uses the standard SurfATT CSV schema with station and event elevation columns:


tt,staname,stla,stlo,stel,evtname,evla,evlo,evel,period,weight,dist,vel
48.8896,109C-TA,32.889,-117.105,0,BBR-CI,34.262,-116.921,0,5.0,1,153.22,3.134
62.2573,109C-TA,32.889,-117.105,0,BC3-CI,33.655,-115.454,0,5.0,1,175.69,2.822
52.048,109C-TA,32.889,-117.105,0,BEL-CI,34.001,-115.998,0,5.0,1,160.62,3.086
...

| Column | Description |
| --- | --- |
| `tt` | Travel time (s) |
| `staname` | Receiver station code (e.g. `109C-TA`) |
| `stla` / `stlo` / `stel` | Receiver latitude (°), longitude (°), elevation (m) |
| `evtname` | Virtual-source station code |
| `evla` / `evlo` / `evel` | Source latitude (°), longitude (°), elevation (m) |
| `period` | Rayleigh-wave period (s) |
| `weight` | Measurement weight |
| `dist` | Inter-station distance (km) |
| `vel` | Measured phase velocity (km/s) |
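
In the three sample rows above, `vel` matches `dist / tt` to the printed precision (for example, 153.22 km / 48.8896 s ≈ 3.134 km/s). The sketch below checks that relation. It embeds the sample rows so that it runs on its own; whether the relation holds for every row of the full file is an assumption to verify, not a documented guarantee.

```python
import io

import pandas as pd

# The three sample rows shown above, embedded so the sketch is self-contained.
SAMPLE = """\
tt,staname,stla,stlo,stel,evtname,evla,evlo,evel,period,weight,dist,vel
48.8896,109C-TA,32.889,-117.105,0,BBR-CI,34.262,-116.921,0,5.0,1,153.22,3.134
62.2573,109C-TA,32.889,-117.105,0,BC3-CI,33.655,-115.454,0,5.0,1,175.69,2.822
52.048,109C-TA,32.889,-117.105,0,BEL-CI,34.001,-115.998,0,5.0,1,160.62,3.086
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Velocity implied by inter-station distance and travel time (km/s).
implied_vel = df["dist"] / df["tt"]

# Largest absolute disagreement with the tabulated velocity.
max_mismatch = (implied_vel - df["vel"]).abs().max()
print(f"Largest |dist/tt - vel| : {max_mismatch:.2e} km/s")
```

To run the same check on the full table, replace `io.StringIO(SAMPLE)` with the path to `src_rec_data_wus.csv`.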

The measurements come from the USArray Transportable Array (TA) and co-located regional networks (CI, US, etc.); the network code is the suffix of each station name. The shortest period, 5 s, is mainly sensitive to the upper crust, while the longest periods, up to 40 s, sample down to the Moho and uppermost mantle.

Quick summary


import pandas as pd
 
df = pd.read_csv("src_rec_data_wus.csv")
print(f"Total measurements : {len(df):,}")
print(f"Unique stations    : {pd.concat([df['staname'], df['evtname']]).nunique()}")
print(f"Period range       : {df['period'].min():.1f} – {df['period'].max():.1f} s")
print(f"Periods            : {sorted(df['period'].unique().tolist())}")
print(f"Velocity range     : {df['vel'].min():.2f} – {df['vel'].max():.2f} km/s")
print(f"Distance range     : {df['dist'].min():.0f} – {df['dist'].max():.0f} km")

Running this on the bundled `src_rec_data_wus.csv` prints:


Total measurements : 334,554
Unique stations    : 689
Period range       : 5.0 – 40.0 s
Periods            : [5.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
Velocity range     : 1.36 – 4.10 km/s
Distance range     : 42 – 600 km

The dataset therefore contains roughly 335 k phase-velocity measurements between 689 stations at 11 discrete periods from 5 to 40 s, with inter-station distances of 42–600 km.
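
The totals above do not show how the measurements are distributed across periods. A per-period breakdown could be sketched as follows. The helper name `per_period_summary` is hypothetical and not part of SurfATT; it relies only on the `period` and `vel` columns of the schema.

```python
from pathlib import Path

import pandas as pd


def per_period_summary(df):
    """Count measurements and average the phase velocity for each period."""
    return (
        df.groupby("period")["vel"]
        .agg(n_meas="count", mean_vel="mean")
        .reset_index()
    )


# Only runs when the bundled table is present in the working directory.
csv_path = Path("src_rec_data_wus.csv")
if csv_path.exists():
    summary = per_period_summary(pd.read_csv(csv_path))
    print(summary.to_string(index=False))
```

The `mean_vel` column gives a rough average dispersion curve, which can be a useful plausibility check before launching the inversion.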

Optional: build a 3-D initial model from CSEM

The default configuration uses `init_model_type: 1` (a 1-D inversion of average travel times) to build the starting model, which converges robustly without extra setup. To start from a 3-D reference model instead, convert the bundled CSEM NetCDF file to HDF5:


import h5py
import numpy as np
from scipy.io import netcdf_file
 
 
def read_nc(fname):
    f = netcdf_file(fname, mode="r", mmap=False)
    try:
        vsv = np.asarray(f.variables["vsv"][:])
        vsh = np.asarray(f.variables["vsh"][:])
        x   = np.asarray(f.variables["longitude"][:])
        y   = np.asarray(f.variables["latitude"][:])
        z   = np.asarray(f.variables["depth"][:])
    finally:
        f.close()
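    # Voigt-average isotropic Vs. The trailing .T reverses the axis order,
    # e.g. (depth, latitude, longitude) -> (longitude, latitude, depth).
    vs = np.sqrt((2.0 * vsv**2 + vsh**2) / 3.0).T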
    return x, y, z, vs
 
 
def write_h5(fname, x, y, z, vs):
    with h5py.File(fname, "w") as f:
        f.create_dataset("x",  data=x)
        f.create_dataset("y",  data=y)
        f.create_dataset("z",  data=z)
        f.create_dataset("vs", data=vs)
 
 
if __name__ == "__main__":
    x, y, z, vs = read_nc("csem.nc")
    write_h5("csem.h5", x, y, z, vs)
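
Before pointing the inversion at the converted file, it may be worth reading it back and checking that the arrays are consistent. The sketch below uses a hypothetical helper, `check_model`. It assumes that `vs` ends up ordered as (longitude, latitude, depth) after the transpose in `read_nc`; if the source NetCDF uses a different axis order, that assumption does not hold and the expected shape should be adjusted.

```python
from pathlib import Path

import h5py
import numpy as np


def check_model(fname):
    """Read the converted model back and verify basic consistency."""
    with h5py.File(fname, "r") as f:
        x, y, z = f["x"][:], f["y"][:], f["z"][:]
        vs = f["vs"][:]

    # Assumption: vs is laid out as (longitude, latitude, depth).
    expected = (x.size, y.size, z.size)
    if vs.shape != expected:
        raise ValueError(f"vs has shape {vs.shape}, expected {expected}")
    if not np.all(np.isfinite(vs)):
        raise ValueError("vs contains NaN or infinite values")
    return vs.shape, float(vs.min()), float(vs.max())


# Only runs when the converted file is present in the working directory.
if Path("csem.h5").exists():
    shape, vs_min, vs_max = check_model("csem.h5")
    print(f"vs shape : {shape}")
    print(f"vs range : {vs_min:.2f} to {vs_max:.2f}")
```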

Then switch `init_model_type` to 2 in `input_params.yml`:


model:
  init_model_type: 2
  init_model_path: csem.h5

CSEM is distributed in NetCDF classic format (CDF1/CDF2), which is not HDF5-compatible, so use `scipy.io.netcdf_file` for the read step rather than `h5py`.
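
If it is unclear which container a given file uses, the leading bytes can be inspected. NetCDF classic files begin with `CDF` followed by a version byte (1 or 2), while HDF5 files begin with an 8-byte signature. The helper `detect_container` below is a hypothetical sketch. It looks only at the start of the file, so an HDF5 file whose signature sits at a later offset would be reported as unknown.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_container(fname):
    """Classify a file as 'hdf5', 'netcdf-classic', or 'unknown' by its header."""
    with open(fname, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    # 'CDF' plus version byte 1 (classic) or 2 (64-bit offset).
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02"):
        return "netcdf-classic"
    return "unknown"
```

For example, `detect_container("csem.nc")` is expected to return `"netcdf-classic"` for a classic-format file, and `detect_container("csem.h5")` should return `"hdf5"` after the conversion.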
