
Data Preparation

The example ships with everything needed to launch the inversion immediately. This page explains the data files, the travel-time CSV schema, and how to optionally build a 3-D initial model from CSEM.

Travel-time data (src_rec_data_wus.csv)

The travel-time table uses the standard SurfATT CSV schema with station and event elevation columns:

    tt,staname,stla,stlo,stel,evtname,evla,evlo,evel,period,weight,dist,vel
    48.8896,109C-TA,32.889,-117.105,0,BBR-CI,34.262,-116.921,0,5.0,1,153.22,3.134
    62.2573,109C-TA,32.889,-117.105,0,BC3-CI,33.655,-115.454,0,5.0,1,175.69,2.822
    52.048,109C-TA,32.889,-117.105,0,BEL-CI,34.001,-115.998,0,5.0,1,160.62,3.086
    ...

| Column | Description |
| --- | --- |
| tt | Travel time (s) |
| staname | Receiver station code (e.g. 109C-TA) |
| stla / stlo / stel | Receiver latitude, longitude, elevation (m) |
| evtname | Virtual-source station code |
| evla / evlo / evel | Source latitude, longitude, elevation (m) |
| period | Rayleigh-wave period (s) |
| weight | Measurement weight |
| dist | Inter-station distance (km) |
| vel | Measured phase velocity (km/s) |
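In the three sample rows above, tt matches dist / vel to within rounding. The sketch below checks that a file has all thirteen schema columns and flags rows that break this relation. It is a minimal example, not part of SurfATT: the helper name `check_rows`, the `REQUIRED` list, and the tolerance are assumptions made for illustration, and the tt = dist / vel relation is inferred from the sample rows rather than stated by the schema.

```python
import csv
import io
import math

# Column names taken from the schema table above.
REQUIRED = ["tt", "staname", "stla", "stlo", "stel", "evtname",
            "evla", "evlo", "evel", "period", "weight", "dist", "vel"]

# The three sample rows from this page, inlined so the sketch runs on its own.
SAMPLE = """\
tt,staname,stla,stlo,stel,evtname,evla,evlo,evel,period,weight,dist,vel
48.8896,109C-TA,32.889,-117.105,0,BBR-CI,34.262,-116.921,0,5.0,1,153.22,3.134
62.2573,109C-TA,32.889,-117.105,0,BC3-CI,33.655,-115.454,0,5.0,1,175.69,2.822
52.048,109C-TA,32.889,-117.105,0,BEL-CI,34.001,-115.998,0,5.0,1,160.62,3.086
"""

def check_rows(handle, rel_tol=1e-3):
    """Return (n_rows, bad), where bad lists rows violating tt ~= dist / vel."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    n_rows, bad = 0, []
    for row in reader:
        n_rows += 1
        tt, dist, vel = float(row["tt"]), float(row["dist"]), float(row["vel"])
        # Assumed relation (inferred from the sample rows): tt = dist / vel.
        if not math.isclose(tt, dist / vel, rel_tol=rel_tol):
            bad.append((row["staname"], row["evtname"], row["period"]))
    return n_rows, bad

n_rows, bad = check_rows(io.StringIO(SAMPLE))
print(f"{n_rows} rows checked, {len(bad)} inconsistent")
```

To run the same check on the bundled file, pass `open("src_rec_data_wus.csv", newline="")` instead of the in-memory sample.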

The measurements come from the USArray Transportable Array (TA) and co-located regional networks (CI, US, etc.). The shortest period, 5 s, is sensitive mainly to the uppermost crust; longer periods, up to 40 s, sample the Moho and uppermost mantle.

Quick summary

    import pandas as pd

    df = pd.read_csv("src_rec_data_wus.csv")

    print(f"Total measurements : {len(df):,}")
    print(f"Unique stations    : {pd.concat([df['staname'], df['evtname']]).nunique()}")
    print(f"Period range       : {df['period'].min():.1f} – {df['period'].max():.1f} s")
    print(f"Periods            : {sorted(df['period'].unique().tolist())}")
    print(f"Velocity range     : {df['vel'].min():.2f} – {df['vel'].max():.2f} km/s")
    print(f"Distance range     : {df['dist'].min():.0f} – {df['dist'].max():.0f} km")

Running this on the bundled src_rec_data_wus.csv prints:

    Total measurements : 334,554
    Unique stations    : 689
    Period range       : 5.0 – 40.0 s
    Periods            : [5.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
    Velocity range     : 1.36 – 4.10 km/s
    Distance range     : 42 – 600 km

The dataset therefore contains about 335,000 phase-velocity measurements between 689 stations at 11 discrete periods (5, 6, 8, 10, 12, 15, 20, 25, 30, 35, 40 s), with inter-station distances of 42–600 km.
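To see how the measurements are distributed across periods, the summary can be broken down with a pandas groupby. The sketch below runs on the three sample rows from this page so that it is self-contained; the output column names `n`, `mean_vel`, and `max_dist` are chosen here for illustration.

```python
import io

import pandas as pd

# The three sample rows from this page; swap in the real file to get all periods.
SAMPLE = """\
tt,staname,stla,stlo,stel,evtname,evla,evlo,evel,period,weight,dist,vel
48.8896,109C-TA,32.889,-117.105,0,BBR-CI,34.262,-116.921,0,5.0,1,153.22,3.134
62.2573,109C-TA,32.889,-117.105,0,BC3-CI,33.655,-115.454,0,5.0,1,175.69,2.822
52.048,109C-TA,32.889,-117.105,0,BEL-CI,34.001,-115.998,0,5.0,1,160.62,3.086
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# One output row per period: measurement count, mean phase velocity,
# and the longest inter-station path.
per_period = (
    df.groupby("period")
      .agg(n=("tt", "size"), mean_vel=("vel", "mean"), max_dist=("dist", "max"))
      .reset_index()
)

print(per_period.to_string(index=False))
```

Replacing `io.StringIO(SAMPLE)` with `"src_rec_data_wus.csv"` should give one row per period, which makes it easy to spot periods with sparse coverage before launching the inversion.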

Optional: build a 3-D initial model from CSEM

The default configuration uses init_model_type: 1 (1-D inversion of average travel times) as the starting model, which converges robustly without extra setup. To start from a higher-quality 3-D reference, convert the bundled CSEM NetCDF to HDF5:

    import h5py
    import numpy as np
    from scipy.io import netcdf_file


    def read_nc(fname):
        f = netcdf_file(fname, mode="r", mmap=False)
        try:
            vsv = np.asarray(f.variables["vsv"][:])
            vsh = np.asarray(f.variables["vsh"][:])
            x = np.asarray(f.variables["longitude"][:])
            y = np.asarray(f.variables["latitude"][:])
            z = np.asarray(f.variables["depth"][:])
        finally:
            f.close()
        # Voigt-average isotropic Vs
        vs = np.sqrt((2.0 * vsv**2 + vsh**2) / 3.0).T
        return x, y, z, vs


    def write_h5(fname, x, y, z, vs):
        with h5py.File(fname, "w") as f:
            f.create_dataset("x", data=x)
            f.create_dataset("y", data=y)
            f.create_dataset("z", data=z)
            f.create_dataset("vs", data=vs)


    if __name__ == "__main__":
        x, y, z, vs = read_nc("csem.nc")
        write_h5("csem.h5", x, y, z, vs)

Then switch init_model_type to 2 in input_params.yml:

    model:
      init_model_type: 2
      init_model_path: csem.h5
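Before pointing the inversion at csem.h5, it can help to sanity-check the averaging and transpose step of the conversion script on a small synthetic grid. The sketch below is illustrative only: the (depth, latitude, longitude) axis order and the velocity values are assumptions, since this page does not state the axis order of the CSEM file or the layout SurfATT expects, and `voigt_vs` is a helper name introduced here.

```python
import numpy as np

def voigt_vs(vsv, vsh):
    """Voigt-average isotropic Vs, same formula as in the conversion script."""
    return np.sqrt((2.0 * vsv**2 + vsh**2) / 3.0)

# Synthetic grid with an assumed (depth, latitude, longitude) axis order.
nz, ny, nx = 4, 3, 2
vsv = np.full((nz, ny, nx), 3.5)  # hypothetical values, km/s
vsh = np.full((nz, ny, nx), 3.7)

# .T reverses the axis order, here to (longitude, latitude, depth).
vs = voigt_vs(vsv, vsh).T

print("shape after transpose:", vs.shape)
print("Vs between vsv and vsh:", bool((vs > 3.5).all() and (vs < 3.7).all()))
```

Two properties are worth checking on the real model as well: the transposed array's shape should match the lengths of the coordinate arrays in the order the solver reads them, and the averaged Vs should lie between vsv and vsh everywhere.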

CSEM is distributed in NetCDF classic format (CDF1/CDF2), which is not HDF5-compatible, so use scipy.io.netcdf_file for the read step rather than h5py.
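If it is unclear which container format a given model file uses, the first bytes of the file settle it: NetCDF classic files begin with the bytes `CDF` followed by a version byte, while HDF5 files begin with an 8-byte signature. The sketch below uses only the standard library. The function name `sniff_format` and the returned labels are made up for this example, and it only inspects offset 0, so an HDF5 file with a user block would be reported as unknown.

```python
import os
import tempfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 signature
NETCDF_VERSIONS = {
    1: "netcdf-classic (CDF-1)",
    2: "netcdf-64bit-offset (CDF-2)",
    5: "netcdf-64bit-data (CDF-5)",
}

def sniff_format(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_MAGIC:
        return "hdf5"
    if len(head) >= 4 and head[:3] == b"CDF":
        return NETCDF_VERSIONS.get(head[3], "unknown")
    return "unknown"

# Demo on two stand-in files that hold only magic bytes (not real models).
results = {}
with tempfile.TemporaryDirectory() as tmp:
    stand_ins = [("fake.nc", b"CDF\x01" + b"\x00" * 4), ("fake.h5", HDF5_MAGIC)]
    for name, magic in stand_ins:
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(magic)
        results[name] = sniff_format(path)

print(results)
```

Calling `sniff_format("csem.nc")` and getting a CDF-1 or CDF-2 label indicates that scipy.io.netcdf_file can read the file. A result of `hdf5` would point to a NetCDF-4 file, which scipy.io.netcdf_file does not read.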
