Data filtering
After downloading the original data file for TomoATT Script_data_filtering/input_data/src_rec_Turkey.dat,
we need apply strict data selection criteria to retain only reliably data for inversion. The filtering process consists of four steps:
1. Select data within the study region
Because the study region is oblique, we need to rotate the study region togather with sources and stations to a new local coordinate for computation (TomoATT inversion)
Users can do it directly by
cd ./Script_data_filtering
python step1_select_data_in_region_with_rotation.py
cd ..The code is also presented below for convenience.
# load functions for data processing
import sys
sys.path.append('../../utils')
import functions_for_data as ffd
import os
# read .dat file
fname = "input_data/src_rec_Turkey.dat"
[ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)
# we only retain the earthquakes and stations within the target study region:
# center of the region: 37N,37E x-axis: [-1.3,1.3] (degree), y-axis [-2,3] degree, z-axis: 0.5-40 km.
central_lat = 37.0
central_lon = 37.0
rotation_angle = 45.0
# step 1: rotate the source and receiver locations to a local coordinate system. The center of the region is (0,0)
[ev_info_obs, st_info_obs] = ffd.rotate_src_rec(ev_info_obs,st_info_obs,central_lat,central_lon,rotation_angle)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)The function rotate_src_rec anticlockwisely rotates all sources and stations by 45.0 degree and shift the center point (central_lat, central_lon) to (0, 0).
See the figure below for the original data (left) and the data after rotation (right)
![]() | ![]() |
After rotation, it is easy to delete data outside the study region. Here, we select the study region (after rotation) to be -2.0 - 3.0 degree in lat, -1.3 to +1.3 in lon, and 0.5 - 40 km in dep.
# step 2, we only retain the earthquakes and stations within the study region: x-axis: [-1.3,1.3] (degree), y-axis [-2,3] degree, z-axis: 0.5-40 km.
lat1 = -2.0; lat2 = 3.0;
lon1 = -1.3; lon2 = 1.3;
dep1 = 0.5; dep2 = 40.0;
# limit earthquake region
ev_info_obs = ffd.limit_ev_region(ev_info_obs,lat1,lat2,lon1,lon2,dep1,dep2)
# limit station region
ev_info_obs = ffd.limit_st_region(ev_info_obs,st_info_obs,lat1,lat2,lon1,lon2)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)
# output data
out_path = "output_data"
os.makedirs(out_path,exist_ok=True)
# save src_rec_file for TomoATT
out_fname = "%s/step1_src_rec.dat"%(out_path)
ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)The intermediate data file is output as Script_data_filtering/output_data/step1_src_rec.dat
We can also rotate the data back for visualization or other purpose
# We can rotate the source and receiver locations back to the original coordinate system for plotting or other purposes.
[ev_info_obs, st_info_obs] = ffd.rotate_src_rec_reverse(ev_info_obs, st_info_obs,central_lat,central_lon,rotation_angle)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)The filtered data before rotation (left) and after rotataion (right) are illustrated below
![]() | ![]() |
2. Retain only reliable traveltimes
Users can do it directly by
cd ./Script_data_filtering
python step2_retain_reliable_traveltime.py
cd ..The original traveltime data may contain some contaminated data. To mitigate the influence of unreliable on inversion results, we perform linear regression to retain only reliable data.
First, we remove evident outliers by setting a general upper and lower bound
# load functions for data processing
import sys
sys.path.append('../../utils')
import functions_for_data as ffd
import os
# read .dat file
fname = "output_data/step1_src_rec.dat"
[ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname)
# selection parameters
# retain data satisfying: slope * dis + intercept + down < time < slope * dis + intercept + up
slope = 0.16
intercept = 0
up = 10
down = -10
# range of distance from source to receiver
dis_min = 0
dis_max = 500
ev_info_obs = ffd.fig_data_plot_remove_outliers(ev_info_obs,st_info_obs,slope,intercept,up,down,dis_min,dis_max)![]() |
Then, we retain data with epicenter distance less than 100 km and remove Pn phases, both of which can mitigate the influence of Moho topography uncertainty.
epi_dis1 = 100 # threshold of epicentral distance = 100 km
epi_dis2 = 1000000
ev_info_obs = ffd.limit_epi_dis(ev_info_obs, st_info_obs, epi_dis1, epi_dis2)
# retain given phases (P, Pg, Pb)
phase_list = ["P","Pg","Pb"]
ev_info_obs = ffd.limit_data_phase(ev_info_obs,phase_list)Finally, we do linear regression of all data and retain data with residual less tha 3 times of the standard estimated error
# Do linear regression of all data and retain data with residual < 3*SEE
[dis_obs,time_obs] = ffd.data_dis_time(ev_info_obs,st_info_obs)
(slope,intercept,SEE) = ffd.linear_regression(dis_obs,time_obs)
up = 3*SEE
down = -3*SEE
# range of distance from source to receiver
dis_min = 0
dis_max = 110
ev_info_obs = ffd.fig_data_plot_remove_outliers(ev_info_obs,st_info_obs,slope,intercept,up,down,dis_min,dis_max)
[dis_obs,time_obs] = ffd.data_dis_time(ev_info_obs,st_info_obs)
(slope_2,intercept_2,SEE_2) = ffd.linear_regression(dis_obs,time_obs)
print("The (slope,intercept,SEE) of original data is (%6.3f,%6.3f,%6.3f)"%(slope,intercept,SEE))
print("The (slope,intercept,SEE) of filtered data is (%6.3f,%6.3f,%6.3f)"%(slope_2,intercept_2,SEE_2))
# output data
out_path = "output_data"
os.makedirs(out_path,exist_ok=True)
# save data for TomoATT
out_fname = "%s/step2_src_rec.dat"%(out_path)
ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)Blue dots are reliable data, which are retained for subsequent data filtering.
![]() |
The intermediate data file is output as Script_data_filtering/output_data/step2_src_rec.dat
3. Detele events with records less than 5
Users can do it directly by
cd ./Script_data_filtering
python step3_limit_min_Ndata.py
cd ..The earthquake with less than 5 records are regard unstable in the subsequent inversion, especially containing relocation.
# load functions for data processing
import sys
sys.path.append('../../utils')
import functions_for_data as ffd
import os
# read .dat file
fname = "output_data/step2_src_rec.dat"
[ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)
# remove the earthquakes with less than 5 arrival times.
min_Nt_thd = 5
ev_info_obs = ffd.limit_min_Nt(min_Nt_thd, ev_info_obs)
# plot data distribution
ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)
# output data
out_path = "output_data"
os.makedirs(out_path,exist_ok=True)
# save data for TomoATT
out_fname = "%s/step3_src_rec.dat"%(out_path)
ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)Among the original 64,781 earthquakes (left), only 20,529 earthquakes are retained (right).
Blue dots are reliable data, which are retained for subsequent data filtering.
![]() | ![]() |
The intermediate data file is output as Script_data_filtering/output_data/step3_src_rec.dat
Generate common-source differential arrival times
Users can do it directly by
cd ./Script_data_filtering
python step4_generate_differential_traveltime.py
cd ..Common-source differential arrival time are generated based on the absolute traveltimes. These relative traveltime data can mitigate the influence of source uncertainty and emphasize the velocity anomaly on the receiver-side.
# load functions for data processing
import sys
sys.path.append('../../utils')
import functions_for_data as ffd
import os
fname = "output_data/step3_src_rec.dat"
[ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname)
dis_thd = 100 # distance between two stations should be less than 100 km
azi_thd = 30 # the angle bwteen two great circle paths from the common source to two separated receivers should be less than 30
ev_info = ffd.generate_cs_dif(ev_info_obs,st_info_obs,dis_thd,azi_thd)
# output as the data file for inversion
os.makedirs("../1_src_rec_files",exist_ok=True)
ffd.write_src_rec_file("../1_src_rec_files/src_rec_file.dat",ev_info_obs,st_info_obs)It is noted that, limitation is also applied when generating the differential arrival time, which enhances path overlapping to mitigate the influence of source uncertainty. Finally, 81,812 differential arrival times are genetated from the 152,254 absolute traveltimes.
The filtered data file is output as ../1_src_rec_files/src_rec_file.dat, which is then used for tomography.







