Skip to Content

Data filtering

After downloading the original data file for TomoATT Script_data_filtering/input_data/src_rec_Turkey.dat, we need apply strict data selection criteria to retain only reliably data for inversion. The filtering process consists of four steps:

1. Select data within the study region

Because the study region is oblique, we need to rotate the study region togather with sources and stations to a new local coordinate for computation (TomoATT inversion)

Users can do it directly by

cd ./Script_data_filtering python step1_select_data_in_region_with_rotation.py cd ..

The code is also presented below for convenience.

# load functions for data processing import sys sys.path.append('../../utils') import functions_for_data as ffd import os # read .dat file fname = "input_data/src_rec_Turkey.dat" [ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs) # we only retain the earthquakes and stations within the target study region: # center of the region: 37N,37E x-axis: [-1.3,1.3] (degree), y-axis [-2,3] degree, z-axis: 0.5-40 km. central_lat = 37.0 central_lon = 37.0 rotation_angle = 45.0 # step 1: rotate the source and receiver locations to a local coordinate system. The center of the region is (0,0) [ev_info_obs, st_info_obs] = ffd.rotate_src_rec(ev_info_obs,st_info_obs,central_lat,central_lon,rotation_angle) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)

The function rotate_src_rec anticlockwisely rotates all sources and stations by 45.0 degree and shift the center point (central_lat, central_lon) to (0, 0). See the figure below for the original data (left) and the data after rotation (right)

imgimg

After rotation, it is easy to delete data outside the study region. Here, we select the study region (after rotation) to be -2.0 - 3.0 degree in lat, -1.3 to +1.3 in lon, and 0.5 - 40 km in dep.

# step 2, we only retain the earthquakes and stations within the study region: x-axis: [-1.3,1.3] (degree), y-axis [-2,3] degree, z-axis: 0.5-40 km. lat1 = -2.0; lat2 = 3.0; lon1 = -1.3; lon2 = 1.3; dep1 = 0.5; dep2 = 40.0; # limit earthquake region ev_info_obs = ffd.limit_ev_region(ev_info_obs,lat1,lat2,lon1,lon2,dep1,dep2) # limit station region ev_info_obs = ffd.limit_st_region(ev_info_obs,st_info_obs,lat1,lat2,lon1,lon2) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs) # output data out_path = "output_data" os.makedirs(out_path,exist_ok=True) # save src_rec_file for TomoATT out_fname = "%s/step1_src_rec.dat"%(out_path) ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)

The intermediate data file is output as Script_data_filtering/output_data/step1_src_rec.dat

We can also rotate the data back for visualization or other purpose

# We can rotate the source and receiver locations back to the original coordinate system for plotting or other purposes. [ev_info_obs, st_info_obs] = ffd.rotate_src_rec_reverse(ev_info_obs, st_info_obs,central_lat,central_lon,rotation_angle) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs)

The filtered data before rotation (left) and after rotataion (right) are illustrated below

imgimg

2. Retain only reliable traveltimes

Users can do it directly by

cd ./Script_data_filtering python step2_retain_reliable_traveltime.py cd ..

The original traveltime data may contain some contaminated data. To mitigate the influence of unreliable on inversion results, we perform linear regression to retain only reliable data.

First, we remove evident outliers by setting a general upper and lower bound

# load functions for data processing import sys sys.path.append('../../utils') import functions_for_data as ffd import os # read .dat file fname = "output_data/step1_src_rec.dat" [ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname) # selection parameters # retain data satisfying: slope * dis + intercept + down < time < slope * dis + intercept + up slope = 0.16 intercept = 0 up = 10 down = -10 # range of distance from source to receiver dis_min = 0 dis_max = 500 ev_info_obs = ffd.fig_data_plot_remove_outliers(ev_info_obs,st_info_obs,slope,intercept,up,down,dis_min,dis_max)
img

Then, we retain data with epicenter distance less than 100 km and remove Pn phases, both of which can mitigate the influence of Moho topography uncertainty.

epi_dis1 = 100 # threshold of epicentral distance = 100 km epi_dis2 = 1000000 ev_info_obs = ffd.limit_epi_dis(ev_info_obs, st_info_obs, epi_dis1, epi_dis2) # retain given phases (P, Pg, Pb) phase_list = ["P","Pg","Pb"] ev_info_obs = ffd.limit_data_phase(ev_info_obs,phase_list)

Finally, we do linear regression of all data and retain data with residual less tha 3 times of the standard estimated error

# Do linear regression of all data and retain data with residual < 3*SEE [dis_obs,time_obs] = ffd.data_dis_time(ev_info_obs,st_info_obs) (slope,intercept,SEE) = ffd.linear_regression(dis_obs,time_obs) up = 3*SEE down = -3*SEE # range of distance from source to receiver dis_min = 0 dis_max = 110 ev_info_obs = ffd.fig_data_plot_remove_outliers(ev_info_obs,st_info_obs,slope,intercept,up,down,dis_min,dis_max) [dis_obs,time_obs] = ffd.data_dis_time(ev_info_obs,st_info_obs) (slope_2,intercept_2,SEE_2) = ffd.linear_regression(dis_obs,time_obs) print("The (slope,intercept,SEE) of original data is (%6.3f,%6.3f,%6.3f)"%(slope,intercept,SEE)) print("The (slope,intercept,SEE) of filtered data is (%6.3f,%6.3f,%6.3f)"%(slope_2,intercept_2,SEE_2)) # output data out_path = "output_data" os.makedirs(out_path,exist_ok=True) # save data for TomoATT out_fname = "%s/step2_src_rec.dat"%(out_path) ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)

Blue dots are reliable data, which are retained for subsequent data filtering.

img

The intermediate data file is output as Script_data_filtering/output_data/step2_src_rec.dat

3. Detele events with records less than 5

Users can do it directly by

cd ./Script_data_filtering python step3_limit_min_Ndata.py cd ..

The earthquake with less than 5 records are regard unstable in the subsequent inversion, especially containing relocation.

# load functions for data processing import sys sys.path.append('../../utils') import functions_for_data as ffd import os # read .dat file fname = "output_data/step2_src_rec.dat" [ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs) # remove the earthquakes with less than 5 arrival times. min_Nt_thd = 5 ev_info_obs = ffd.limit_min_Nt(min_Nt_thd, ev_info_obs) # plot data distribution ffd.fig_ev_st_distribution_dep(ev_info_obs, st_info_obs) # output data out_path = "output_data" os.makedirs(out_path,exist_ok=True) # save data for TomoATT out_fname = "%s/step3_src_rec.dat"%(out_path) ffd.write_src_rec_file(out_fname,ev_info_obs,st_info_obs)

Among the original 64,781 earthquakes (left), only 20,529 earthquakes are retained (right).

Blue dots are reliable data, which are retained for subsequent data filtering.

imgimg

The intermediate data file is output as Script_data_filtering/output_data/step3_src_rec.dat

Generate common-source differential arrival times

Users can do it directly by

cd ./Script_data_filtering python step4_generate_differential_traveltime.py cd ..

Common-source differential arrival time are generated based on the absolute traveltimes. These relative traveltime data can mitigate the influence of source uncertainty and emphasize the velocity anomaly on the receiver-side.

# load functions for data processing import sys sys.path.append('../../utils') import functions_for_data as ffd import os fname = "output_data/step3_src_rec.dat" [ev_info_obs, st_info_obs] = ffd.read_src_rec_file(fname) dis_thd = 100 # distance between two stations should be less than 100 km azi_thd = 30 # the angle bwteen two great circle paths from the common source to two separated receivers should be less than 30 ev_info = ffd.generate_cs_dif(ev_info_obs,st_info_obs,dis_thd,azi_thd) # output as the data file for inversion os.makedirs("../1_src_rec_files",exist_ok=True) ffd.write_src_rec_file("../1_src_rec_files/src_rec_file.dat",ev_info_obs,st_info_obs)

It is noted that, limitation is also applied when generating the differential arrival time, which enhances path overlapping to mitigate the influence of source uncertainty. Finally, 81,812 differential arrival times are genetated from the 152,254 absolute traveltimes.

The filtered data file is output as ../1_src_rec_files/src_rec_file.dat, which is then used for tomography.

Last updated on