MLOC Utility Codes

Several types of utility programs are essential to the efficient use of mloc. These include three programs for editing event files efficiently as part of the so-called cleaning process:

rstat: an interactive program to explore cluster residuals (normalized, demeaned travel-time residuals), an essential aid to manual editing (flagging) of outlier readings.
lres: batch processing to flag a set of readings with large cluster residuals across multiple event files listed in the ~.lres output file of a previous run of mloc. Flagged readings will not be used in future runs.
xdat: batch process to flag arrival time readings with grossly large residuals, listed in the ~.xdat file from a previous run of mloc.

The source codes for these utilities are contained in the mloc distribution in the directory /mloc_distribution/mloc_utilities/. Executables for macOS are included as well, or they can be easily re-compiled using a simple command, e.g.:

gfortran lres.f90 -o lres

See MNF Utility Codes for some other important utility programs for creating event data files in the MNF format.

Editing Event Files

A proper relocation analysis with mloc is never completed in one run of the program, because arrival time data sets nearly always contain some (usually many) outlier readings that must be identified and flagged and because the empirical reading errors for each station-phase pair are never known a priori. The task of satisfying these requirements is known as cleaning and it must be done iteratively, i.e., one deals with the worst outliers first, re-runs to obtain improved estimates of empirical reading errors, and repeats until the relocation satisfies the various criteria for completion.

Therefore, the usual pattern of a relocation analysis is to make a run of mloc, followed by the use of one or more of the utility programs discussed here, and repeat. These utilities are designed to be run in the cluster directory, where the event data files are stored, so it is convenient to keep copies of the executables in the /utilities subdirectory of the working directory for easy access. The executables are then copied from there into the cluster directory when the analysis begins and deleted when it is done.

rstat

The utility program rstat provides access to detailed information and statistics about the arrivals from a specified station-phase, which are very helpful in the cleaning process. It is the tool with which to make the most careful investigation of potential outliers, and new users of mloc are encouraged to make heavy use of it in order to gain intuition about the nature of outliers in arrival time datasets. The other two tools discussed here, lres and xdat, are more efficient, but their proper use depends on the intuition gained through use of rstat. In most cluster relocation analyses I make use of all three. Some clusters go very easily and require very little use of rstat, but some require what amounts to hand-to-hand combat and in those cases rstat is your best weapon.

To use rstat it must be located in the cluster directory, with the output files from the latest run of mloc and all the event data files. I keep a copy of the executable rstat in mloc_distribution/mloc_working/utilities/ and copy it into the cluster directory when I start work. The first steps in using rstat are shown in this example, using a cluster directory called salmas2:

$ cd salmas2
$ rstat

Release date June 29, 2018

Enter file name for phase_data: salmas2.2.phase_data
Enter number of iterations: 2
Is there differential time data? y or n: n
Case sensitive station names? y or n: n
Use deployment codes? y or n: n

Enter station name (q to quit):

The data that rstat will read and process are all carried in a ~.phase_data output file. rstat next requires as input the number of iterations that were performed so it knows which column of residuals to read. The number of iterations is given in the terminal window near the end of each run, or it can be read easily from the ~.phase_data file itself. The next three questions deal with unusual circumstances to which the answer is usually n; these will be discussed below.

The next step is to specify a station code and phase name; this is done in two steps. Then rstat reads through the entire ~.phase_data file, extracts every instance of that station-phase combination, and carries out a statistical analysis of the residuals:

Enter station name (q to quit): tab
Enter phase name (* for all phases): Pg

                             rderr  delta    dts   wgt    eci
 21 TAB              Pg       0.28   0.84   1.19  1.00   1.00 ISC          Pg           5  salmas2/19810524.2112.21.mnf
 27 TAB              Pg       0.28   0.96   0.45  1.00  -1.59 ISC          Pg           5  salmas2/19840629.1955.16.mnf
 31 TAB              Pg       0.28   0.82   0.78  1.00  -0.17 ISC          Pg           6  salmas2/19891203.0739.09.mnf
 32 TAB              Pg       0.28   1.04   1.00  1.00   0.35 ISC          Pg           5  salmas2/19920305.0330.15.mnf
 33 TAB              Pg       0.28   1.11   0.87  1.00  -0.06 ISC          Pg           5  salmas2/19930330.2225.19.mnf
 38 TAB              Pg       0.28   0.95   1.18  1.00   1.29 ISC          Pg          22  salmas2/19981123.1111.39.mnf
 39 TAB              Pg       0.28   1.57   0.59  1.00  -0.82 ISC          Pn          26  salmas2/19990219.1800.10.mnf
 22 TAB            x Pg       0.28   0.82   2.45  1.00        ISC          Pg           5  salmas2/19810524.2207.05.mnf
 24 TAB            x Pg       0.28   1.56  -2.50  1.00        ISC          Pn           8  salmas2/19830803.0306.01.mnf
 26 TAB            x Pg       0.28   0.82   3.50  1.00        ISC          Pg           5  salmas2/19840325.0244.57.mnf
 37 TAB            x Pg       0.28   0.91   3.09  1.00        ISC          Pg          22  salmas2/19981118.1137.19.mnf
 40 TAB            x Pg       0.28   1.41  -1.92  1.00        ISC          Pn          51  salmas2/20000226.0818.38.mnf
Mean =  0.866
Sn =  0.400
On    7 readings.

Enter station name (q to quit):

Because we did not request case-sensitive station codes, most station codes can be entered in lower case. There is an exception, however: station codes that include numbers must be entered in the correct case. The phase name must always be entered in the correct case. The output columns are:

Event number
Station code
Deployment code (if requested)
Phase reading flag (many are skipped, “x” and “s” are displayed)
Phase name for the current run
Reading error used for this station-phase in the current run
Epicentral distance
Time residual
Weight (see the wind command)
Cluster residual (eci), demeaned, normalized residual
Reading author
Phase name read from the event file
Line number of the reading in the event file
Name of the event file
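
When working through many station-phases it can help to pull an rstat listing into a script. The following is a minimal sketch in Python of how one listing line might be split into the columns described above. It is not part of the mloc distribution. The names Reading and parse_rstat_line are hypothetical, and the sketch assumes that deployment codes were not requested (so that column is absent), that the reading author is a single word, and that only the flags "x" and "s" appear.

```python
from dataclasses import dataclass
from typing import Optional

# Flags that rstat displays, per the column list above (assumption: no others).
FLAGS = {"x", "s"}


@dataclass
class Reading:  # hypothetical container, not an mloc data structure
    event: int
    station: str
    flag: str
    phase: str
    rderr: float
    delta: float
    dts: float
    wgt: float
    eci: Optional[float]  # blank in the listing for flagged readings
    author: str
    phase_in_file: str
    line_number: int
    event_file: str


def parse_rstat_line(line: str) -> Reading:
    """Split one whitespace-separated rstat listing line into named fields."""
    tok = line.split()
    event, station = int(tok[0]), tok[1]
    rest = tok[2:]
    flag = ""
    if rest[0] in FLAGS:  # the flag column is present only for flagged readings
        flag, rest = rest[0], rest[1:]
    phase = rest[0]
    rderr, delta, dts, wgt = (float(v) for v in rest[1:5])
    tail = rest[5:]
    eci = None
    if len(tail) == 5:  # unflagged lines carry an eci value before the author
        eci, tail = float(tail[0]), tail[1:]
    author, phase_in_file, line_no, event_file = tail
    return Reading(event, station, flag, phase, rderr, delta, dts, wgt,
                   eci, author, phase_in_file, int(line_no), event_file)
```

The line_number and event_file fields are the ones needed to jump to the reading in a text editor.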

The listing of all instances of the requested station-phase is followed by the mean of the unflagged residuals, the spread of the unflagged residuals, calculated as the robust statistic Sn (no relation to the seismic phase) formulated by Croux and Rousseeuw (1992), and the number of instances used in calculating the statistics.
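
For readers who want to see what the Sn statistic computes, here is a minimal Python sketch of the estimator as it is usually defined: for each residual take the high median of its absolute differences from all residuals, take the low median of those values, and scale by 1.1926 and a finite-sample correction factor. This is not the mloc source code. The function name sn_scale is hypothetical, the correction factors are the commonly tabulated ones, and the naive double loop is used for clarity rather than speed.

```python
# Commonly tabulated finite-sample correction factors for n = 2..9.
_CN = {2: 0.743, 3: 1.851, 4: 0.954, 5: 1.351,
       6: 0.993, 7: 1.198, 8: 1.005, 9: 1.131}


def sn_scale(x):
    """Robust scale estimate Sn of a sequence of residuals (naive O(n^2 log n))."""
    n = len(x)
    if n < 2:
        raise ValueError("Sn needs at least two values")
    inner = []
    for xi in x:
        diffs = sorted(abs(xi - xj) for xj in x)
        inner.append(diffs[n // 2])       # high median: order statistic floor(n/2)+1
    inner.sort()
    lomed = inner[(n + 1) // 2 - 1]       # low median: order statistic floor((n+1)/2)
    if n in _CN:
        cn = _CN[n]
    elif n % 2 == 1:
        cn = n / (n - 0.9)
    else:
        cn = 1.0
    return cn * 1.1926 * lomed


# Unflagged dts values for TAB Pg from the listing above.
tab_pg = [1.19, 0.45, 0.78, 1.00, 0.87, 1.18, 0.59]
print(f"Mean = {sum(tab_pg) / len(tab_pg):.3f}")   # Mean = 0.866
print(f"Sn = {sn_scale(tab_pg):.3f}")              # Sn = 0.400
```

With the seven unflagged TAB Pg residuals this sketch gives the same mean and Sn that rstat prints, which suggests (but does not prove) that rstat follows the same definition.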

rstat does not do anything to any of the event files. It only provides information with which the user may decide to flag certain readings, unflag them, or modify the phase name read from the event file. In most cases mloc is run with phase re-identification activated (command phid) so the final phase name may be changed by the algorithm, as it has been for the Pn reading at TAB for event 39 in the above listing. Another option is to use the special flag ! in the prevent phase re-identification field of the event file, if the user judges that the phase re-identification algorithm is making a mistake.

The field of primary interest in the output of rstat is the cluster residual (variable eci in the code). It displays the distance of a residual (dts) from the mean, normalized by the current estimate of reading error (rderr). In most cases the reading error will be an estimate of Sn from a previous run, read by mloc from the ~.rderr file by the command rfil.
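
The definition can be written as eci = (dts - mean) / rderr. The Python sketch below applies it to made-up numbers, not to data from an mloc run. The function name cluster_residuals is hypothetical, and the mean is taken over the residuals passed in, which stands in for the unflagged readings of one station-phase. Values computed this way from the printed columns of an rstat listing need not agree exactly with the eci column, since mloc computes eci internally.

```python
def cluster_residuals(dts, rderr):
    """Demean the residuals and normalize by the reading error."""
    mean = sum(dts) / len(dts)
    return [(r - mean) / rderr for r in dts]


# Illustrative values only (seconds); the last residual is a clear outlier.
dts = [0.2, 0.5, 0.8, 3.0]
rderr = 0.5
for r, e in zip(dts, cluster_residuals(dts, rderr)):
    print(f"dts = {r:5.2f}   eci = {e:6.2f}")
```

In this made-up case the outlier pulls the mean up to 1.125, so only the last reading ends up with an eci magnitude above 3.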

Early in a relocation analysis there will be many instances of large values of the cluster residual, because there are nearly always many outlier readings in typical arrival time datasets. How is “large” defined? We assume that the “good” readings of a certain phase observed at a certain station have a roughly Gaussian distribution, with unknown standard deviation and perhaps with a baseline offset from the predicted arrival time. Outlier readings are those that lie so far from the mean that they are unlikely to be members of the population of “good readings”. If our measures of the mean and spread of the population are reasonably accurate, we can judge that nearly 100% of the “good” readings will have absolute values of eci that are less than 3.0 (see the 68-95-99.7 rule). One can push the target limit below 3.0 to some extent, but it does not gain much in the accuracy of the results and it threatens to invalidate the statistical assumptions that go into estimates of uncertainties of the hypocentral parameters. It is not recommended.
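
The percentages behind the 68-95-99.7 rule follow directly from the Gaussian assumption. As a small check, this Python sketch computes the fraction of a Gaussian population lying within k standard deviations of the mean using the error function from the standard library; the function name fraction_within is hypothetical.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a Gaussian population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"|eci| < {k}: {100 * fraction_within(k):.2f}%")
# |eci| < 1: 68.27%
# |eci| < 2: 95.45%
# |eci| < 3: 99.73%
```

Under these assumptions only about 3 in 1000 good readings would be expected beyond the 3.0 limit, whereas a limit of 2.0 would reject roughly 1 good reading in 22.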

You cannot simply run rstat once, flag all readings with eci > 3 and declare victory over outliers. The cleaning process must be done gradually, beginning with the largest outliers and gradually attaining (through many runs) a condition where very few if any readings exceed that limit. There are several reasons why an incremental approach is needed. One is that the locations of the events will change after readings are flagged and the distribution of residuals will change as a result. It is useful to remember that in mloc everything affects everything else. Another is that the Sn estimator, while powerful, is not very helpful with distributions containing many outliers. The user’s intuition will come into play as well, especially as experience is gained with what these distributions look like in practice. A third reason is that the estimate of Sn will become smaller as outliers are removed from the problem and thus readings that appeared to be “within limits” before will appear as outliers when the relocation is run again with a smaller empirical reading error for the station-phase of interest. It can sometimes feel like one is chasing one’s own tail but in fact the process does converge.

For new users especially it is better to err on the side of keeping outliers in the problem than risk flagging readings which should actually have been kept. The reason for this is that mloc will naturally down-weight readings that belong to a distribution containing severe outliers. The empirical reading error will be large in such cases and the data are weighted inversely to empirical reading error.

In use, one should be able to open all the event files, along with one or more of the output files such as ~.lres, ~.phase_data and ~.dcal_phase_data, while running rstat in a terminal window, choosing station-phases to investigate on the basis of clues such as large absolute residuals, large values of eci and large values of empirical reading error. Based on the output from rstat the user goes to the appropriate event file, jumps to the line number of the reading of interest and flags the reading or edits the phase name. This is why the choice of a text editor is very important in mloc; it will be heavily exercised by the manual cleaning process employing rstat.

It is not uncommon to find readings that have been flagged incorrectly during the cleaning process, so removing flags is also a regular task. Here is an example, from the same salmas2.2 run:

Enter station name (q to quit): cldr
Enter phase name (* for all phases): Sg

                             rderr  delta    dts   wgt    eci
 61 CLDR             Sg       0.70   0.22   0.01  1.00  -0.71 ISC          Sg           7  salmas2/20090705.2348.25.mnf
 62 CLDR             Sg       0.70   0.51   0.68  1.00   0.07 ISC          Sg           7  salmas2/20101106.0105.16.mnf
 63 CLDR             Sg       0.70   0.49   0.83  1.00   0.30 ISC          Sg           7  salmas2/20101206.0516.09.mnf
 65 CLDR             Sg       0.70   0.70  -0.63  1.00  -1.82 ISC          Sg          16  salmas2/20110710.0423.02.mnf
 74 CLDR             Sg       0.70   0.41   0.29  1.00  -0.57 ISC          S           19  salmas2/20111029.2224.24.mnf
 74 CLDR             Sg       0.70   0.41   0.71  1.00   0.02 ISC          Sg          17  salmas2/20111029.2224.24.mnf
 75 CLDR             Sg       0.70   0.38   1.71  1.00   1.50 ISC          Sg          21  salmas2/20111106.0243.14.mnf
 75 CLDR             Sg       0.70   0.38   1.51  1.00   1.21 ISC          S           23  salmas2/20111106.0243.14.mnf
 55 CLDR           x Sg       0.70   1.02   1.55  1.00        ISC          Sg          27  salmas2/20060729.0151.11.mnf
 57 CLDR           x Sg       0.70   1.13   1.34  1.00        ISC          Sg          36  salmas2/20061202.0639.37.mnf
 64 CLDR           x Sg       0.70   0.56   0.54  1.00        ISC          Sg           6  salmas2/20110314.1857.11.mnf
Mean =  0.639
Sn =  0.839
On    8 readings.

Enter station name (q to quit):

Both readings for event 75 should be flagged as outliers, but the reading for event 64 should be unflagged.

The effects of flagging (or unflagging) readings with rstat are not all felt immediately in the next run of mloc. The presence or absence of the edited readings is felt in the next run, but the empirical reading errors that will be read from the ~.rderr file of the current run will not yet reflect those edits. The ~.rderr file from the next run will reflect the edits you just made and so they will be available for the run after next.

There are many subtleties about the use of rstat which can only be learned by experience and thoughtfulness about the nature of arrival time data and the various factors that can influence the values that show up in a typical seismic bulletin. Correct and effective use of mloc cannot be accomplished without having mastered those subtleties to some extent.

lres

If you understand the use of rstat, the use of lres is trivial. To use it you must first issue the command lres when running mloc. The command takes an argument which is the value of cluster residual (eci) above which a reading will be written to the ~.lres output file. The utility program lres simply reads the content of that file (the name of the ~.lres file is the only input) and flags the corresponding readings in the event files.

During the course of a relocation analysis the threshold value of eci in the lres command would be fairly high (say, 6.0) in early runs and would gradually be reduced until an eci threshold of 3.0 results in a ~.lres file with few if any contents. In a difficult analysis one might run rstat rather than lres and do the editing by hand, especially in the early going. One might keep the threshold value of eci at the same level for a number of runs, while dealing with outliers or other issues such as setting focal depths or refining the crustal velocity model.
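
To illustrate how the threshold controls the size of the list of readings to be flagged, here is a minimal Python sketch with made-up readings. It is not the lres code, the function name select_for_flagging is hypothetical, and it assumes the threshold is compared against the absolute value of eci. In a real analysis mloc is re-run between steps, so the eci values themselves change from run to run; the static list here only shows the effect of lowering the threshold.

```python
def select_for_flagging(readings, threshold):
    """Return the readings whose absolute cluster residual exceeds the threshold."""
    return [r for r in readings if abs(r[3]) > threshold]


# Made-up (event number, station, phase, eci) tuples for illustration only.
readings = [
    (12, "AAA", "Pg", 7.4),
    (15, "AAA", "Pg", -4.8),
    (15, "BBB", "Sg", 3.3),
    (21, "BBB", "Sg", -2.1),
    (30, "CCC", "Pn", 0.6),
]

for threshold in (6.0, 4.0, 3.0):
    picked = select_for_flagging(readings, threshold)
    print(f"threshold {threshold}: {len(picked)} reading(s) listed")
```

Starting high means that only the grossest outliers are removed first, which is the incremental approach recommended for rstat as well.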

As with rstat, the effects of editing readings with lres are not fully felt until the second successive run.

xdat

The utility program xdat operates very similarly to lres, flagging readings in event files according to data in an output file from a previous run of mloc. The input file ~.xdat is based on the windowing algorithm (see command wind) that identifies gross outliers and drops them from the relocation. These readings are listed in the ~.phase_data file in the “BAD DATA” section for each event, and they are annotated “PRES” under the “Why Bad” column.

Since these readings are automatically dropped by mloc it is not essential to flag them with xdat but it is a good practice to do so a few times during an analysis so that you end up with a “clean” dataset. Sometimes these readings can fall back into the relocation and create problems. The ~.xdat file is always created by mloc but it will often be empty after xdat has been run a few times.