This page summarizes some of the more common issues and questions that arise in using mloc, especially those encountered while developing a calibrated cluster.
- Creating a Cluster
- What Kind of Relocation?
- Do I Need to Repick the Arrival Times to Obtain Better Locations?
- How Do I Know When I’m Done?
Creating a Cluster
In selecting events for a cluster, several factors must be considered besides the basic choice of a source region, including the geographic extent, the range of dates, the depth range, and the distance ranges represented by the data for different events.
A cluster is a collection of seismic events within a limited geographic area. This requirement comes from the notion, underlying all relative event relocation schemes, that if the raypaths from different events to each observing station are very similar, differencing the arrival times can remove most of the “noise” in the signals due to unknown Earth structure, leaving the relative locations as the main source of signal. If your cluster covers too large a region, the raypaths from different events to a station will have too little in common. There is no easy answer to the obvious question: “How big is too big?” In addition to the nature and degree of crustal heterogeneity, the answer will depend on the kind of data you have: local-distance data will be more sensitive to that heterogeneity than teleseismic data. Heterogeneity may also be azimuthally dependent.
Experience has shown that 50-100 km is generally a pretty safe target for geographic extent. As clusters become larger than that there is increasing concern about biased relative locations, especially for events near the edges, but there may be little choice in some regions of sparse seismicity. Clusters less than 50 km across rarely show any evidence of bias related to crustal heterogeneity.
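As a quick screen on geographic extent, one can compute the maximum pairwise epicentral distance of a candidate event list before committing to a cluster. A minimal sketch (the epicenter coordinates and the 100 km threshold below are illustrative, not from any particular cluster):

```python
from math import radians, sin, cos, asin, sqrt
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in km."""
    R = 6371.0  # mean Earth radius, km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def cluster_extent_km(epicenters):
    """Maximum pairwise epicentral distance among candidate events."""
    return max(haversine_km(a[0], a[1], b[0], b[1])
               for a, b in combinations(epicenters, 2))

# Hypothetical epicenters (latitude, longitude)
events = [(34.10, 51.20), (34.30, 51.45), (34.55, 51.10)]
extent = cluster_extent_km(events)
if extent > 100.0:
    print(f"Extent {extent:.0f} km exceeds the ~100 km guideline")
```

This is only a screen on the epicenters; whether a given extent is actually “too big” still depends on the heterogeneity of the region and the mix of local and teleseismic data, as discussed above.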
In principle, the date of events in a cluster, or the range of dates in a cluster, is of no relevance to the multiple event relocation problem, but in fact it is quite significant, due to changes in the constellation of observing seismic stations through time. All events must have a minimal level of connectivity, through observations at common stations, or the inversion will become unstable. There are a few seismic stations that have remained operational since the early 20th century and these can sometimes suffice to provide the connectivity needed to include an earthquake from, say, the 1930s in a calibrated cluster with recent events. This usually works only with rather large events, obviously.
The same issue can arise in more recent time periods, when a previously poorly monitored region gains a dense monitoring network. This occurred in Iran between about 1995 and 2005. A search of the ISC Bulletin may yield what appears to be a nice cluster with some older larger events of interest, mainly recorded at teleseismic and far-regional distances, and many smaller, more recent events recorded by the recently-installed regional network, with abundant readings at close distances and good azimuthal coverage for direct calibration. It can turn out, however, that the older, larger events and the more recent, smaller events have very few stations in common, leading to a failure of mloc to converge. There is little to be done with such a cluster until the modern regional network captures a larger event that is recorded at far-regional and teleseismic distances. In cases like this, multiple event relocation can be less capable than single event location at analyzing seismicity over an extended time period. However, when it is possible to include older events in a “modern” calibrated cluster there is considerable value added.
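A rough way to screen for this kind of connectivity failure before running mloc is to link any two events that share some minimum number of observing stations and check whether the candidate cluster forms a single connected group. The sketch below is only a crude proxy for the connectivity mloc actually requires; the station codes, events, and the two-station linking rule are all hypothetical:

```python
from collections import defaultdict

def connectivity_components(event_stations, min_common=2):
    """Group events into connected components, linking two events when they
    share at least `min_common` observing stations (union-find). A well-posed
    cluster should form a single component."""
    events = list(event_stations)
    parent = {e: e for e in events}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if len(event_stations[a] & event_stations[b]) >= min_common:
                union(a, b)

    groups = defaultdict(list)
    for e in events:
        groups[find(e)].append(e)
    return list(groups.values())

# Hypothetical example: an old event tied to modern ones only through
# two long-running stations.
obs = {
    "1935_event": {"STA1", "STA2"},
    "2004_event": {"STA1", "STA2", "STA3", "STA4"},
    "2005_event": {"STA3", "STA4", "STA5"},
}
print(connectivity_components(obs))  # a single component with min_common=2
```

With `min_common=3` the 1935 event would fall into its own component, the situation that leads to a failure of mloc to converge.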
mloc can relocate clusters containing events at any focal depth supported by the ak135 global travel-time model. For the same reason that we wish to keep clusters limited in geographical extent, it is wise to limit the depth extent of a cluster. This is usually not an issue; outside of subduction zones the depth range of earthquakes in most continental and oceanic source regions is naturally limited.
Even in subduction zones, the depth range of a cluster that is not too widespread is usually within reason. Even so it may be wise to confine the cluster to a more limited depth range. The reason has to do with mloc’s crustal models, and whether events are above or below the model’s Moho boundary. The simple 1-D crustal model supported by mloc already has trouble accounting for local and near-regional arrival time observations in a subduction zone setting with its strong dipping interfaces. The interplay between focal depth and Moho depth has strong effects on the pattern of arrivals and their phase identifications, which are difficult enough to unravel if all the events are fairly shallow (crustal level). If the cluster includes events below the Moho as well, it can become quite challenging to sort out the focal depth vs. crustal structure problem.
Distance Range of Observations
This is another expression of the connectivity problem in multiple event relocation. The distance range of observation is, of course, largely a proxy for magnitude. It is possible to have success with clusters containing events with very different ranges of observation, but it will usually require good representation across the range of magnitudes. There is no objective way to evaluate connectivity problems in advance, but intuition is gained rapidly with experience. If mloc is having trouble converging, the cause is nearly always weak connectivity.
How Many Events?
mloc can be run with a single event, but some aspects of its operation, such as estimating empirical reading errors, depend on having more than one event. When the number of events is small the statistical power in the dataset, i.e., repeated observations, is weak and the uncertainties in the results will be larger than they would be with a more populous cluster. Experience has shown that this issue stabilizes fairly well when the cluster reaches ~30 events and by 50 events the statistical power is about as good as it will get.
The computational load of a run with mloc depends mostly on the number of events in the cluster, i.e., the number of free parameters (usually 3 or 4 per event). The increase in runtime with increasing number of events is greater than linear. I generally try to keep a cluster below ~100 events unless there are strong reasons to include more. mloc is configured for a maximum of 200 events. With readily-available desktop and laptop computers a run for a cluster of this size takes many minutes, enough to slow down the overall analysis (which requires many runs) considerably. If you are very patient or have an especially powerful computer on which to run and you really need to analyze more than 200 events in a cluster, it is easy to set the limit higher by editing the mloc.inc file and changing the value of the parameter nevmax. You will need to recompile afterwards, of course.
What Kind of Relocation?
There are four types of relocation that can be done with mloc:
- Uncalibrated
- Direct calibration
- Indirect calibration
- Direct calibration followed by indirect calibration
If there is not enough near-source data to employ one of the calibrated relocation methodologies, the user must accept that the absolute locations and origin times of the relocated events are biased to an unknown degree by unknown Earth structure. The most robust estimate in this case is to use only teleseismic P arrivals to estimate the hypocentroid; the command file would include:
phyp on
hlim 30. 90.
Although the degree of bias is unknown, it is possible to set some rough limits on it. In most cases the hypocentroid estimated this way is probably not more than about 10 km in error. The origin times are unlikely to be more than about 3 seconds off, and are much more likely to be late than early.
Although it is perfectly capable of performing traditional uncalibrated relocations of clusters of earthquakes, mloc has been specifically developed to conduct calibrated relocations. My personal preference is to use direct calibration if possible, because it is based strictly on seismological observations. Direct calibration calibrates origin time in addition to epicenter and focal depth, which is extremely important for studies of Earth structure, whereas most other sources of information on “location” rarely constrain origin time. When the data for direct calibration are inadequate to provide a robust result, however, indirect calibration can be very helpful. When direct calibration is successful and other estimates of source locations are available, such as from InSAR or observations of surface faulting, there is much value in having independent estimates of hypocentral parameters to compare.
If a cluster includes one or more sources for which ground truth location information is available, indirect calibration may be the preferred method, but even in this case I would prefer to do direct calibration first (if possible) and then check it with indirect calibration. There are cases in which reported “ground truth” locations are clearly in error (e.g., Mackey and Bergman, 2014).
It is not uncommon, when developing a calibrated cluster, to find it useful to perform a few uncalibrated relocations in the early going. This simplifies the analysis somewhat and permits the user to focus on the basic composition of the cluster and sort out basic issues, such as missing station codes and code conflicts, the geographic extent and depth extent of the cluster and problems with connectivity.
Do I Need to Repick the Arrival Times to Obtain Better Locations?
It is usually impossible to know how the arrival times in a dataset downloaded from the ISC or other major data center have been picked. Most have probably been reviewed to some extent by a human but some may be automated picks. The expertise of the human who may have made the pick or reviewed an automated pick is highly variable. mloc is designed so that one does not need to worry very much about these issues, by employing empirical reading errors estimated from the arrival time data itself, and using those estimates in a statistical analysis (cleaning) to identify outlier readings and to weight the data in the inversion.
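The idea behind empirical reading errors can be illustrated with a generic robust-statistics sketch. This is not mloc’s actual algorithm (which estimates reading errors iteratively over the full dataset); it only shows the flavor of estimating a spread from the residuals of one hypothetical station-phase pair and flagging inconsistent readings:

```python
import statistics

def empirical_reading_error(residuals):
    """Robust spread of a station-phase residual set: scaled median absolute
    deviation (MAD * 1.4826 approximates the standard deviation for
    Gaussian data)."""
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    return 1.4826 * mad

def flag_outliers(residuals, n_sigma=3.0):
    """Indices of readings farther than n_sigma reading errors from the
    median; analogous in spirit to mloc's cleaning step."""
    med = statistics.median(residuals)
    err = empirical_reading_error(residuals)
    return [i for i, r in enumerate(residuals)
            if err > 0 and abs(r - med) > n_sigma * err]

# Hypothetical residuals (s) for one station-phase pair across a cluster
res = [0.12, -0.05, 0.20, 0.08, 3.50, 0.02, -0.10]
print(flag_outliers(res))  # the 3.5 s reading stands out
```

Note that an outlier here is a reading inconsistent with the other readings at the same station, regardless of who picked it or how accurate it is in an absolute sense.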
The improvements in location that are likely to be obtained from repicking the arrival times are minor (with one important exception, discussed below), and in fact they can be negative. At one point in mloc’s development we arranged for one of the most experienced seismic analysts in the world to provide her own readings of arrivals for which we also had the original ISC bulletin data. It turned out that her picks were quite often flagged as outliers, because most of the arrival times in our dataset were made by analysts with much less experience and dedication. For the most part, mloc values consistency rather than absolute correctness.
The exception alluded to above is the case of direct calibration, in which the arrival time data at short epicentral distances (usually less than ~100 km) is used to estimate the hypocentroid. These picks need to be as accurate as possible and careful repicking, even during the mloc analysis, can make a difference, especially if the available dataset for estimating the hypocentroid is on the minimal side. If there is a lot of data, however, repicking is unlikely to make any noticeable difference to the calibration.
How Do I Know When I’m Done?
First of all, mloc should be converging within one or two iterations if you are starting the relocation from the locations of the previous run (command rhdf). There should be few if any cases of missing station codes. The cleaning process should have converged, meaning that running with lres = 3 produces a very small or empty ~.lres file; the ~.xdat file should also have no more than a handful of entries.
Review the various output files and plots related to focal depth and convince yourself that there is no more that can be done in that regard. Some events simply do not have any arrival time information that helps to constrain depth, and they will need to be set at some reasonable default depth. I usually take the default depth for a cluster as the median of constrained depths, listed in the ~.depth_phases file.
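Setting the default depth is then a one-line computation; a minimal sketch, with hypothetical constrained depths of the kind listed in a ~.depth_phases file:

```python
import statistics

def default_depth(constrained_depths):
    """Median of the well-constrained focal depths in a cluster, used as
    the default depth for events whose data cannot constrain depth."""
    return statistics.median(constrained_depths)

# Hypothetical constrained depths (km)
depths = [8.0, 10.5, 9.0, 12.0, 7.5, 11.0]
print(default_depth(depths))  # 9.75
```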
All of the summary plots are potentially valuable in detecting anomalous results that may indicate the need for more work. On the baseplot, look for events that are still moving significantly from the last run (green vector). Also look for events which have ended up far from their initial location (black vector), say, more than 20 or 30 km away. This could be correct, but it could also be a case where the cleaning process has flagged a number of readings in error. Check the ~.phase_data file to see if there are an unusually large number of readings in the “bad data” section. If so, consider if adding some of them back into the problem might lead to a different solution. Some events do have an unusual number of outliers.
In a direct calibration analysis the near-source travel-time plot is especially important to review. Check the baseline offset of the Pg and Sg phases; they should both be less than one or two tenths of a second in absolute value. There should be little sign of a slope to the residuals of either phase and there should not be any large outliers. For Pg it is rare to see a legitimate residual greater than about 1 s; for Sg the likely limit is ~2 seconds. If the residuals at very short distances (less than ~0.1°) show a noticeable curvature there may be issues with focal depth. If there are many Pn and Sn readings, consider reducing the distance range for the hypocentroid (command hlim).
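The baseline-offset and slope checks amount to fitting a line through the residuals against epicentral distance. A minimal sketch with hypothetical Pg residuals (the intercept approximates the baseline offset, which should be within a tenth or two of a second; the slope should be near zero):

```python
def baseline_and_slope(distances_deg, residuals_s):
    """Least-squares line r = a + b*delta through travel-time residuals.
    'a' approximates the baseline offset, 'b' any distance-dependent trend."""
    n = len(distances_deg)
    mx = sum(distances_deg) / n
    my = sum(residuals_s) / n
    sxx = sum((x - mx) ** 2 for x in distances_deg)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(distances_deg, residuals_s))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical Pg residuals (s) vs epicentral distance (deg)
dist = [0.1, 0.3, 0.5, 0.7, 0.9]
res = [0.05, 0.08, 0.02, 0.06, 0.04]
a, b = baseline_and_slope(dist, res)
print(f"baseline {a:+.2f} s, slope {b:+.2f} s/deg")
```

In this example the baseline (0.06 s) and slope are both small, consistent with a well-behaved direct calibration; a large intercept or a clear trend would point to problems with the crustal model or the data.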
In an indirect calibration, check the baseplot with respect to the residual calibration shift vectors (blue vectors). Anomalies (failure to cover) can be investigated further in the ~.cal file. Check the statistical test for the “radius of doubt”: if it is greater than a couple of hundred meters, something is probably amiss. Consider dropping (as calibration events) any events that fail the coverage criterion badly, since it is likely that either the original “ground truth” location is bogus or the relative location of the event was mishandled in the mloc analysis. Alternatively, consider whether the reported uncertainties of the calibration events have been underestimated. The command cvff can be used to inflate the uncertainties of the cluster vectors such that discrepancies with the adopted calibration locations are reduced in a statistical sense, but this is completely ad hoc.