Compute Cross-Correlations

This code is responsible for the computation of the cross-correlation functions.

This script will group jobs marked “T”odo in the database by day and process them using the following scheme. As soon as one day is selected, the corresponding jobs are marked “I”n Progress in the database. This allows running several instances of this script in parallel.

As of MSNoise 1.6, the compute step has been completely rewritten:

  • The compute_cc step has been completely rewritten to make use of 2D arrays holding the data, processing them “in place” for the different steps (FFT, whitening, etc). This results in much more efficient computation. The process slides on time windows and computes the correlations using indexes in a 2D array, therefore avoiding an exponential number of identical operations on data windows.

  • This new code is the default compute_cc, and it doesn’t allow computing rotated components. For users needing R or T components, there are two options: either use the old code, now named compute_cc_rot, or compute the full (6 components actually are enough) tensor using the new code, and rotate the components afterwards. From initial tests, this latter solution is a lot faster than the first, thanks to the new processing in 2D.

  • It is now possible to do the Cross-Correlation (classic “CC”), the Auto- Correlation (“AC”) or the Cross-Components within the same station (“SC”). To achieve this, we removed the ZZ, ZT, etc parameters from the configuration and replaced it with components_to_compute which takes a list: e.g. ZZ,ZE,ZN,EZ,EE,EN,NZ,NE,NN for the full non-rotated tensor between stations. Adding components to the new components_to_compute_single_station will allow computing the cross-components (SC) or auto-correlation (AC) of each station.

  • The cross-correlation is done on sliding windows on the available data. For each window, if one trace contains a gap, it is eliminated from the computation. This corrects previous errors linked with gaps synchronised in time that lead to perfect sinc autocorrelation functions. The windows should have a duration of at least “2 times the `maxlag`+1” to be computable.

Configuration Parameters

The following parameters (modifiable via `msnoise admin`) are used for this step:

  • components_to_compute: List (comma separated) of components to compute between two different stations [ZZ] (default=ZZ)

  • components_to_compute_single_station: List (comma separated) of components within a single station. ZZ would be the autocorrelation of Z component, while ZE or ZN are the cross-components. Defaults to [], no single-station computations are done. (default=) | new in 1.6

  • cc_sampling_rate: Sampling Rate for the CrossCorrelation [20.0] (default=20.0)

  • analysis_duration: Duration of the Analysis (total in seconds : 3600, [86400]) (default=86400)

  • overlap: Amount of overlap between data windows [0:1[ [0.] (default=0.0)

  • maxlag: Maximum lag (in seconds) [120.0] (default=120.)

  • corr_duration: Data windows to correlate (in seconds) [1800.] (default=1800.)

  • windsorizing: Windsorizing at N time RMS , 0 disables windsorizing, -1 enables 1-bit normalization [3] (default=3)

  • resampling_method: Resampling method Decimate/[Lanczos] (default=Lanczos)

  • remove_response: Remove instrument response Y/[N] (default=N)

  • response_format: Remove instrument file format [dataless]/inventory/paz/resp (default=dataless)

  • response_path: Instrument correction file(s) location (path relative to db.ini), defaults to ‘./inventory’, i.e. a subfolder in the current project folder. | All files in that folder will be parsed. (default=inventory)

  • response_prefilt: Remove instrument correction pre-filter (0.005, 0.006, 30.0, 35.0) (default=(0.005, 0.006, 30.0, 35.0))

  • preprocess_lowpass: Preprocessing Low-pass value in Hz [8.0] (default=8.0)

  • preprocess_highpass: Preprocessing High-pass value in Hz [0.01] (default=0.01)

  • preprocess_max_gap: Preprocessing maximum gap length that will be filled by interpolation [10.0] seconds (default=10.0) | new in 1.6

  • preprocess_taper_length: Duration of the taper applied at the beginning and end of trace during the preprocessing, to allow highpassfiltering (default=20.0) | new in 1.6

  • keep_all: Keep all cross-corr (length: corr_duration) [Y]/N (default=N)

  • keep_days: Keep all daily cross-corr [Y]/N (default=Y)

  • stack_method: Stack Method: Linear Mean or Phase Weighted Stack: [linear]/pws (default=linear)

  • pws_timegate: If stack_method=’pws’, width of the smoothing in seconds : 10.0 (default=10.0)

  • pws_power: If stack_method=’pws’, Power of the Weighting: 2.0 (default=2.0)

  • whitening: Whiten Traces before cross-correlation: [A]ll (except for autocorr), [N]one, or only if [C]omponents are different: [A]/N/C (default=A) | new in 1.5

  • whitening_type: Type of spectral whitening function to use: [B]rutal (amplitude to 1.0), divide spectrum by its [PSD]: [B]/PSD. WARNING: only works for compute_cc, not compute_cc_rot, where it will always be [B] (default=B) | new in 1.6

  • hpc: Is MSNoise going to run on an HPC? Y/[N] (default=N) | new in 1.6

Waveform Pre-processing

Pairs are first split and a station list is created. The database is then queried to get file paths. For each station, all files potentially containing data for the day are opened. The traces are then merged and split, to obtain the most continuous chunks possible. The different chunks are then demeaned, tapered and merged again to a 1-day long trace. If a chunk is not aligned on the sampling grid (that is, start at a integer times the sample spacing in s) , the chunk is phase-shifted in the frequency domain. This requires tapering and fft/ifft. If the gap between two chunks is small, compared to preprocess_max_gap, the gap is filled with interpolated values. Larger gaps will not be filled with interpolated values.

Each 1-day long trace is then high-passed (at preprocess_highpass Hz), then if needed, low-passed (at preprocess_lowpass Hz) and decimated/downsampled. Decimation/Downsampling are configurable (resampling_method) and users are advised testing Decimate. One advantage of Downsampling over Decimation is that it is able to downsample the data by any factor, not only integer factors. Downsampling is achieved with the ObsPy Lanczos resampler which we tested against the old scikits.samplerate.

If configured, each 1-day long trace is corrected for its instrument response. Currently, only dataless seed and inventory XML are supported.

../_images/preprocessing.png

As from MSNoise 1.5, the preprocessing routine is separated from the compute_cc and can be used by plugins with their own parameters. The routine returns a Stream object containing all the traces for all the stations/components.

Computing the Cross-Correlations

Processing using msnoise compute_cc

Todo

We still need to describe the workflow in plain text, but the following graph should help you understand how the code is structured

../_images/compute_cc.png ../_images/MyCorr2.png

Processing using msnoise compute_cc_rot

Once all traces are preprocessed, station pairs are processed sequentially. If a component different from ZZ is to be computed, the traces are first rotated. This supposes the user has provided the station coordinates in the station table. The rotation is computed for Radial and Transverse components.

Then, for each corr_duration window in the signal, and for each filter configured in the database, the traces are clipped to windsorizing times the RMS (or 1-bit converted) and then whitened in the frequency domain (see whiten) between the frequency bounds. The whitening procedure can be skipped by setting the whitening configuration to None. The two other whitening modes are “[A]ll except for auto-correlation” or “Only if [C]omponents are different”. This allows skipping the whitening when, for example, computing ZZ components for very close by stations (much closer than the wavelength sampled), leading to spatial autocorrelation issues.

When both traces are ready, the cross-correlation function is computed (see mycorr). The function returned contains data for time lags corresponding to maxlag in the acausal (negative lags) and causal (positive lags) parts.

Saving Results (stacking the daily correlations)

If configured (setting keep_all to ‘Y’), each corr_duration CCF is saved to the hard disk in the output_folder. By default, the keep_days setting is set to True and so “N = 1 day / corr_duration” CCF are stacked to produce a daily cross-correlation function, which is saved to the hard disk in the STACKS/001_DAYS folder.

Note

Currently, the keep-all data (every CCF) are not used by next steps.

If stack_method is ‘linear’, then a simple mean CCF of all windows is saved as the daily CCF. On the other hand, if stack_method is ‘pws’, then all the Phase Weighted Stack (PWS) is computed and saved as the daily CCF. The PWS is done in two steps: first the mean coherence between the instataneous phases of all windows is calculated, and eventually serves a weighting factor on the mean. The smoothness of this weighting array is defined using the pws_timegate parameter in the configuration. The weighting array is the power of the mean coherence array. If pws_power is equal to 0, a linear stack is done (then it’s faster to do set stack_method = ‘linear’). Usual value is 2.

Warning

PWS is largely untested, not cross-validated. It looks good, but that doesn’t mean a lot, does it? Use with Caution! And if you cross-validate it, please let us know!!

Schimmel, M. and Paulssen H., “Noise reduction and detection of weak, coherent signals through phase-weighted stacks”. Geophysical Journal International 130, 2 (1997): 497-505.

Once done, each job is marked “D”one in the database.

Usage

To run this script:

$ msnoise compute_cc

This step also supports parallel processing/threading:

$ msnoise -t 4 compute_cc

will start 4 instances of the code (after 1 second delay to avoid database conflicts). This works both with SQLite and MySQL but be aware problems could occur with SQLite.

New in version 1.4: The Instrument Response removal & The Phase Weighted Stack & Parallel Processing

New in version 1.5: The Obspy Lanczos resampling method, gives similar results as the scikits.samplerate package, thus removing the requirement for it. This method is defined by default.

New in version 1.5: The preprocessing routine is separated from the compute_cc and can be called by external plugins.

New in version 1.6: The compute_cc has been completely rewritten to be much faster, taking advantage from 2D FFT computation and in-place array modifications. The standard compute_cc does process CC, AC and SC in the same code. Only if users need to compute R and/or T components, they will have to use the slower previous code, now called compute_cc_rot.