@Smarker
Last active January 14, 2019 19:36
#kaggle

VSB Power Line Fault Detection

Research Papers

Selecting Features

| Feature | Importance |
|---|---|
| Number of peaks | 1.7676 |
| Mean width of peaks | 1.8311 |
| Mean height of peaks | 0.7818 |
| Max. width of peaks | 1.9384 |
| Max. height of peaks | 0.9907 |
| Min. width of peaks | 1.8822 |
| Min. height of peaks | 0.9474 |

Types of Noise

There are several sources of background noise:

  1. Discrete spectral interference DSI (radio emissions)
    • sometimes, more than 100 radio stations can be identified in the raw signal
    • these sources of DSI can be easily recognized with an FFT, based on their modulation
  2. Repetitive pulses interference (power electronics)
    • represented by corona discharge: the hissing noise heard when standing below a high-voltage transmission line is due to corona discharge
  3. Random pulses interference RPI (lightning, switching operations, corona)
  4. Ambient and amplifier noise

PD-Pattern with noise during covered conductor (CC) fault:

image

De-Noising

  • the biggest permanent source of DSI at the measured site was the radio transmitter “Solec Kujawski”. Its carrier wave (225 kHz) is clearly visible in almost all of the acquired signals (see fig 1).
  • false hit peaks (most are corona discharges)
    • identified in a raw signal according to their position, shape, amplitude, and periodicity
    • reach a much higher amplitude than the PD-pattern
    • very often followed by another one with the opposite polarity, creating a symmetric pair
    • this knowledge was used to cancel the false hit peaks

image

Univariate wavelet de-noising and peaks extraction

Perform basic univariate wavelet de-noising with:

| Name of the parameter | Expert setting | SOMA’s range | Description |
|---|---|---|---|
| maxDistance (ticks) | 10 | <4,10> | maximum distance between two peaks for them to count as a symmetric pair |
| maxHeightRatio (%) | 0.25 | <0.05,0.5> | the ratio of the amplitudes of the symmetric peaks must exceed this value |
| maxHeight (%) | 100 | <80,140> | remove peaks with heights greater than this |
| maxTicksRemoval | 500 | <50,500> | distance to remove after symmetric peaks |
| Threshold coef. | 1 | (0,5> | used in univariate wavelet de-noising |
| Mother wavelet | db4 | all members of the wavelet families | used in univariate wavelet de-noising |
| Level of decomposition | 1 | {1,...,6} | used in univariate wavelet de-noising |

Remove Coronas and High Peaks

  1. To remove coronas, each peak is compared to the next peak. If their distance in the signal is under the limit maxDistance, check whether the signs of the peaks are opposite and the ratio of their amplitudes is higher than maxHeightRatio. If so, the pair is treated as a false hit and cancelled. Because the oscillations that follow can be misdetected as PD, the peaks within maxTicksRemoval after the symmetric pair are cancelled as well.
  2. Remove peaks with an amplitude higher than the limit maxHeight.
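
The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the `Peak` tuple, the function name, the definition of the amplitude ratio as smaller over larger absolute amplitude, and the treatment of `max_height` as an absolute amplitude rather than a percentage are all choices made here. The defaults mirror the expert settings from the parameter table.

```python
from typing import List, NamedTuple

class Peak(NamedTuple):
    index: int        # starting position in the signal (ticks)
    amplitude: float  # signed height
    width: int

def remove_coronas_and_high_peaks(peaks: List[Peak],
                                  max_distance: int = 10,
                                  max_height_ratio: float = 0.25,
                                  max_height: float = 100.0,
                                  max_ticks_removal: int = 500) -> List[Peak]:
    peaks = sorted(peaks, key=lambda p: p.index)
    cancelled = set()
    for i in range(len(peaks) - 1):
        if i in cancelled:
            continue  # already removed as part of an earlier false hit
        first, second = peaks[i], peaks[i + 1]
        if second.index - first.index >= max_distance:
            continue
        opposite = first.amplitude * second.amplitude < 0
        small, large = sorted((abs(first.amplitude), abs(second.amplitude)))
        if opposite and large > 0 and small / large > max_height_ratio:
            # Symmetric pair: cancel both peaks ...
            cancelled.update((i, i + 1))
            # ... and the trailing oscillations within max_ticks_removal.
            for j in range(i + 2, len(peaks)):
                if peaks[j].index - second.index > max_ticks_removal:
                    break
                cancelled.add(j)
    # Step 2: drop any remaining peak that is higher than max_height.
    return [p for i, p in enumerate(peaks)
            if i not in cancelled and abs(p.amplitude) <= max_height]

example = [Peak(100, 5.0, 3), Peak(1000, 40.0, 2), Peak(1004, -35.0, 2),
           Peak(1200, 6.0, 3), Peak(3000, -4.0, 2), Peak(5000, 150.0, 4)]
kept = remove_coronas_and_high_peaks(example)
```

In the example, the peaks at 1000 and 1004 form a symmetric pair, the peak at 1200 falls inside the removal window, and the peak at 5000 exceeds `max_height`, so only the peaks at 100 and 3000 survive.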

Dealing with Class Imbalance

  • design a representative subset of the data (an under-sampling method)
  • the subset should contain all class labels in equal proportion and cover the variety within each class, because both classes contain signals with varying amounts of background noise. The background noise can cover the PD-pattern or form false hit peaks, which the final processing should also recognize. This phenomenon has to be reasonably represented in the chosen subset.
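
As a starting point, plain random under-sampling to equal class counts might look like the sketch below. It only balances the labels; making the subset representative of the different noise levels, as the notes require, would need an additional stratification step that is not shown. The function name and the 94/6 class split are hypothetical.

```python
import random
from collections import defaultdict

def balanced_subset(labels, seed=0):
    """Return sorted sample indices with the same number of samples per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Every class is cut down to the size of the rarest class.
    per_class = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], per_class))
    return sorted(chosen)

labels = [0] * 94 + [1] * 6  # hypothetical imbalance: 6 faulty signals out of 100
subset = balanced_subset(labels)
```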

Algorithm

  1. Extraction of relevant parts of the signal by performing univariate discrete wavelet transform (DWT) de-noising (this suppresses most of the irrelevant small peaks so only the most significant peaks remain)
  2. Describe each peak with its starting index, amplitude, width
  3. Remove coronas and high peaks
  4. Calculate features
  5. Classification by Random Forest
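
Step 2 could be sketched as below, assuming the de-noised signal is zero everywhere except at the significant peaks. Treating a peak as a run of consecutive samples with the same non-zero sign is an assumption of this sketch, as are the function name and the `eps` parameter.

```python
def extract_peaks(signal, eps=0.0):
    """Return (start_index, amplitude, width) for each run of same-sign samples."""
    peaks = []
    start, sign = 0, 0
    # The trailing 0.0 is a sentinel that flushes a peak ending at the last sample.
    for i, x in enumerate(list(signal) + [0.0]):
        s = (x > eps) - (x < -eps)  # +1, -1, or 0 for samples within eps of zero
        if s != sign:
            if sign != 0:
                run = signal[start:i]
                amplitude = max(run, key=abs)  # signed extreme value of the run
                peaks.append((start, amplitude, i - start))
            start, sign = i, s
    return peaks

denoised = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, -2.0, -5.0, 0.0]
found = extract_peaks(denoised)
```

Here `found` holds one positive peak starting at index 2 (amplitude 3.0, width 3) and one negative peak starting at index 7 (amplitude -5.0, width 2).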

Features

28 feature columns and one class column for each signal:

  1. number of positive peaks
  2. number of negative peaks
  3. max width of peaks
  4. min width of peaks
  5. max amplitude of peaks
  6. min amplitude of peaks
  7. mean value of width of peaks
  8. mean value of amplitude of peaks
  9. four added columns (one for each sinusoidal phase and one for all of them) containing the standard deviation of the histograms of peak positions, widths, and amplitudes
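
The first eight features could be computed from a list of peaks roughly as follows. Peaks are assumed to be `(start_index, amplitude, width)` tuples, and amplitudes are compared by absolute value; both are choices of this sketch. The histogram-based columns from item 9 are not covered because the notes do not specify the binning.

```python
from statistics import mean

def peak_features(peaks):
    """Features 1-8 for one signal; `peaks` must not be empty."""
    amplitudes = [abs(a) for _, a, _ in peaks]
    widths = [w for _, _, w in peaks]
    return {
        "n_positive": sum(1 for _, a, _ in peaks if a > 0),
        "n_negative": sum(1 for _, a, _ in peaks if a < 0),
        "max_width": max(widths),
        "min_width": min(widths),
        "max_amplitude": max(amplitudes),
        "min_amplitude": min(amplitudes),
        "mean_width": mean(widths),
        "mean_amplitude": mean(amplitudes),
    }

features = peak_features([(2, 3.0, 3), (7, -5.0, 2), (20, 1.0, 1)])
```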

Subset Selection

Search for Subset selection in thesis

Definitions

Wavelet

  • a rapidly decaying wavelike oscillation that has 0 mean
  • transients are the jagged edges in a signal
  • each scaled wavelet is shifted in time along the signal and compared with the original signal
  • you can repeat this process for all the wavelet scales to get the coefficients as a function of wavelet scale and shift parameter
  • a signal with 1,000 samples x 20 scales = 20,000 coefficients
  • a wavelet with more vanishing moments is more complex (it needs longer filters)
  • p vanishing moments -> polynomials of degree up to p-1 produce zero wavelet coefficients, so the wavelet cannot "see" them
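
A small check of the vanishing-moment idea, using the Haar wavelet because it is the simplest case: it has one vanishing moment, so a constant signal gives all-zero detail coefficients, while a linear ramp does not. The helper name is made up for this sketch.

```python
import math

def haar_details(signal):
    """Single-level Haar detail coefficients (signal length must be even)."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal), 2)]

constant = [3.0] * 8
ramp = [float(i) for i in range(8)]

constant_details = haar_details(constant)  # invisible to the wavelet
ramp_details = haar_details(ramp)          # a degree-1 polynomial is detected
```

A wavelet with two vanishing moments would return zeros for the ramp as well, apart from boundary effects.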

image

image

  • output of wavelet transform -> coefficients
  • when you scale a wavelet by a factor of 2, it decreases the frequency by half (an octave)

Mother Wavelet Examples:

image

Scaling

  • Refers to the process of stretching or shrinking the wavelet in time

image

image

  • scaling by 2 reduces the frequency by half or by an octave

image

  • A stretched wavelet would capture lower frequencies whereas a squished wavelet can capture higher frequencies
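
In formula form, using the common notation $a$ for scale, $b$ for shift, and $f_c$ for the centre frequency of the mother wavelet (symbols introduced here, not taken from the notes), the scaled and shifted wavelet and its approximate frequency are:

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),
\qquad
f_a = \frac{f_c}{a},
\qquad
f_{2a} = \frac{f_c}{2a} = \frac{f_a}{2}
```

Doubling the scale stretches the wavelet by a factor of two and halves its frequency, which is the octave step mentioned above.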

Discrete Wavelet Transform (DWT)

  1. Denoising
  2. Compression of signals and images
  • uses fewer coefficients (eliminates redundancy in coefficients)
  • yields the same number of coefficients as the length of the signal
  • scale: 2^j (j = 1,2,3,4,...)
  • translation: (2^j)*m (m = 1,2,3,4,...)
  • the D1 and A1 filters split the signal into subbands; the matching reconstruction filters cancel out the aliasing from downsampling

D = detail coefficients, A = approximation coefficients

image

  • the number of coefficients in each subband is half the number of coefficients in the previous stage
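
The halving of the subbands and the exact reconstruction can be checked with a hand-written Haar DWT. Haar is used here only to keep the sketch short; the notes use db4, for which a library such as PyWavelets would normally be used. All function names are invented for this example, and the signal length must be divisible by 2 to the power of the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt_step(signal):
    """One analysis stage: filter, then downsample by two."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt_step(approx, detail):
    """One synthesis stage: the exact inverse of haar_dwt_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out

def haar_wavedec(signal, levels):
    """Return (A_levels, [D1, D2, ...]) with D1 the finest detail subband."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details

def haar_waverec(approx, details):
    for detail in reversed(details):
        approx = haar_idwt_step(approx, detail)
    return approx

signal = [float((i * 7) % 5) for i in range(16)]
approx, details = haar_wavedec(signal, 3)
subband_lengths = [len(d) for d in details] + [len(approx)]  # D1, D2, D3, A3
rebuilt = haar_waverec(approx, details)
```

For the 16-sample input the subbands have 8, 4, 2, and 2 coefficients, which add up to the signal length, and `rebuilt` matches the input up to rounding error.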

Continuous Wavelet Transform (CWT)

  1. Time-Frequency Analysis
  2. Filtering of time-localized frequency components
  • allows one to analyze the signal at intermediary scales between each octave (fine scale analysis)
  • these wavelets do NOT have negative frequency (easy for analysis)
  • needs more coefficients

Wavelet Denoising

  1. Perform a multi-level wavelet decomposition
  2. Identify a thresholding technique
  3. Threshold and reconstruct the signal
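
The three steps can be sketched end to end as follows. This is a simplified illustration: it uses the Haar wavelet and soft thresholding with a hand-picked threshold, whereas the notes use db4 and a threshold coefficient. The function names and the example values are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt_step(signal):
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt_step(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out

def soft_threshold(coeffs, threshold):
    """Shrink every coefficient towards zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def haar_denoise(signal, levels, threshold):
    # 1. Multi-level decomposition (length must be divisible by 2**levels).
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        # 2. Threshold the detail coefficients of this level.
        details.append(soft_threshold(detail, threshold))
    # 3. Reconstruct from the approximation and the thresholded details.
    for detail in reversed(details):
        approx = haar_idwt_step(approx, detail)
    return approx

# Small alternating noise with one large pulse at index 8.
noisy = [0.1 if i % 2 == 0 else -0.1 for i in range(16)]
noisy[8] += 5.0
denoised = haar_denoise(noisy, levels=1, threshold=0.5)
```

The small oscillation falls below the threshold and is removed entirely, while the large pulse survives with a slightly reduced height, which is the behaviour the peak extraction relies on.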

Fourier Transform

Does not represent abrupt transients efficiently, since it represents data as a sum of sine waves which are not localized in time or space (oscillate forever)

Choosing a Transform

image
