Research Article | Open Access | 19 May 2023

Semi-supervised joint adaptation transfer network with conditional adversarial learning for rotary machine fault diagnosis

Chun Liu^1,2

, ...

Chen Peng¹

Intell Robot 2023;3(2):131-43.

10.20517/ir.2023.07 | © The Author(s) 2023.

Abstract

At present, artificial intelligence is booming and has made major breakthroughs in fault diagnosis scenarios. However, the high diagnostic accuracy of most mainstream fault diagnosis methods must rely on sufficient data to train the diagnostic models. In addition, there is another assumption that needs to be satisfied: the consistency of training and test data distribution. When these prerequisites are not available, the effectiveness of the diagnosis model declines dramatically. To address this problem, we propose a semi-supervised joint adaptation transfer network with conditional adversarial learning for rotary machine fault diagnosis. To fully utilize the fault features implied in unlabeled data, pseudo-labels are generated through threshold filtering to obtain an initial pre-trained model. Then, a joint domain adaptation transfer network module based on conditional adversarial learning and distance metric is introduced to ensure the consistency of the distribution in two different domains. Lastly, in three groups of experiments with different settings: a single fault with variable load, a single fault with variable speed, and a mixed fault with variable speed and load, it was confirmed that our method can obtain competitive diagnostic performance.

Keywords

Fault diagnosis, joint adaptation transfer network, conditional adversarial learning, rotary machine

1. INTRODUCTION

The intelligent development of modern industrial technology leads to the gradual complexity and systematization of machinery and equipment^[1]. As essential equipment in modern industrial applications, rotary machines play a vital role in ensuring efficient and reliable operations. Key components, such as bearings and gears, are critical to the proper functioning of these machines, and any faults can disrupt the normal rotating mechanism. In engineering practice, bearings and gears are prone to faults due to improper assembly, corrosion, overload, poor lubrication, etc^[2]. If the equipment fault is not detected in time, it may affect the regular operation of the equipment and cause economic losses. In more serious cases, it may even put the lives of operators at risk. The early detection and prediction of bearing and gear faults in rotary machines will significantly enhance the safety of machinery production and avoid the loss of lives and property caused by mechanical faults. Based on the literature^{[3, 4]}, fault diagnosis methods for rotary machines are divided into two main categories: traditional fault diagnosis methods that rely on manual signal analysis and newer methods that use neural network diagnostic models to mine fault features.

For the past few years, deep learning techniques have made significant breakthroughs in artificial intelligence fields, and the advantages of automatically learning and extracting valid information from data are gaining increasing attention. By using sensors to acquire vibration signals and other relevant data and processing the data with deep learning algorithms to extract features that correspond to fault data, it becomes feasible to recognize and rectify potential faults^[5]. Unlike traditional fault diagnosis methods that use signal processing techniques combined with machine learning classifiers to perform fault diagnosis^[6], deep learning-based fault diagnosis models can automatically mine and analyze the underlying mechanisms of faults to obtain accurate fault classification performance with sufficient data^[7]. However, in practical engineering scenarios, mechanical equipment mainly operates normally, and failures are relatively rare. Therefore, the amount of fault data collected is usually limited. Furthermore, the distribution of data collected under changing operating conditions, such as speed, load, and surrounding environment of rotary machines, can vary considerably, which may affect the reliability and stability of diagnostic results^[8].

Transfer learning is a machine learning technique that transfers knowledge learned on one task to another, with the aim of improving performance on the latter^[9]. In diagnostic tasks, it allows diagnostic knowledge learned from the pre-training data to be applied to related diagnostic tasks in order to achieve good diagnostic results^[10]. The core problem in this strategy is distribution alignment: the model is constrained by the objective function so that it satisfies the assumption of distributional consistency^[11]. Domain adaptation is the core technique for achieving distribution alignment. It essentially ensures that the feature spaces of the two tasks are aligned through some kind of transformation^[12]. In real-world scenarios, the feature spaces of the source and target tasks can differ greatly, and distance metric minimization is often used for alignment. Metrics for the difference in distribution between domains include Kullback–Leibler (KL) divergence^[13], maximum mean discrepancy (MMD)^[14], Wasserstein distance^[15], and CORAL loss^[16]. These measures are added to the loss function as extra terms and then optimized by gradient descent. Notably, this strategy is acknowledged to align the domains effectively when the difference in data distribution is small.

However, these methods mainly focus on aligning the marginal probability distributions, which only capture the variation of global characteristics and ignore differences in the conditional probability distributions of different domains. This makes it challenging to handle scenarios where the differences in data distribution between domains are more complex. According to the recent literature^[17–19], transfer learning fault diagnosis techniques are favored by a wide range of researchers. Qian et al. use DenseNet as the baseline model, combined with a joint distribution adaptation regularization term, to obtain the metastable features. In this way, diagnostic capabilities are effectively migrated^[17]. Li et al. use the representational capabilities learned in supervised learning to obtain target domain feature representations by minimizing the multi-kernel maximum mean discrepancy (MKMMD) in different feature layers between the two domains^[18]. Wang et al. propose a method that uses multi-scale convolution to extract fault features and combines it with adversarial training to achieve effective migration. The accuracy of this method is close to 100% on the bearing dataset^[19]. The above-mentioned studies demonstrate the effectiveness of deep transfer learning in rotary machine fault diagnosis. However, some problems have still not received enough attention: (ⅰ) Most transfer learning methods only perform domain-adaptive alignment from a global perspective. This alignment effect is greatly reduced when the data distribution varies dramatically^[20]; (ⅱ) When transfer learning algorithms are validated, the transfer of mixed fault types across different devices is rarely considered, and this case is quite difficult due to the significant differences in data distribution.

To address the aforementioned challenges, this paper proposes a semi-supervised joint adaptation transfer network with conditional adversarial learning for rotary machine fault diagnosis, which introduces the following main innovative aspects.

(1) To efficiently transfer the diagnostic power learned on a large amount of data in the source domain, a pre-trained model is trained on the labeled data in the source domain and then used to generate pseudo-labels for the unlabeled target domain data. This effectively utilizes unlabeled data to boost the performance of the diagnostic model. Then, to reduce domain shifts and align the joint distribution of the source and target domains, we take into account both the global feature variation and the intra-class similarity between different domains. This enables the alignment of both the conditional probability distributions and the marginal probability distributions in different domains. This method can effectively capture both the global and local differences between the two domains and align the distributions to reduce the domain shift. This can significantly improve the diagnostic performance on the target domain and enable the use of diagnostic models in real-world scenarios where the labeled data may be scarce.

(2) The mutual influence between different devices in modern rotary machines significantly increases the difficulty of fault diagnosis. Our method can be used in single-type fault diagnosis and produces highly reliable results. More importantly, it shows a clear improvement in diagnostic tasks involving mixed fault types, leading to more accurate diagnostic results.

The remaining sections of this paper are organized as follows: Section 2 provides an introduction to the related definitions of transfer learning. Section 3 elaborates on the proposed method in detail. In section 4, we present experimental results on three different types of settings to showcase the effectiveness of our method. Finally, section 5 summarizes the contributions of this work and discusses potential avenues for future research.

2. TRANSFER LEARNING PROBLEM

Having sufficient annotated data is a requirement for a well-performing supervised model; however, the process of annotating data can be tedious and time-consuming. Therefore, transfer learning is a proven way to make use of a previously pre-trained model on a new task while ensuring optimal performance. The main goal is to transfer the capabilities learned in the source domain data to the target domain data, thus solving the pain point that it is difficult to obtain sufficient knowledge in the target domain with limited data^[21].

From Figure 1, we can see that the traditional intelligent fault diagnosis method gives an accurate diagnosis in the case where the data distribution of the training and test sets is similar. Therefore, transfer learning is unnecessary in such cases. In general, when their data distributions are inconsistent, the generalization ability of the model is poor. In these situations, transfer learning can exploit the diagnostic power learned from the training data by reducing the difference between the two distributions.

Figure 1. Traditional intelligent methods and transfer learning-based intelligent methods.

Formally, we define $$ D^{s}=\left(x_{i}^{s}, y_{i}^{s}\right) $$ as the labeled training data and $$ D^{t}=x_{i}^{t} $$ as the unlabeled test data, where $$ s $$ denotes the source domain task, $$ t $$ denotes the target domain task, and $$ x_{i} $$ and $$ y_{i} $$ represent the vectorized representation of the $$ i $$th sample and the corresponding label. Note that the target domain task has no corresponding $$ y_{i}^{t} $$, so the only labeled data available in the training phase come from the source domain, which increases the difficulty of transfer. Since the task data in the two domains differ greatly, transfer learning minimizes the difference between them by finding a mapping relationship, thus making the diagnostic ability reusable. When the data distributions of the two domains are close, the assumptions on which existing intelligent fault diagnosis methods depend are satisfied, and an effective diagnosis can be realized.

3. THE PROPOSED ARCHITECTURE

In order to efficiently transfer the diagnostic power learned from the labeled data, a pre-trained model is obtained by generating pseudo-labels for training. A domain adaptation network, using the joint maximum mean discrepancy (JMMD) criterion and conditional domain adversarial (CDA) learning, is then used to learn a mapping relationship that reduces the variation in the distribution of different domains. The joint distribution of the features and the predicted labels is aligned through multiple domain adaptation approaches. Meanwhile, the information from the unlabeled data is incorporated in the pre-training phase, thus resulting in maximum category differentiation and domain adaptation under multimodal conditions.

As depicted in Figure 2, the primary architecture of the proposed method is structured as follows: First, enough labeled data in the source domain are collected to train a pre-trained model. After that, the unlabeled data in the target domain are predicted to obtain pseudo-labels. Then these data are combined to extract more effective fault features. Second, the feature vectors and label vectors are linearly transformed several times to jointly model the implied relationships between them. Finally, a domain adaptation module is used to align the two data distributions through loss function optimization. The optimization objectives are the CDA loss, the label classification loss, and the JMMD loss, and the three components are trained jointly.

Figure 2. The structure illustration of the proposed rotary machine fault diagnosis method.

3.1. Pre-training

The pre-trained model combines convolutional neural networks (CNN) with bi-directional long short-term memory (BILSTM). Detailed information on the model structure is given in Table 1. To improve computational efficiency, the raw signal is first downsampled and then fed into the CNN. After that, the features obtained from the CNN are fed into the BILSTM to better extract the temporal information of the vibration signal. A large kernel size of 15 is used in CNN/1 to capture low-frequency information, while CNN/2, CNN/3, and CNN/4 extract high-frequency information and therefore use a smaller kernel size of 3.

Table 1

The architecture of the one-dimensional CNN-BILSTM

Layer	Symbol domain	Operator	Parameter size
1	Input	Input Signal	1024
2	C1	Convolution	(16, kernel size = 15)
3	P1	Pooling	kernel size = 2
4	C2	Convolution	(32, kernel size = 3)
5	P2	Pooling	/
6	C3	Convolution	(64, kernel size = 3)
7	P3	Pooling	/
8	C4	Convolution	(128, kernel size = 3)
9	P4	AdaptiveMaxPooling	/
10	BILSTM	BILSTM	hidden_dim=64

In the pre-training phase, we first train with the labeled data from the source domain. Unlabeled data whose predicted probability is above a threshold of 0.8 are then selected and added to the training set, and training continues until convergence. Here, we simply rely on the empirical threshold of 0.8 for this task; predictions above it are treated as relatively reliable pseudo-labels.
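The threshold filtering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_pseudo_labels` and the example probabilities are hypothetical, and "above 0.8" is read here as a strict inequality.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions only."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence > threshold:
            # The pseudo-label is the class with the highest predicted probability.
            selected.append((index, list(probs).index(confidence)))
    return selected


# Hypothetical softmax outputs for four unlabeled target-domain samples.
target_probs = [
    [0.95, 0.03, 0.02],  # confident: kept with pseudo-label 0
    [0.40, 0.35, 0.25],  # uncertain: discarded
    [0.10, 0.85, 0.05],  # confident: kept with pseudo-label 1
    [0.80, 0.15, 0.05],  # exactly at the threshold: discarded (strict ">")
]
pseudo_labeled = select_pseudo_labels(target_probs)
print(pseudo_labeled)
```

The selected samples would then be merged with the labeled source data for the next round of training.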

3.2. Domain adaptation

In order to achieve effective alignment, while the label classifier ensures the basic diagnostic ability, the domain classifier and a distance discrepancy metric module are additionally designed to further improve the effect. They correspond to the following three objective functions: (1) Minimize the classification loss of fault classification on the labeled data; (2) Maximize the domain classification error on two different domains; (3) Minimize the JMMD distance between the two dissimilar distributions.

3.2.1. Loss-function $$ L_{l} $$

To migrate the diagnostic capability to the target task, it is first necessary to ensure that the model has learned enough diagnostic knowledge in the source domain data. Thus, the first loss function $$ L_{l} $$ of our method is to minimize the classification loss of fault classification on the labeled data. The required objective function $$ L_{l} $$ for data with $$ k $$ fault classes is the standard softmax loss function.

(1)

$$ L_l=-\frac{1}{n}\left[\sum\limits_{i=1}^n \sum\limits_{j=1}^k I\left[y_i=j\right] \log \frac{e^{\left(\left(w_j\right)^T x_i+b\right)}}{\sum\nolimits_{l=1}^k e^{\left(\left(w_l\right)^T x_i+b\right)}}\right] $$

where $$ n $$ is the batch size and $$ k $$ is the number of fault classes.
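As a minimal sketch of Equation (1), the snippet below computes the softmax classification loss for a batch of class scores. It assumes the scores (the terms inside the exponentials) have already been computed by the classifier; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(
    batch_logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean negative log-probability of the true class over a batch of n samples."""
    loss = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        # The indicator in the inner sum keeps only the true-class term.
        loss -= math.log(probs[label])
    return loss / len(labels)


# Uniform scores over k = 3 classes give a loss of log(3).
uniform_loss = classification_loss([[0.0, 0.0, 0.0]], [0])
# A confident, correct prediction gives a much smaller loss.
confident_loss = classification_loss([[5.0, 0.0, 0.0]], [0])
print(uniform_loss, confident_loss)
```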

3.2.2. Loss-function $$ L_{\mathrm{d}} $$

The primary role of the domain adaptation module is to guide the network to extract domain invariant features under the constraint of the loss function. Borrowing ideas from generative adversarial networks, an adversarial domain-based training approach is added to learn the domain-invariant features. By setting a gradient reverse layer (GRL) in front of the domain classifier, the target domain data is confounded with the source domain data, thus maximizing the classification loss between the two domains. The domain classifier and feature extractor struggle with each other and finally reach a balance. Thus, domain-invariant features are learned. However, if we just align the marginal distribution between two data and ignore the correlation between labels and features, the final alignment results are poor. The conditional domain adversarial network is used to capture the cross-covariance between features and labels, thus improving the discrimination^[22]. Considering the non-linear and non-smooth nature of fault signals, the joint distributions of fault features and corresponding labels need to be aligned as closely as possible to effectively transfer the diagnostic capability. Therefore, we train CDA as a second objective function here. Subsequently, the loss function $$ L_{d} $$ is shown below.

(2)

$$ w(H(p))=1+e^{-H(p)}, H(p)=-\sum\limits_{c=0}^{k-1} p_{c} \log p_{c} $$

(3)

$$ \begin{equation} \begin{aligned} L_{\mathrm{d}}=&-\frac{1}{n_{s}} \sum\limits_{i=1}^{n_{s}} w\left(H\left(p_{i}^{s}\right)\right) \log \left[1-D\left(F\left(x_{i}^{s} ; \theta_{f}\right) ; \theta_{d}\right)\right] \\ &-\frac{1}{n_{t}} \sum\limits_{j=1}^{n_{t}} w\left(H\left(p_{j}^{t}\right)\right) \log \left[D\left(F\left(x_{j}^{t} ; \theta_{f}\right) ; \theta_{d}\right)\right] \end{aligned} \end{equation} $$

where $$ \theta_{f} $$ is the model parameter corresponding to the feature extraction module, $$ \theta_{d} $$ is the parameter of the domain classifier, and $$ k $$ denotes the number of fault types, $$ H(p) $$ denotes the uncertainty of the sample classification result, and $$ w(H(p)) $$ denotes the weight of each sample.
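A minimal sketch of Equations (2) and (3) follows. It assumes the entropy weight is applied per sample, as the sample index on the predicted probabilities suggests, and it takes the domain classifier outputs as given numbers in (0, 1) rather than computing them from a network. All function and variable names are hypothetical.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Uncertainty H(p) of one sample's predicted class probabilities."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0.0)


def entropy_weight(p: Sequence[float]) -> float:
    """Weight w(H(p)) = 1 + exp(-H(p)): confident predictions weigh more."""
    return 1.0 + math.exp(-entropy(p))


def domain_loss(source_probs, source_d, target_probs, target_d) -> float:
    """Entropy-weighted domain classification loss.

    source_d and target_d hold the domain classifier outputs for each sample;
    source samples contribute log(1 - D) and target samples contribute log(D).
    """
    src = sum(entropy_weight(p) * math.log(1.0 - d)
              for p, d in zip(source_probs, source_d))
    tgt = sum(entropy_weight(p) * math.log(d)
              for p, d in zip(target_probs, target_d))
    return -src / len(source_d) - tgt / len(target_d)


certain_weight = entropy_weight([1.0, 0.0])    # zero entropy
uncertain_weight = entropy_weight([0.5, 0.5])  # maximum entropy for 2 classes
example_loss = domain_loss([[1.0, 0.0]], [0.5], [[1.0, 0.0]], [0.5])
print(certain_weight, uncertain_weight, example_loss)
```

The weight ranges from 2 for a fully certain prediction down towards 1 as the prediction becomes more uncertain, so ambiguous samples influence the adversarial alignment less.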

3.2.3. Loss-function $$ L_{\mathrm{D}} $$

Besides the CDA method, spatial metric distance minimization is another approach to learning domain invariant features. The MMD method is used by Borgwardt et al.^[23] to measure the variability of distributions. However, the effectiveness of aligning different distributions with MMD in complex multimodal conditions is limited. To address this problem, Long et al.^[24] propose the JMMD method to align the joint distribution in the feature space and label space, where the loss function $$ L_{D} $$ is defined as

(4)

$$ L_{\mathrm{D}}=\left\|\mathbb{E}_S\left(z^{s f} \otimes z^{s l}\right)-\mathbb{E}_{\mathrm{T}}\left(z^{t f} \otimes z^{t l}\right)\right\| $$

where $$ z^{s f} $$ and $$ z^{t f} $$ represent the fault feature outputs, and $$ z^{s l} $$ and $$ z^{t l} $$ denote the vector representations of the labels. Unlike the standard JMMD, we use $$ f \otimes l $$ to align the joint distribution of the two domains: two learnable weight matrices, $$ w_1 $$ and $$ w_2 $$, map $$ f $$ and $$ l $$ into the same dimension, and the results are added together to represent the joint distribution of features and labels.
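The joint representation and the distance in Equation (4) can be illustrated with the simplified sketch below. It is an assumption-laden illustration: the matrices `w1` and `w2` are fixed here rather than learned, and the distance is the plain Euclidean norm between the mean joint embeddings, not the kernel-based form used in JMMD. All names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(matrix: Matrix, vector: Vector) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def joint_embedding(f: Vector, l: Vector, w1: Matrix, w2: Matrix) -> List[float]:
    """Project features and labels to a common dimension and add them."""
    return [a + b for a, b in zip(matvec(w1, f), matvec(w2, l))]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean, standing in for the expectation over a domain."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def joint_discrepancy(src_f, src_l, tgt_f, tgt_l, w1, w2) -> float:
    """Norm of the difference between the source and target mean embeddings."""
    src = mean_vector([joint_embedding(f, l, w1, w2) for f, l in zip(src_f, src_l)])
    tgt = mean_vector([joint_embedding(f, l, w1, w2) for f, l in zip(tgt_f, tgt_l)])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(src, tgt)))


identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for the learnable matrices
distance = joint_discrepancy([[1.0, 0.0]], [[1.0, 0.0]],
                             [[0.0, 1.0]], [[0.0, 1.0]], identity, identity)
same_domain = joint_discrepancy([[1.0, 0.0]], [[1.0, 0.0]],
                                [[1.0, 0.0]], [[1.0, 0.0]], identity, identity)
print(distance, same_domain)
```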

4. EXPERIMENTAL VERIFICATION

In this section, the proposed semi-supervised joint adaptation transfer network with conditional adversarial learning is evaluated on vibration signal data from different rotary machine types, such as motor bearings, wind turbine bearings, and gearbox bearings and gears. Three datasets are used to evaluate the diagnostic capability of our method under different loads, speeds, and mixed fault-type scenarios. We conducted comparative experiments across multiple tasks using six existing transfer methods, together with a baseline without transfer learning (No-TL). The results show that our proposed semi-supervised method exhibits good diagnostic capability, which is crucial in situations where obtaining fault data is difficult.

4.1. Case 1: CWRU bearing datasets under different loads

4.1.1. Data description

In this case, the bearing dataset is from the CWRU laboratory^[25]. The experimental setup mainly consists of a dependent motor, a torque sensor/encoder, and a load motor. The bearing dataset is collected at four loads (0, 1, 2, and 3 HP). Single point faults are arranged on the bearings using electrical discharge machining (EDM) to simulate inner race faults (IF), rolling element faults (RF), and outer race faults (OF). Twelve transfer tasks are designed by migrating between the four load states. In addition, 1000 samples of length 1024 are provided for each data type. The sampling rate for our task is selected as 12 kHz. To obtain the diagnostic model, 80% of the data is used, while 20% is used to verify its validity.
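The sample preparation described above (fixed-length samples and an 80%/20% split) could look like the sketch below. The windowing stride, the shuffling, and the random seed are assumptions for illustration only, since the text does not state how the windows are cut or how the split is drawn.

```python
import random
from typing import List, Sequence, Tuple


def segment_signal(
    signal: Sequence[float], length: int = 1024, stride: int = 1024
) -> List[List[float]]:
    """Cut a long vibration signal into fixed-length samples."""
    last_start = len(signal) - length
    return [list(signal[s:s + length]) for s in range(0, last_start + 1, stride)]


def split_samples(
    samples: Sequence[List[float]], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Synthetic stand-in for a recorded signal (10 windows of 1024 points).
signal = [float(i) for i in range(1024 * 10)]
windows = segment_signal(signal)
train_set, test_set = split_samples(windows)
print(len(windows), len(train_set), len(test_set))
```

A stride smaller than the window length would produce overlapping samples, which is one common way to obtain more samples from a short recording.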

A dataset of bearings with variable load conditions from the CWRU laboratory is applied to illustrate that the model could accurately classify fault types. To assess the diagnostic capability of the model, comparative tests with some commonly used domain adaptation algorithms such as MKMMD, CORAL, JMMD, domain adversarial (DA), and CDA are performed. In our article, average accuracy is a key indicator to evaluate the diagnosis results of different methods.

4.1.2. Experimental results and analysis

The comparative results of the eight different methods on the bearing dataset are shown in Table 2. Our method is the best overall performer among the eight methods on this dataset, reaching 100% accuracy on 9 of the 12 migration tasks. Some other domain adaptation methods, including JMMD, also achieve good results, probably because this dataset is relatively simple and the differences in distribution are relatively small. Their effectiveness on more difficult transfer tasks, however, cannot be judged from this dataset alone. The diagnostic results of the various methods on the CWRU dataset are generally good, mainly because the faults in this dataset are artificially set and have a more pronounced fault signature. In addition, the relatively small differences in the distribution of the bearing data collected under different loads reduce the difficulty of the migration task. Nevertheless, without transfer learning (No-TL), the accuracy on some migration tasks, such as 3-0, 3-1, and 3-2, is only about 88%. This implies that directly using a model pre-trained on the source domain task to predict the target domain data still produces significant errors, which demonstrates that a domain migration strategy remains essential. These comparative experiments provide a preliminary validation of the effectiveness of our proposed method.

Table 2

The accuracy of different domain adaptation methods in CWRU datasets (%)

Method	No-TL	AdaBN	MKMMD	CORAL	JMMD	DA	CDA	OURS
Task 0-1	98.77	99.68	100	98.38	99.68	99.35	100	100
Task 0-2	96.49	99.48	96.43	100	100	99.03	99.68	100
Task 0-3	94.43	98.38	92.88	100	99.03	99.35	92.56	99.68
Task 1-0	97.55	95.40	98.08	98.85	100	99.23	97.32	100
Task 1-2	98.70	99.87	100	98.35	100	99.03	100	100
Task 1-3	94.82	99.03	98.71	99.68	100	99.68	100	100
Task 2-0	96.02	94.64	98.47	96.55	97.32	98.47	95.79	99.23
Task 2-1	98.18	99.29	98.05	97.08	100	99.03	96.43	100
Task 2-3	98.77	99.22	100	99.35	100	99.03	99.68	100
Task 3-0	87.82	90.04	84.67	99.23	98.08	95.79	97.70	98.85
Task 3-1	88.56	93.18	92.86	99.35	98.38	90.26	95.45	100
Task 3-2	87.98	95.32	96.10	100	99.68	97.40	98.70	100

4.2. Case 2: JNU bearing datasets under different speeds

4.2.1. Data description

In this case, the bearing dataset is obtained from the Jiangnan University (JNU) laboratory ^[26]. Vibration signals are collected from wind turbine bearings at three speeds of 600, 800, and 1000 rpm, including normal bearings, rolling element failures, outer ring failures, and inner ring failures. These four faults are simulated by hand machining tiny scars on the inner ring, outer ring, and rolling element of the bearing by wire cutting. The scar size of the bearing faults is 0.25 mm $$ \times $$ 0.7 mm. The total length of the collected data is 2,000,000, and the amount of data for each state is 500,000.

In this experiment, we labeled the settings at three different speeds of 600, 800, and 1000 rpm as tasks 0, 1, and 2, respectively. We designed a total of six transfer tasks by combining the vibration signals of the three states in a two-by-two manner. We cut the length of each sample to 1024 and then performed a comparative test with some commonly used domain adaptation algorithms, such as MKMMD, CORAL, JMMD, DA, and CDA. Again, average accuracy is a key assessment metric.
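The pairing of operating conditions into transfer tasks can be enumerated as ordered (source, target) pairs. The helper name `transfer_tasks` is hypothetical; the sketch only reproduces the task counts and labels used in the tables.

```python
from itertools import permutations
from typing import List, Sequence


def transfer_tasks(conditions: Sequence[int]) -> List[str]:
    """List every ordered source-target pair of distinct operating conditions."""
    return [f"{source}-{target}" for source, target in permutations(conditions, 2)]


jnu_tasks = transfer_tasks([0, 1, 2])      # three speeds
cwru_tasks = transfer_tasks([0, 1, 2, 3])  # four loads
print(jnu_tasks)
print(len(cwru_tasks))
```

Three conditions give six tasks and four conditions give twelve, matching the task counts of the JNU and CWRU cases.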

4.2.2. Experimental results and analysis

The comparative results of the eight different methods on the JNU bearing dataset are shown in Table 3. The results for the six transfer tasks under different speed conditions show that our method outperforms the other seven methods overall on this dataset, achieving the best accuracy in five of the six transfer tasks. No method reaches 100% accuracy on this dataset, which implies that there are still some discrepancies in the data distribution for this task. Other domain adaptation methods, such as JMMD, also give good results. These comparative experiments demonstrate the adaptability of our approach to domain variations in different speed scenarios.

Table 3

The accuracy of different domain adaptation methods in JNU datasets (%)

Method	No-TL	AdaBN	MKMMD	CORAL	JMMD	DA	CDA	OURS
Task 0-1	97.44	96.96	97.27	89.08	98.12	96.08	94.54	99.15
Task 0-2	91.60	96.55	97.61	88.91	97.95	95.39	96.25	98.63
Task 1-0	85.46	91.88	88.57	88.40	96.93	83.62	86.69	94.88
Task 1-2	97.41	97.51	98.29	97.61	98.29	94.54	97.27	99.32
Task 2-0	85.02	91.16	98.46	97.44	98.81	94.54	97.10	99.49
Task 2-1	97.95	97.68	97.61	96.25	98.63	96.08	97.95	99.49

Figure 3 illustrates the details of the four best diagnostic results using confusion matrices. The confusion matrices show that JMMD, CDA, and MKMMD have a larger classification error on the fourth fault type. This can be attributed to the fact that the data distribution of the fault types varies considerably under different speed conditions, so the diagnostic effectiveness of a single strategy is quite limited. By using a joint domain adaptation transfer network to perform targeted alignment that reduces the joint distribution differences between the two domains, our proposed method dramatically improves the accuracy on this fault type. At the same time, a conditional adversarial training module is introduced to improve the alignment and deal with domain drift. As a result, the different categories are separated most clearly. These results support the transferability of our proposed fault diagnosis method.

Figure 3. Confusion matrices of four different methods on the JNU bearing dataset.

4.3. Case 3: SEU gearbox datasets with mixed fault

4.3.1. Data description

We use the bearing and gearbox dataset from Southeast University in China in this experiment^[27]. The experimental platform, DDS, consists mainly of a motor, a planetary gearbox, and a parallel gearbox. The fault signals are obtained under two different working conditions, 20Hz-0V and 30Hz-2V. The dataset for the gearbox includes the fault signal of the planetary gearbox in the $$ X, Y $$, and $$ Z $$ directions. There are four types of faults: broken teeth, missing teeth, root faults, and surface faults, and one normal type for healthy working conditions. The bearing data are available for four types of faults: inner ring, outer ring, rolling element, and mixed inner and outer rings. In order to evaluate the performance of our approach when dealing with mixed fault types, gear and bearing fault data from the SEU dataset were combined into a mixed dataset. There are nine fault types in this mixed dataset, including four gear faults, four bearing faults, and one normal data. There are 1000 samples for each data type, and each sample is 1024 in length. Thus, this dataset consists of 9,000 data samples. Finally, we use 80% of the data to obtain the diagnostic model and 20% of the data to verify its effectiveness.

In this experiment, to demonstrate the transfer effectiveness of the proposed method under different load and speed operating conditions, we collected vibration signals for two different states, 20Hz-0V and 30Hz-2V, and named them Task 0 and Task 1, respectively. We validated the model by pairing the vibration signals of the two states in both transfer directions. In addition, two signal forms were considered as inputs, the time domain (TD) signal and the frequency domain (FD) signal, giving a total of four transfer tasks. In order to evaluate the performance of our method in this case of widely varying data distributions with different load and speed conditions, comparative tests were carried out with some commonly used domain adaptation algorithms, such as MKMMD, CORAL, JMMD, DA, and CDA. In this case, we also choose average accuracy as a key assessment metric.
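The text does not say how the frequency domain input is computed. One common choice, assumed here purely for illustration, is the one-sided magnitude spectrum of each sample. The sketch uses a direct discrete Fourier transform from the standard library for clarity; a fast Fourier transform routine would normally replace it.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(sample: Sequence[float]) -> List[float]:
    """One-sided DFT magnitude of a time domain sample, scaled by its length."""
    n = len(sample)
    spectrum = []
    for k in range(n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(sample)
        )
        spectrum.append(abs(coefficient) / n)
    return spectrum


# Synthetic sample: a sine wave with exactly 3 cycles in 16 points.
time_sample = [math.sin(2.0 * math.pi * 3 * t / 16) for t in range(16)]
frequency_sample = magnitude_spectrum(time_sample)
peak_bin = frequency_sample.index(max(frequency_sample))
print(peak_bin)
```

For a sample of length 1024 this yields 512 frequency bins, so the frequency domain input is half as long as the time domain input under this assumption.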

4.3.2. Experimental results and analysis

The comparative results of the eight different methods on the mixed SEU dataset are shown in Table 4. The results of the four transfer tasks under different load and speed conditions show that our method performs the best of the eight methods on this dataset, with the best accuracy in all four transfer tasks. It must be noted that a high level of accuracy is not achieved on this dataset, which indicates significant differences in the data distribution between the two domains on this task. The main reason is that the vibration signals collected at different speeds and loads are inherently different. In addition, there are a number of mixed fault types in this task, such as mixed inner and outer ring faults and the combination of bearing and gearbox faults, which can affect the final transfer results. It is worth noting that the JMMD method, which performs quite effectively in the first two cases, trails our method by between 1.61 and 10.55 percentage points on this task. Since the data distribution is complex and varies significantly, domain adaptation strategies alone are not sufficient to align the distribution well enough to achieve good diagnostic performance.

Table 4

The accuracy of different domain adaptation methods in SEU datasets (%)

Method	No-TL	AdaBN	MKMMD	CORAL	JMMD	DA	CDA	OURS
Task 0-1(TD)	45.43	49.21	59.97	50.59	65.40	54.40	59.53	75.95
Task 1-0(TD)	56.16	57.89	67.45	58.44	68.62	58.80	65.54	72.87
Task 0-1(FD)	35.19	41.38	44.57	42.52	45.45	43.70	43.26	50.15
Task 1-0(FD)	42.99	49.53	44.28	51.17	61.29	53.96	52.93	62.90
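As a small worked example of the average accuracy metric, the snippet below averages each method's accuracy over the four tasks in Table 4 and computes the per-task gap between our method and JMMD. The numbers are copied from Table 4; the variable names are hypothetical.

```python
methods = ["No-TL", "AdaBN", "MKMMD", "CORAL", "JMMD", "DA", "CDA", "OURS"]

# Accuracy (%) per task, in the column order of Table 4.
table4 = {
    "0-1(TD)": [45.43, 49.21, 59.97, 50.59, 65.40, 54.40, 59.53, 75.95],
    "1-0(TD)": [56.16, 57.89, 67.45, 58.44, 68.62, 58.80, 65.54, 72.87],
    "0-1(FD)": [35.19, 41.38, 44.57, 42.52, 45.45, 43.70, 43.26, 50.15],
    "1-0(FD)": [42.99, 49.53, 44.28, 51.17, 61.29, 53.96, 52.93, 62.90],
}

# Average accuracy of each method over the four transfer tasks.
mean_accuracy = {
    method: sum(row[column] for row in table4.values()) / len(table4)
    for column, method in enumerate(methods)
}
best_method = max(mean_accuracy, key=mean_accuracy.get)

# Per-task gap (percentage points) between our method and JMMD.
jmmd_column = methods.index("JMMD")
ours_column = methods.index("OURS")
gap_to_jmmd = {
    task: round(row[ours_column] - row[jmmd_column], 2)
    for task, row in table4.items()
}

print(best_method, round(mean_accuracy[best_method], 2))
print(gap_to_jmmd)
```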

On the one hand, domain adaptation is performed at the feature extraction and classification layers via JMMD by exploiting the differences in the joint distribution. On the other hand, adversarial domain training is performed by adjusting the joint distribution to reduce domain drift. These two modules achieve maximum category differentiation and domain adaptation in multimodal conditions. Finally, two input forms are compared: using the original time domain signal directly as input versus transforming the data into the frequency domain first. In this task, using the time domain signal directly as input gives better diagnostic results. A possible reason is that the time domain representation reflects how the amplitude, frequency, and phase of the signal change over time and preserves the waveform shape, which helps in detecting short-term signal changes. The frequency domain representation mainly provides the frequency components of the signal and their relative strengths, so it may not capture the complex temporal characteristics of the signal, especially if the signal is very complex or contains multiple frequencies. In addition, the frequency domain representation can be harder to interpret and may require more specialized knowledge to analyze. Therefore, the time domain representation is more intuitive and practical here. These comparative tests demonstrate that the transfer effects of our proposed method are practical for different load and speed scenarios.

In summary, the improvement in diagnostic performance achieved by our method can be attributed to the combination of the JMMD and adversarial domain training modules, which together address the challenges of domain adaptation under multimodal conditions. Using the time-domain signal as input contributes further to this improvement.
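
As a rough illustration of the distance-metric module, the sketch below computes a biased estimate of the squared joint MMD between two small batches, following the general JMMD idea of multiplying per-layer kernels. It is a sketch under stated assumptions, not the loss used in the paper: the Gaussian kernel, the bandwidth `sigma=1.0`, the function names, and the toy activations are all hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two equal-length vectors (bandwidth is an assumption)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def joint_kernel(sample_a, sample_b):
    """Product of per-layer kernels; each sample is a tuple of layer activations."""
    value = 1.0
    for layer_a, layer_b in zip(sample_a, sample_b):
        value *= gaussian_kernel(layer_a, layer_b)
    return value


def mean_kernel(xs, ys):
    """Average joint-kernel value over all pairs drawn from the two batches."""
    return sum(joint_kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def jmmd_squared(source, target):
    """Biased estimate of the squared joint MMD between two batches."""
    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical batches: each sample = (feature-layer activation, class probabilities).
source = [([0.0, 0.1], [0.9, 0.1]), ([0.2, 0.0], [0.8, 0.2])]
shifted = [([1.0, 1.1], [0.4, 0.6]), ([1.2, 1.0], [0.3, 0.7])]

print(jmmd_squared(source, source))   # identical batches: no discrepancy
print(jmmd_squared(source, shifted))  # shifted batch: clearly positive
```

Minimizing such a quantity with respect to the network parameters pulls the joint distribution of features and predictions in the two domains together; the adversarial module pursues the same goal through a domain discriminator instead of a fixed kernel.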

CONCLUSION

This paper proposes a novel semi-supervised joint adaptation transfer network with conditional adversarial learning for rotary machine fault diagnosis, which addresses the poor diagnostic performance caused by insufficient data in the target domain. The method first incorporates information from unlabeled target-domain data through a pre-trained model. Two domain adaptation modules then reduce the distance between the distributions of the two domains, improving diagnostic performance when transferring in either direction between them. The approach achieves reliable results in three experimental settings: variable loads, variable speeds, and mixed fault types. However, the proposed method has not been validated on fault data from real scenarios, where fault patterns are typically more complex and the data often contain significant noise. Its performance could therefore be affected in such settings.

In this work, we focus mainly on domain adaptation between data from different domains, so pseudo-labels are filtered using only an empirical threshold. In future work, we will investigate how to select more reliable pseudo-labels in order to make the best possible use of unlabeled data and further improve diagnosis in tasks with insufficient labeled data.
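
The threshold-based filtering of pseudo-labels can be sketched as follows. This is a minimal illustration, assuming that confidence is the maximum softmax probability of the pre-trained classifier; the threshold value 0.9, the function names, and the example logits are hypothetical and do not come from the experiments.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_pseudo_labels(batch_logits, threshold=0.9):
    """Return (sample index, predicted class) pairs whose confidence meets the threshold."""
    kept = []
    for index, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            kept.append((index, probs.index(confidence)))
    return kept


# Hypothetical classifier outputs for three unlabeled target-domain samples.
logits = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8], [0.0, 0.0, 5.0]]
pseudo = filter_pseudo_labels(logits, threshold=0.9)
print(pseudo)  # the ambiguous second sample is discarded
```

A fixed threshold trades coverage for reliability: raising it keeps fewer but cleaner pseudo-labels, which is the trade-off that a more refined selection strategy would aim to improve.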

DECLARATIONS

Authors' contributions

Made substantial contributions to the conception and design of the study and performed data analysis and interpretation: Liu C, Li S

Performed data acquisition and provided administrative, technical, and material support: Chen H, Xiu X, Peng C

Availability of data and materials

CWRU:^[25] JNU:^[26] SEU:^[27]

Financial support and sponsorship

This work was supported by the National Natural Science Foundation of China (62103250, 62273223, and 62173218); Shanghai Sailing Program (21YF1414000); Project of Science and Technology Commission of Shanghai Municipality, China (22JC1401401).

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Copyright

© The Author(s) 2023.

REFERENCES

1. Xu X, Cao D, Zhou Y, Gao J. Application of neural network algorithm in fault diagnosis of mechanical intelligence. Mech Syst Signal Pr 2020;141:106625.

2. Qiao W, Lu D. A survey on wind turbine condition monitoring and fault diagnosis—part Ⅰ: components and subsystems. IEEE Trans Ind Electron 2015;62:6536-45.

3. Hoang DT, Kang HJ. A survey on deep learning based bearing fault diagnosis. Neurocomputing 2019;335:327-35.

4. Lei Y, Yang B, Jiang X, et al. Applications of machine learning to machine fault diagnosis: a review and roadmap. Mech Syst Signal Pr 2020;138:106587.

5. Tran MQ, Amer M, Dababat A, Abdelaziz AY, Dai HJ, et al. Robust fault recognition and correction scheme for induction motors using an effective IoT with deep learning approach. Measurement 2023;207:112398.

6. Gong W, Chen H, Zhang Z, et al. A novel deep learning method for intelligent fault diagnosis of rotating machinery based on improved CNN-SVM and multichannel data fusion. Sensors 2019;19:1693.

7. Pandey SK, Janghel RR. Recent deep learning techniques, challenges and its applications for medical healthcare system: a review. Neural Process Lett 2019;50:1907-35.

8. Liu R, Yang B, Zio E, Chen X. Artificial intelligence for fault diagnosis of rotating machinery: a review. Mech Syst Signal Pr 2018;108:33-47.

9. Zhuang F, Qi Z, Duan K, et al. A comprehensive survey on transfer learning. Proceedings of the IEEE 2021;109:43-76.

10. Qian C, Zhu J, Shen Y, Jiang Q, Zhang Q. Deep transfer learning in mechanical intelligent fault diagnosis: application and challenge. Neural Process Lett 2022;54:2509-31.

11. Yang X, Chi F, Shao S, Zhang Q. Bearing fault diagnosis under variable working conditions based on deep residual shrinkage networks and transfer learning. J Sensors 2021;2021:1-13.

12. Kouw WM, Loog M. A review of domain adaptation without target labels. IEEE Trans Pattern Anal Mach Intell 2021;43:766-85.

13. Hershey JR, Olsen PA. Approximating the Kullback Leibler divergence between Gaussian mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07. IEEE; 2007.

14. Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474; 2014.

15. Shen J, Qu Y, Zhang W, Yu Y. Wasserstein distance guided representation learning for domain adaptation. Proceedings of the AAAI Conference on Artificial Intelligence 2018; doi: 10.1609/aaai.v32i1.11784.

16. Sun B, Saenko K. Deep CORAL: correlation alignment for deep domain adaptation. In: Lecture Notes in Computer Science. Springer International Publishing; 2016. pp. 443–50.

17. Qian C, Jiang Q, Shen Y, Huo C, Zhang Q. An intelligent fault diagnosis method for rolling bearings based on feature transfer with improved DenseNet and joint distribution adaptation. Meas Sci Technol 2021;33:025101.

18. Li X, Zhang W, Ding Q, Sun JQ. Multi-layer domain adaptation method for rolling bearing fault diagnosis. Signal Process 2019;157:180-97.

19. Wang Y, Ning D, Lu J. A novel transfer capsule network based on domain-adversarial training for fault diagnosis. Neural Process Lett 2022;54:4171-88.

20. Li W, Huang R, Li J, et al. A perspective survey on deep transfer learning for fault diagnosis in industrial scenarios: theories, applications and challenges. Mech Syst Signal Pr 2022;167:108487.

21. Yao S, Kang Q, Zhou M, Rawa MJ, Abusorrah A. A survey of transfer learning for machinery diagnostics and prognostics. Artif Intell Rev 2022;56:2871-922.

22. Long M, Cao Z, Wang J, Jordan MI. Conditional adversarial domain adaptation. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, et al., editors. Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc.; 2018. Available from: https://proceedings.neurips.cc/paper_files/paper/2018/file/ab88b15733f543179858600245108dd8-Paper.pdf.

23. Borgwardt KM, Gretton A, Rasch MJ, et al. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006;22:e49-57.

24. Long M, Zhu H, Wang J, Jordan MI. Deep transfer learning with joint adaptation networks. In: International conference on machine learning. PMLR; 2017. pp. 2208–17.

25. Case Western Reserve University Bearing Data Center Website. Available from: https://engineering.case.edu/bearingdatacenter.

26. Li K, Ping X, Wang H, Chen P, Cao Y. Sequential fuzzy diagnosis method for motor roller bearing in variable operating conditions based on vibration analysis. Sensors 2013;13:8013-41. https://doi.org/10.3390/s130608013.

27. Shao S, McAleer S, Yan R, Baldi P. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans Ind Inf 2019;15:2446-55.

Cite This Article

OAE Style

Liu C, Li S, Chen H, Xiu X, Peng C. Semi-supervised joint adaptation transfer network with conditional adversarial learning for rotary machine fault diagnosis. Intell Robot 2023;3(2):131-43. http://dx.doi.org/10.20517/ir.2023.07

About This Article

Special Issue

This article belongs to the Special Issue Intelligence, Optimization, and Safety for Complex Systems

Copyright

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
