Abstract: In the course of our work we have visited more than 30 thermal power plants for investigation, technical exchange, or acceptance, and have come into contact with nearly 100 generating units of 100 to 700 MW that use DCS, covering almost every type of DCS applied in China. This has given us a good understanding of the failures of the various DCS types. Whether imported or domestic, and although they differ in principle and structure and contain many subsystems, they all show some similar faults to a greater or lesser degree. By analyzing typical faults in detail, finding their real causes, and then formulating preventive measures and implementing them correctly, such DCS faults can be prevented from recurring. This article lists several typical DCS fault cases for the reference of thermal control management and maintenance personnel.
I. Introduction
The application of DCS to large domestic thermal power generating units began in the late 1980s, so only a dozen or so years of operating experience have been accumulated. Huaneng Power International Co., Ltd. imported complete 350 MW units, and the Nantong, Shang'an, Dalian, and Fuzhou power plants that it invested in and built were the first power plants in China to use DCS.
As the automation level of thermal power generating units has risen, the functional scope of the unit DCS has kept expanding. In the past two years, the unit control rooms of newly built and retrofitted units have retained only emergency shutdown and backup operating devices; all other operations depend on the DCS. As a result, unit trips caused by failures of the DCS itself occur from time to time, and improving DCS reliability has become an important topic for everyone involved in thermal automation.
II. Case 1: Controller restart causes unit trip
2.1 Course of Events
On November 1, 2001, before the trip, Unit 4 of Power Plant A was carrying an active load of 270 MW and a reactive load of 96 Mvar. Excitation regulators A and B were operating in parallel in automatic mode, with the manual 50 Hz cabinet in tracking standby.
At 14:26 the accident alarm sounded, the generator outlet breaker and the field breaker tripped, alarms including "regulator A cabinet out of service" and "regulator B cabinet out of service" were issued, and the unit was disconnected from the grid. Inspection and testing of the ECS control system found that controller #14 was offline and that the redundant controller #34 had restarted. After the main boards of controllers #14 and #34 were replaced, the unit was restarted, and the generator-transformer unit was soon paralleled with the system.
2.2 Cause Analysis
According to the historical data, at 13:31 controller #14 went offline because of a hardware fault, and the hot-standby controller #34 automatically switched from backup to master. At 14:26 a communication jam caused the "WATCHDOG" of controller #34 to misjudge, and the controller restarted. The controller commanded the excitation regulators with long (maintained) signals and had no breakpoint protection function, so after restarting it could not automatically return to its state before the interruption. Regulators A and B therefore dropped out of operation automatically, and the manual 50 Hz cabinet was switched in automatically. As the generator lost excitation, the generator terminal voltage dropped and pulled the station service voltage down with it, so the output voltage of the manual 50 Hz cabinet kept falling. Even with the manual 50 Hz cabinet in service, the generator could not escape the loss-of-excitation state; the excitation equipment was eventually cut out, the generator loss-of-excitation protection operated, and the generator outlet breaker tripped.
Controllers #14 and #34 control the generator group equipment, including relay BK, which enables automatic transfer to the standby station service supply. When controller #34 restarted, BK reset automatically, its contact opened, and BK went to the out-of-service position. As a result, the 6 kV station service breakers 6410 and 6420 failed to transfer automatically.
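The watchdog misjudgment described in the cause analysis can be illustrated with a minimal sketch. It assumes a watchdog that declares a fault whenever the gap between two consecutive "kicks" from the control task exceeds a timeout. The function name, the kick timestamps, and both timeout values are hypothetical and are not taken from the actual controller.

```python
from typing import Sequence


def watchdog_trips(kick_times_s: Sequence[float], timeout_s: float) -> bool:
    """Return True if any gap between consecutive watchdog kicks exceeds the timeout."""
    gaps = (later - earlier for earlier, later in zip(kick_times_s, kick_times_s[1:]))
    return any(gap > timeout_s for gap in gaps)


# Hypothetical kick timestamps in seconds: the task normally kicks every 0.1 s,
# but a communication jam delays one kick by 0.45 s.
KICKS = [0.0, 0.1, 0.2, 0.65, 0.75]

# Short timeout (assumed value): the delay is treated as a controller fault.
print(watchdog_trips(KICKS, timeout_s=0.3))
# Longer timeout (assumed value): the same delay is tolerated.
print(watchdog_trips(KICKS, timeout_s=1.0))
```

With the shorter timeout, a transient delay that the controller would have survived is enough to force a restart, which is the kind of misjudgment reported for controller #34.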
2.3 Precautionary Measures
2.3.1 Replace the faulty controllers. The manufacturer later confirmed a problem with the crystal oscillators on the main boards and agreed to replace the boards free of charge; all controller main boards on Unit 4 are to be replaced at a suitable opportunity.
2.3.2 Add an offline alarm for every controller, I/O card, and communication card.
2.3.3 The "WATCHDOG" time set in the program is too short and can cause misjudgment; upgrade the software of all controllers.
2.3.4 Add the breakpoint protection function to the configuration diagrams of the regulator AQK and BQK mode switches and of the station service automatic-transfer switch BK, so that the excitation regulators and the station service automatic transfer do not drop out of service when the controller restarts.
2.3.5 Check all configurations of the ECS system and modify any logic that has the same problem.
2.3.6 Contact the regulator manufacturer so that the regulators hold themselves in the operating state, and change the way the controller commands the regulators to short-pulse signals.
2.3.7 Add automatic tracking of the manual 50 Hz cabinet output voltage in the ECS.
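The difference between long-signal control and the short-pulse control with self-holding proposed in 2.3.6 can be sketched as follows. This is an illustrative model only: the class names are hypothetical, and it assumes that all controller outputs come back as False after a restart.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MaintainedRegulator:
    """Runs only while the controller holds its output high (long-signal mode)."""
    running: bool = False

    def update(self, run_output: bool) -> None:
        self.running = run_output


@dataclass
class LatchedRegulator:
    """Self-holding: a short start pulse latches it in, a stop pulse releases it."""
    running: bool = False

    def update(self, start_pulse: bool, stop_pulse: bool) -> None:
        if stop_pulse:
            self.running = False
        elif start_pulse:
            self.running = True
        # No pulse: the previous state is kept.


def simulate_restart() -> Tuple[bool, bool]:
    """Return (maintained.running, latched.running) after a controller restart."""
    maintained = MaintainedRegulator()
    latched = LatchedRegulator()

    # Normal operation: the controller puts both regulators into service.
    maintained.update(run_output=True)
    latched.update(start_pulse=True, stop_pulse=False)

    # Controller restart: every output is assumed to reset to False.
    maintained.update(run_output=False)
    latched.update(start_pulse=False, stop_pulse=False)

    return maintained.running, latched.running


print(simulate_restart())
```

In this model the regulator driven by a maintained signal drops out as soon as the output resets, while the self-holding regulator stays in service because a missing pulse changes nothing.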
III. Case 2: Online code transfer causes the unit to be disconnected from the grid
3.1 Course of Events
On July 12, 2002, the operators of Unit #5 of Power Plant B found that the unit load was dropping rapidly from 552 MW and that the main steam pressure was rising sharply. The turbine governing valves closed from 20% to 10% opening and kept closing; the high-pressure governing valves continued to close rapidly to 0%, and the unit load fell to 5 MW. The operators were forced to trip the unit manually; the turbine tripped and the generator was disconnected.
3.2 Cause Analysis
The DCS and the turbine control system were made by two different foreign companies. The two systems differ considerably, the communication between them was never properly resolved, and some defects proved difficult to eliminate. When thermal control staff transferred communication code from the DCS engineer station to the PLC responsible for communication between the DCS and the turbine control system, the DCS changed the turbine valve position limit from its normal-operation value of 120% to 0.25%. This caused turbine governing valves 1, 2, and 3 to close from 20% to 0%, and the unit load fell rapidly from 552 MW to 5 MW.
3.3 Precautionary Measures
3.3.1 Transferring code to the DCS is prohibited while the unit is in operation.
3.3.2 During a unit outage, any DCS code transfer must be approved by the shift supervisor, and safety measures must be taken.
3.3.3 Block the function that allows the DCS operator station to operate the operator interface of the turbine control system; turbine control system information can still be monitored at the DCS operator station.
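Measures 3.3.1 and 3.3.2 amount to a permissive that must be satisfied before code is transferred. A minimal sketch of that logic follows; the function and parameter names are hypothetical, and in practice the rule would be enforced by procedure or by the engineering tool rather than by a script like this.

```python
def code_transfer_permitted(
    unit_running: bool,
    supervisor_approved: bool,
    safety_measures_in_place: bool,
) -> bool:
    """Permissive for a DCS code transfer, following measures 3.3.1 and 3.3.2."""
    if unit_running:
        # 3.3.1: never transfer code while the unit is in operation.
        return False
    # 3.3.2: during an outage, approval and safety measures are both required.
    return supervisor_approved and safety_measures_in_place


print(code_transfer_permitted(True, True, True))
print(code_transfer_permitted(False, True, True))
```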
IV. Case 3: Workstation clock disorder causes DCS failure
4.1 Course of Events
On August 3, 2001, a unit of Power Plant C was carrying 200 MW; controllers #1 to #9 were in control mode and controllers #51 to #59 were in standby mode. At 8:23 the controllers began issuing NTP alarms one after another. The alarm window of the history station showed the following:
Aug 3 08:23:50 drop7 <7>NTP: too many recvbufs allocated (30)
Aug 3 08:23:50 drop4 <7>NTP: too many recvbufs allocated (30)
.........
At 8:26, controller #2 dropped off the network and controller #52 switched to master. At 11:05, controller #52 dropped off the network. At 13:39, controller #7 dropped off the network and controller #57 switched to master; at the moment of switchover, coal mills A and B, which this controller pair controls, tripped. At 15:11, controller #9 dropped off the network and controller #59 switched to master; at the moment of switchover, coal mill E, which this controller pair controls, tripped. At 15:51, controller #1 dropped off the network and controller #51 switched to master; during the switchover, the blades of fan A, which this controller pair controls, were forced closed.
At 15:22, operator station drop213 (the backup clock station) was restarted, but the NTP alarms did not clear. At 15:35, the history station was restarted, and the NTP alarms still did not clear. At 15:59, the engineer station (the master clock station) was restarted, and the NTP alarms cleared. At 16:09, the history station was restarted again, and by 16:30 the system had returned to normal.
4.2 Cause Analysis
The NTP software keeps the clocks on the network consistent. The master clock is set on the engineer station and the backup clock on an operator station. The controllers dropped off the network because the master clock and the backup clock lost synchronization; this disordered the system clock and raised the NTP alarms, which in turn caused the controllers to drop off the network.
There are two possible causes of the NTP failure. The first is that the workstations run at a processor clock of 400 MHz, unlike the 270 MHz workstations of Unit 1 (SUN made major changes to the operating system for 400 MHz workstations). The workstation software used by Unit 2 is version 1.1, which had not been tested on 400 MHz workstations, so it cannot be guaranteed that version 1.1 is free of problems in this configuration. The second is that the master clock was not synchronized with the backup clock. After the controllers dropped off the network on August 3, the clock of Drop 214 was found to be 2 seconds ahead of the other stations, and Drop 214 was slow to call up displays; it returned to normal after a restart. In addition, the NTP clock alarms appeared only after about 73 to 75 days of system operation. It is inferred that the system clock deviation accumulated to a certain level, the master and backup clocks lost synchronization, the system clock became disordered, and the controllers finally dropped off the network.
An NTP clock failure causes controllers to drop off the network. If it is not handled in time, the alarming controllers drop off the network one after another, paralyzing the entire control system.
4.3 Precautionary Measures
4.3.1 In response to this failure, the manufacturer upgraded the software from version 1.1 to version 1.2.
4.3.2 To keep the control system operating reliably, restart the master clock station and the backup clock station periodically.
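Beyond periodic restarts, the clock deviation described in 4.2 could in principle be caught early by comparing each station's clock with the master clock. The following is a minimal sketch of that idea only, not the DCS vendor's NTP implementation: the function name, the station readings, and the 1-second threshold are assumptions, and only the 2-second offset of Drop 214 comes from the case.

```python
from typing import Dict


def stations_out_of_sync(
    clocks_s: Dict[str, float],
    master: str,
    max_offset_s: float = 1.0,  # assumed alarm threshold
) -> Dict[str, float]:
    """Return the offset (station minus master) for every station beyond the threshold."""
    reference = clocks_s[master]
    return {
        name: reading - reference
        for name, reading in clocks_s.items()
        if name != master and abs(reading - reference) > max_offset_s
    }


# Hypothetical clock readings in seconds; drop214 is 2 s ahead, as in the case.
readings = {"engineer": 1000.0, "drop213": 1000.2, "drop214": 1002.0}
print(stations_out_of_sync(readings, master="engineer"))
```

A check like this only reports the deviation; deciding whether to resynchronize or restart a station remains an operating decision.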
4.4 At Unit 5 of Power Plant D, the DCS clock was not synchronized with the GPS clock during trial operation in 2002, which caused the DCS operator stations to fail. Because the data transmitted on the network carry time tags, a disordered clock has serious consequences for an operating unit. The basic situation was similar to that at Power Plant C. The measure taken was to disconnect the GPS clock temporarily and to restore it after the software had been upgraded and the problem solved.
V. Case 4: Failure of the main communication board of a CABLETRON hub
5.1 Course of Events
On January 1, 2002, Unit 1 of Power Plant E was carrying 250 MW. Controllers #51 to #59 were in control mode and controllers #1 to #9 were in standby mode; mills A, B, C, E, and F were in service. At 18:57, all mills tripped (direct-fired pulverizing system), the MFT operated, and the unit tripped.
5.2 Cause Analysis
Analysis confirmed that the failure of the main communication board of the DCS hub caused all controllers connected to it to switch over at the same time. During the switchover to the standby controllers, the PK key signals of controllers #57, #58, and #59 (the three controllers of the FSSS) were issued in error: the trip command and the confirm command of the "mill trip" button on the CRT were issued simultaneously, tripping all coal mills and causing the MFT to operate.
5.3 Precautionary Measures
CABLETRON hubs are an early product, and spare parts are difficult to buy on the market, so the CABLETRON hubs were replaced with CISCO hubs.
VI. Case 5: Failure of a redundant controller pair causes unit trip
6.1 Course of Events
On March 23, 2003, Unit #3 of Power Plant F was carrying 115 MW. The main steam pressure at the boiler side was 9.55 MPa and the main steam temperature was 537°C. The main feedwater regulating valve was 43% open and the bypass feedwater regulating valve was 47% open (each feedwater line can supply 100% of the load). The drum water level was normal, and no other parameters showed abnormal changes.
The operators found that the boiler-side parameters were abnormal and that no operations could be carried out. At the same time, the boiler-side CRT showed that all automatic loops had been released. The self-diagnosis display showed controller #3 offline and controller #23 in the master state. The operators immediately called the thermal control staff and meanwhile monitored the main steam pressure and main steam temperature on the turbine-side CRT and stepped up monitoring of the drum electric-contact level gauge and the water level TV. Operation was maintained with the main steam pressure fluctuating between 9.0 and 9.6 MPa, the main steam temperature between 510 and 540°C, and the drum water level between +75 mm and -50 mm.
A few minutes later the thermal control staff arrived and found controller #3 offline and controller #23 in the master state, but the I/O points controlled by controller #23 (drum water level, main steam temperature, main steam pressure, feedwater pressure, and so on) were all bad points, and neither automatic nor manual operation was possible. After repeated restarts, controller #3 was restored to the master state. While the forced I/O points were being released, the operators found that the drum water level was dropping sharply. On-site inspection found the bypass feedwater regulating valve closed, and three manual attempts to operate it were unsuccessful. When the drum level TV and the indicators no longer showed any water level, the unit was shut down manually.
6.2 Cause Analysis
From the historical records that could be retrieved, it is inferred that controller #23 (backup) had already lost communication with the I/O bus, because of a hardware fault or a communication blockage, before controller #3 (master) failed. When controller #3 went offline because of a main card fault, controller #23 was promoted to master but could not read I/O data, so the redundant pair of controllers serving the steam-water system failed at the same time. The automatic feedwater control system lost control, and the drum level protection failed. While the forced points were being released after the newly replaced controller #3 had restarted successfully, the DCS set the command to the bypass feedwater regulating valve to zero (the logic is designed to drive the unit in the safer direction when a controller fails), which closed the bypass regulating valve. The bypass regulating valve is an old type, equivalent to a motor-operated valve without self-holding (it accepts pulse signals), and it cannot be electrically disengaged when switched to manual. It therefore could not be opened in the emergency, and the drum ran short of water.
6.3 Precautionary Measures
6.3.1 Replace the main boards of controllers #3 and #23 and consider increasing the stock of spare main boards.
6.3.2 Add communication cards so that the communication between controllers and I/O cards is redundant.
6.3.3 Monitor the communication of all controllers, I/O cards, and BC cards; add offline detection logic, generate alarm points, and record them in the history. As soon as a controller behaves abnormally, an alarm can be raised and the problem handled in time.
6.3.4 Add a controller over-temperature alarm so that measures can be taken before a controller fails and accidents are nipped in the bud.
6.3.5 The inputs to important regulation and protection systems such as drum water level should generally be three independent signals. The three signals should be split into six by signal distributors and fed through six terminal boards and AI cards into two pairs of controllers: one pair handles regulation and protection, and the other pair handles protection only. This solves the problem of important protection failing when a redundant pair of controllers fails at the same time.
6.3.6 Replace the actuators of important automatic control loops so that they have complete operating functions.
6.3.7 When the DCS fails, if the backup hardwired operating devices or monitoring instruments cannot maintain normal operation, the operators shall immediately shut down the turbine and the boiler.
6.3.8 Disable all hard-disk sharing on the MIS interface station so that communication between the DCS and the MIS is one-way only.
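The three-signal arrangement in 6.3.5 is commonly combined with median selection for regulation and two-out-of-three voting for protection. The source does not say which selection logic the plant used, so the sketch below is an assumption throughout: the function names, the use of None for a bad-quality point, and the -250 mm trip setpoint are all hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def select_level(readings_mm: Sequence[Optional[float]]) -> Optional[float]:
    """Pick the drum level used for regulation; None marks a bad-quality point."""
    good = [r for r in readings_mm if r is not None]
    if not good:
        return None  # no usable signal: regulation must fall back to manual
    # Median of three good signals; mean of two; the single value if only one is good.
    return median(good)


def low_level_trip(readings_mm: Sequence[Optional[float]], trip_mm: float) -> bool:
    """Two-out-of-three vote: trip when at least two good signals are below the setpoint."""
    votes = sum(1 for r in readings_mm if r is not None and r < trip_mm)
    return votes >= 2


# One transmitter reads far too high; the median ignores the outlier.
print(select_level([10.0, 12.0, 500.0]))
# Two of three signals are below the assumed -250 mm setpoint.
print(low_level_trip([-260.0, -255.0, 0.0], trip_mm=-250.0))
```

Under the arrangement in 6.3.5, each of the two controller pairs would evaluate its own copy of the three signals, so the protection vote survives the loss of one pair.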
VII. Conclusions
The cases above are only a few typical DCS failures drawn from a limited scope. Even if all the countermeasures from these cases were applied to every DCS, DCS failures would still recur. Over a wider scope, the number of outages caused by DCS failures is certainly not small. Some events are bound to involve high controller load rates and high network communication load rates. At present there is no effective means of monitoring the controller load rate and the network communication load rate, so the root causes of such events remain hard to find and such defects remain hard to eliminate.
To prevent accidents of all kinds, we must start at the source, namely DCS design and manufacture, and report the failures that occur in DCS applications across the country to the relevant authorities. Those authorities should convene experts to analyze and study the failures, formulate corresponding standards, rules, and countermeasures, and enforce them, so that a large closed-loop quality control system is formed and a long-term virtuous circle is sustained.
I. INTRODUCTION The application of DCS to domestic large-scale thermal power generating units started in the late 1980s. So far, only a dozen years of operating experience have been used. Huaneng Power International Co., Ltd. introduced the entire 350MW unit, and the Nantong, Shang'an, Dalian and Fuzhou power plants invested and constructed were the first DCS power plants in China.
With the continuous improvement of the automation level of thermal power generating units, the range of functions of the DCS system of unit units has been continuously expanded. In the past two years, the unit control room for newly-built and rebuilt units was used for emergency shutdown and standby operation. All other operations depended on DCS. Therefore, the tripping phenomenon caused by the failure of the DCS itself occurs from time to time. Therefore, how to improve the reliability of the DCS is an important topic for everyone involved in thermal automation.
II. Case 1 Controller restarts causing unit to trip 2.1 Events After November 1, 2001, the active load of Unit 4 of A Power Plant was 270 MW before shutdown. Reactive 96MVar, A and B excitation regulators were automatically operated in parallel, and manual 50Hz cabinet tracking was used. .
At 14:26, an accident sound was sent out, the generator outlet switch, the excitation switch tripped, the “regulator A tank exiting operationâ€, and the “regulator B cabinet exiting operation†and other alarm signals were issued and the unit was disconnected. Checked and tested the ECS control system and found that the #14 controller was out of order and the redundant #34 controller was restarted. After replacing the #14 and #34 controller main board, the unit was restarted. Soon, the Change group and system side by side.
2.2 Cause Analysis According to the analysis of the historical data, at 13:31, the #14 controller hardware was running offline and the hot standby #34 controller was automatically controlled by the auxiliary controller. At 14:26, the #34 controller caused a misjudgment of "WATCHDOG" due to a communication jam, causing the controller to restart. Because the controller controls the excitation regulator in a long signal mode, there is no breakpoint protection function. After the #34 controller restarts, it cannot automatically return to the state before the breakpoint, causing the A and B regulators to automatically exit the operation. The manual 50Hz cabinet is automatically Input. As the generator is demagnetized, the voltage at the generator terminal drops, causing the factory power supply voltage to drop. The manual 50Hz cabinet output voltage continues to decrease. After the manual 50Hz cabinet is put in, the generator does not escape from the demagnetization state until the excitation device is cut off and the generator is demagnetized. Protection action, the generator outlet switch trips.
The #14 controller and the #34 controller control the generator group equipment, including the standby relay BK connected to the factory power switch, #34 the controller restarts, the BK resets automatically, the relay contact opens, and the BK enters the exit position. Caused by 6KV power switch 6410, 6420 switch failed to vote.
2.3 Precautions 2.3.1 Replace the faulty controller. Later, the manufacturer confirmed that there was a problem with the crystals of the motherboards and agreed to replace the motherboards for free with the opportunity to replace all the controller boards on the 4th unit.
2.3.2 Increase the off-line alarm function of any controller, I/O card, and communication card.
2.3.3 The time of setting "WATCHDOG" in the program is too short, which may cause misjudgment and software upgrades for all controllers.
2.3.4 Regulator AQK, BQK mode switch and factory power supply self-switching BK switch configuration diagram Add the break point protection function to prevent the excitation regulator and factory power self-casting switch from running after the controller is started.
2.3.5 Check all configurations of the ECS system and modify the logic that has the above problems.
2.3.6 Contact the regulator manufacturer so that the regulator can be self-maintained in the operating state and change the way the controller controls the regulator to short pulse signal control.
2.6.7 Increase the manual 50Hz cabinet output voltage auto tracking function in the ECS.
III. Case 2 Online transmission of code caused the unit to be untied 3.1 Event On July 12, 2002, the monitoring staff of Unit #5 of B Power Plant found that the load of the unit rapidly dropped from 552MW, the main steam pressure suddenly rose, and the steam turbine was adjusted to open the door. The 20% is closed to 10% and continues to shut down, the high-profile door continues to quickly close to 0%, the unit load is reduced to 5MW, and the operator is forced to manually stop the emergency, the turbine trips, and the generator is disconnected.
3.2 Reasons Analysis DCS and turbine control systems are manufactured by two foreign companies, respectively. The two systems are quite different. Communication problems are not well resolved, and there are some defects that are difficult to eliminate. When the thermal control personnel transmits a communication code from the DCS engineer station to the PLC responsible for communication between the DCS and the turbine control system, the DCS modifies the valve position limit of the turbine from 120% during normal operation to 0.25%, causing the turbines 1, 2, and 3 to adjust the door. From 20% to 0%, the unit load quickly drops from 552 MW to 5 MW.
3.3 Precautionary Measures 3.3.1 During operation of the unit, DCS transmission of code is prohibited.
3.3.2 During the outage period of the unit, when the DCS transmits the code, it shall be agreed by the running monitor and safety measures shall be taken.
3.3.2 The function of operating the operator interface of the steam turbine control system by the DCS operator station is blocked, but the turbine control system information can still be monitored at the DCS operator station.
IV. Case III DCS Workstation Clock Chaos Causes Failure of DCS 4.1 Events After August 3, 2001, Unit C of Unit C was loaded with 200MW, #1 to #9 controllers were in control mode, and #51 to #59 controllers were in standby mode. . At 8:23, each controller sends NTP alarms in sequence. The historical station alarm window is as follows:
Aug308:23:50 drop7<7>NTP: toomanyrecvbufsallocated(30)
Aug308:23:50 drop4<7>NTP: toomanyrecvbufsallocated(30)
.........
At 8:26, the #2 controller was disconnected from the network and the #52 controller was switched off as the master; at 11:05, the #52 controller was off-net; at 13:39, the #7 controller was off-net and the #57 controller was switched off. For the main control, at the moment when #7 controller switches to #57 controller, the A and B coal mills controlled by the controller trip; at 15:11, the #9 controller goes offline and the #59 controller cuts Control, at the instant of switching from #9 controller to #59 controller, the E coal mill controlled by the controller trips; at 15:51, #1 controller is off the grid, and the #51 controller cuts the main control, in # 1 When the controller switches to the #51 controller, the A wind turbine leaf controlled by the controller is forcibly closed.
15:22, restart operator station drop213 (backup clock station), NTP alarm has not disappeared; 15:35, restart the historical station, NTP alarm has not disappeared; 15:59, restart the engineer station (master clock station), NTP The alarm disappeared; at 16:09, the historic station was restarted. At 16:30, the system returned to normal.
4.2 Cause Analysis The role of the NTP software is to maintain the unity of the network clock. The master clock is set on the engineer station and the backup clock is set on the operator station. Controller disconnection Causes the system clock to be out of sync due to the non-synchronization of the main clock and the standby clock, causing the NTP alarm to cause the controller to disconnect the network.
There are two possible reasons for NTP failure. One is a workstation with a main frequency of 400 MHz, which is different from the 270 MHz of Unit 1 (SUN has a major improvement on the operating system on a 400 MHz workstation). The workstation version 1.1 is used by Unit 2. It has not been tested on 400MHz workstations and it cannot be guaranteed that the 1.1 version of the software will not cause problems in this configuration. The other is that the master clock is not synchronized with the backup clock. After the controller was disconnected from the network on August 3, it was found that the Drop 214 clock was 2 seconds faster than the other stations. When the Drop 214 screen was called slowly, it was normal after restart. In addition, the NTP clock alarm occurred only about 73-75 days after the system operation. It is estimated that the system clock deviation has accumulated to a certain degree, causing the main and standby clocks to be out of synch, causing the system clock to be disordered, and eventually causing the controller to disconnect the network.
The failure of the NTP clock causes the controller to be disconnected from the network. Failure to handle it in time can cause the alarm controller to be disconnected from the network in turn, thus causing the entire control system to paralyze.
4.3 Precautionary Measures 4.3.1 Based on this failure phenomenon, the manufacturer upgraded the software from version 1.1 to version 1.2.
4.3.2 To ensure the reliable operation of the control system, restart the main clock and standby clock station periodically.
In the 4.4D power plant unit 5, the DCS clock was not synchronized with the GPS clock during trial operation in 2002, causing a failure of the DCS operator station. Since the data transmitted on the Internet is time-tagged, the disruption of the clock will have serious consequences for the running crew. The basic situation is similar to that of the C plant. The measure taken is to temporarily disconnect the GPS clock. After the software is upgraded and the problem is solved, the GPS clock is restored.
V. Case 4 CABLETRON HUB malfunctions due to failure of the main communication board 5.1 Events After January 1, 2002, Unit 1 of the E Power Plant has a load of 250MW. The #51 to #59 controllers are in control mode. #1 to #9 controllers In standby mode, mills A, B, C, E, and F operate. At 18:57, all pulverizers tripped (direct-fired furnaces), the MFT moved and the unit tripped.
5.2 Cause Analysis After analysis, it was confirmed that the total communication board failure of the DCS hub caused all the controllers connected to it to switch at the same time. During the process of the controller switching to the standby controller, #57, #58, and #59 controllers The PK key signal is mis-distributed (these three controllers are FSSS systems), that is, the tripping and confirmation commands of the “grinding machine trip button†on the CRT are issued at the same time, causing all coal mills to trip, causing the MFT to operate.
5.3 Precautionary measures CABLETRON hubs are early products and it is difficult to purchase spare parts in the market. CISCO hubs are used instead of CABLETRON hubs.
VI. Case 5 Failure of Redundant Controller Caused Unit Trip 6.1 Incident After March 23, 2003, the power load of Unit #3 in the F power plant was 115 MW, the main steam pressure at the furnace side was 9.55 MPa, and the main steam temperature was 537°C. The main water supply was regulated. The door opening is 43%, and the bypass water supply adjustment door opening is 47% (each water supply pipe can satisfy the 100% load water supply), and the water level of the steam drum is normal; there are no abnormal changes in other parameters.
The supervisory staff found that the parameters on the side of the boiler were abnormal, and all operations could not be carried out. At the same time, the CRT screen on the furnace side showed that all items were automatically released. The self-test screen shows that the #3 controller is offline and the #23 controller is in the master state. The operating personnel immediately contacted the thermal workers to deal with the main steam pressure and main steam temperature at the same time with the help of the turbine side CRT screen, and reinforced monitoring of the drum electric contact level gauge and water level TV. The main steam pressure fluctuates from 9.0 to 9.6 MPa and the main steam The temperature fluctuates at 510-540°C and the drum water level fluctuates at +75--50mm to maintain operation.
A few minutes later, the hot workers rushed to the scene and found that the #3 controller was offline and the #23 controller was the main control state, but the #23 controller controlled I/O points (drum water level, main steam temperature, main Steam pressure, feedwater pressure, etc. are all bad points, and automatic control of the manual operation fails. After repeated restarts, the #3 controller is restored to the master state. When releasing the forced I/O point, the supervisors found that the water level of the drum drastically decreased. The on-site inspection found that the bypass water supply regulating door was in the closed state and was turned off three times manually. The drum level TV and the display meter were not monitored. To the water level, manual shutdown, shutdown.
6.2 Cause Analysis According to the historical record that can be recalled, it can be inferred that the #23 controller (auxiliary control) has lost communication with the I/O bus due to hardware failure or communication blocking before the #3 controller (master) malfunctions. When the #3 controller was offline due to a host card failure, the #23 controller was upgraded to master control, but I/O data could not be read, causing a pair of redundant controllers participating in the soda system control to fail at the same time. The water supply automatic control system lost control. Drum level protection fails. In the process of releasing the force point after the newly replaced #3 controller restarts successfully, the DCS zeros the bypass water supply regulating valve command (the logic is designed to run the unit in a safer direction in the event of controller failure), Close the bypass adjustment door. The bypass adjustment valve is an old type valve, which is equivalent to the release of the self-retaining electric door (accept pulse signal), and can not be electrically tripped when cut manually. Therefore, it cannot be opened smoothly in case of emergency, resulting in lack of steam drum. water.
6.3 Precautionary Measures 6.3.1 Replace the mainboards of the #3 and #23 controllers, and consider increasing the stock of spare mainboards.
6.3.2 Add communication cards so that communication between the controllers and the I/O cards is redundant.
6.3.3 Monitor the communication status of all controllers, I/O cards, and BC cards; add offline detection logic that generates alarm points and writes them to the historical record, so that abnormal controller operation is alarmed and dealt with promptly.
6.3.4 Add a controller over-temperature alarm so that action can be taken before the controller fails and the accident is nipped in the bud.
6.3.5 The input signals of important regulation and protection systems, such as the drum water level, should generally be three independent signals. The three signals are split into six by signal distributors and fed, through six separate terminal boards and AI cards, into two pairs of controllers. One pair is used for regulation and protection, and the other pair takes part in protection only. This solves the problem of an important protection failing when both controllers of a redundant pair fail at the same time.
6.3.6 Replace the actuators of important automatic regulation systems so that they have complete operating functions.
6.3.7 When the DCS fails, if the essential backup hardwired controls or monitoring instruments cannot maintain normal operation, the operators shall immediately trip the turbine and the boiler.
6.3.8 Turn off all hard disk sharing functions on the MIS interface station to ensure that communication between the DCS and the MIS is one-way only.
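The arrangement in measure 6.3.5 can be sketched as follows: each controller pair receives its own copy of the three drum level signals and votes two-out-of-three, and the protection acts if either pair that is still working votes for a trip. This is a simplified illustration under stated assumptions. The setpoint value, the function names, and the choice to let bad-quality points abstain from the vote are all hypothetical and are not taken from the plant's actual logic.

```python
from typing import Optional, Sequence

# Hypothetical low-level trip setpoint in mm relative to normal level.
LOW_LEVEL_TRIP_MM = -300.0

Readings = Sequence[Optional[float]]  # None marks a bad-quality point


def two_out_of_three_low(levels: Readings,
                         setpoint: float = LOW_LEVEL_TRIP_MM) -> bool:
    """True when at least two valid readings are below the setpoint.

    Assumption: a bad-quality reading does not vote either way.
    """
    votes = sum(1 for v in levels if v is not None and v < setpoint)
    return votes >= 2


def protection_trip(pair_a: Optional[Readings],
                    pair_b: Optional[Readings]) -> bool:
    """Trip if any working controller pair votes for a low-level trip.

    Each argument holds the three readings as seen by that pair,
    or None when the whole pair has failed.
    """
    working = [inputs for inputs in (pair_a, pair_b) if inputs is not None]
    # With both pairs failed this returns False; the hardwired backup
    # described in 6.3.7 would have to cover that situation.
    return any(two_out_of_three_low(inputs) for inputs in working)


# Situation similar to this case: pair A sees only bad points, while
# pair B still reads the transmitters and two of three are below setpoint.
print(protection_trip([None, None, None], [-350.0, -340.0, -120.0]))
```

Because pair B has its own terminal boards and AI cards, the loss of every input on pair A does not block the protection in this sketch.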
7. Conclusions The cases above are only a few typical DCS failures drawn from a limited range of plants. Even if every countermeasure from these cases were applied to every DCS, DCS failures would still recur. Across a wider range of plants, the number of outages caused by DCS failures will not be small, and some of them will certainly involve problems such as high controller load rate and high network communication load rate. At present there is no effective means of monitoring controller load rate or network communication load rate, so it is still difficult to find the root cause of such incidents and, consequently, to eliminate such defects.
To prevent the various types of accidents, we must start at the source, with the design and manufacture of the DCS, and report the failures that occur in the various DCS applications across the country to the relevant departments. These departments should convene experts to analyze and study the failures, work out corresponding standards, rules, and countermeasures, and enforce them, so that a large closed-loop quality control system is formed and sustained as a long-term virtuous circle.