An automatic operator assistant with Machine Learning
Our client, a world leader in security solutions, offers a wide range of solutions aimed at numerous sectors and customer segments, from small businesses to large industrial complexes.
One of its most outstanding services is its Security Operations Center (SOC), which aims to provide cutting-edge, differentiating solutions through the use of the most innovative technologies and the continuous supervision of its customers.
Among other things, the SOC is responsible for receiving and managing the fault messages generated by the surveillance devices it monitors. As part of this process, a SOC operator must at a given moment decide where to route each notification: to a room technician, who tries to solve the problem remotely, or to a field technician, who travels to the damaged device and repairs it on site.
When a SOC operator makes the wrong routing decision, there is a cost for our client, whether the notice is sent to a field technician when the fault could have been resolved from the room, or the other way around.
In this context, a use case was proposed as a Machine Learning pilot project with three main objectives:
- Qualify the different sources of information and analyze the quality of the data collected so far, in order to optimize the data collection process and get the most out of it.
- Obtain descriptive information about the nature of the breakdowns in order to understand them better, which in turn helps to qualify the sources of information and to assess how the data currently collected can be exploited in the future.
- Evaluate the feasibility of building a reliable predictive model, which could later become a tool that assists the SOC operator in deciding where to route each fault, improving success rates and therefore generating savings for our client.
ANALYSIS OF INFORMATION AND DATA SOURCES
Of the three data sources originally received, we concluded that at present only one of them is relevant, since it contains the information the operator actually has available at the moment of decision making. This allowed us to greatly simplify both data collection and the generation of the model itself.
During the analysis we identified variables with very high percentages of null values, which triggered processes to refine and improve data collection for those variables.
The analysis also raised the suspicion that mislabeling could be occurring in some cases, with field incidents recorded as room incidents and vice versa, which hindered the subsequent learning process.
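A null-value audit of this kind is straightforward with pandas. The sketch below uses an invented incident table (the column names are illustrative stand-ins, not the client's actual schema) to show how per-variable null percentages can be computed and ranked:

```python
import pandas as pd

# Hypothetical incident log; column names are illustrative, not the client's schema.
incidents = pd.DataFrame({
    "panel_id":   ["P1", "P2", None, "P1", "P2"],
    "fault_code": [101, None, None, 104, None],
    "location":   ["A", "B", "A", None, "B"],
})

# Percentage of null values per variable, sorted from worst to best.
null_pct = incidents.isna().mean().mul(100).sort_values(ascending=False)
print(null_pct)
```

Variables at the top of this ranking are the natural candidates for improved collection processes.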
DESCRIPTIVE ANALYSIS OF COLLECTED DATA
We obtained the "top 10" installations, panels and locations that generated the most technical incidents. For installations and locations no relevant or representative patterns were found, but the incidents of the 10 panels with the most technical incidents accounted for more than 50% of the total. That is, of the roughly 7,500 incidents in the first iteration, more than half were concentrated in just 10 panels. This suggested studying this subset in detail and building a model focused only on those 10 panels (and therefore valid only for them); as described below, this is ultimately what was done.
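The concentration measurement described above can be sketched as follows; the panel IDs and counts here are invented for illustration, and only two "top" panels are used instead of the project's ten:

```python
import pandas as pd

# Illustrative incident log (invented panel IDs and counts), showing how the
# concentration of incidents in the top panels can be measured.
incidents = pd.DataFrame({
    "panel_id": ["P1"] * 600 + ["P2"] * 500 + ["P3"] * 400 + ["P4"] * 300,
})

counts = incidents["panel_id"].value_counts()
top = counts.head(2)                      # "top N" panels (top 10 in the project)
share = top.sum() / counts.sum() * 100    # share of total incidents
print(f"Top panels concentrate {share:.1f}% of incidents")
```

When the share exceeds 50%, as it did in this project, restricting the model to that subset is a defensible trade-off between coverage and reliability.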
With the final restriction of using only the variables available at the moment of the operator's decision, we worked on improving the predictive model.
After several iterations testing different data sets, variable sets and learning methods on the full data set, we found that the best result obtainable with the current data is around 66% accuracy.
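The evaluation loop described above can be sketched with scikit-learn. The data here is synthetic (generated stand-ins for the incident variables available to the operator), and the choice of a random forest is an assumption, not the model the project actually selected:

```python
# Minimal sketch of model evaluation on a room-vs-field binary classification
# task, using scikit-learn on synthetic data; features, labels and the choice
# of classifier are invented stand-ins, not the project's actual setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Cross-validation gives a more robust accuracy estimate than a single
# train/test split, which matters when comparing many candidate models.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.2f}")
```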
However, in view of the results of the descriptive analysis, we chose to work with the "top 10" subset of panels in order to obtain more reliable models.
To put a decision-support tool in the hands of the SOC operators, an interface was created on top of the generated model: the operator enters the data of an incident and receives a prediction of which type the incident corresponds to, together with an associated confidence percentage.
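A prediction with an associated confidence percentage can be obtained from scikit-learn's `predict_proba`; in this sketch the model, features and class names are illustrative stand-ins for the interface's actual internals:

```python
# Sketch of reporting a prediction with a confidence percentage via
# predict_proba; model, features and class names are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

incident = X[:1]                          # one incident entered by the operator
proba = model.predict_proba(incident)[0]  # probability for each class
label = ["room", "field"][int(np.argmax(proba))]
print(f"Predicted: {label} ({proba.max():.0%} confidence)")
```

Surfacing the confidence alongside the label lets the operator fall back on their own judgment when the model is unsure.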
Before the incident evaluation system was introduced, the operator sent approximately 50% of incidents to the room and the other 50% to the field, and approximately 30% of incidents were misrouted. Once the system was in place, the operator began to follow its recommendations when deciding where to send each incident. As a result, the share of misrouted incidents has been reduced to 10%, with the corresponding cost savings for our client.
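The scale of the saving follows directly from the drop in the misrouting rate. This back-of-the-envelope sketch uses an invented monthly volume and per-mistake cost, since the source does not state either figure:

```python
# Back-of-the-envelope arithmetic for the improvement reported above; the
# monthly volume and per-mistake cost are invented figures for illustration.
incidents_per_month = 1000        # assumed volume
cost_per_misroute = 50.0          # assumed cost per wrong routing decision

before = incidents_per_month * 0.30 * cost_per_misroute  # 30% misrouted
after = incidents_per_month * 0.10 * cost_per_misroute   # 10% misrouted
print(f"Estimated monthly saving: {before - after:.0f}")
```

Whatever the real volume and cost, cutting the error rate from 30% to 10% removes two thirds of the misrouting cost.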