How We Built a Networking-Specific Natural Language Semantic Parser
As a member of Cisco’s Future Product Innovation team, I recently led a technical investigation focused on using natural language to configure a network management system (NMS). We wanted to help non-technical personnel carry out various types of network configurations in an intuitive and human-friendly way, using natural language (NL) in combination with speech-to-text. Our team consisted of a Cisco Fellow engineer, a senior software engineer and me, serving as the technical lead of the project. Although we had no prior knowledge or experience with natural language processing (NLP), we were highly motivated.
We focused on three main sub-domains within networking, and defined the following usage examples:
- "Please allow all voip to bldg. 10"
- "All voip to bldg 10 should be blocked"
- "Permit udp traffic from 10.0.8.1 to 184.108.40.206/12"
- "Do not allow traffic from Battle.net"
- "All FTP traffic to wireless should be denied"
- "Please apply highest priority to Diana’s ip-camera packets going to campuses"
- "Diana’s ip-camera packets outbound to campuses should be sent at highest priority"
- "Make all packets of type iso-tp4, marked with de and going towards campuses, oam"
- "John’s blue ios devices wap secure connectionless session service packets that are flagged with ef and from cellular phone should be sent at lowest latency"
- "All of Dave’s mail going to salesforce.com should go via security broker Lima"
- "Route all traffic to company.com through the DMZ interface"
- "All TCP packets on port 8888 should go through building A26"
Since no existing system could understand or carry out such requests, we knew we had to devise a new approach. After inspecting the chosen sub-domains and assessing how real network technicians would configure a system to carry out each of the example requests, we realized that for each sub-domain we could define a data structure whose fields break down the data items that sub-domain requires. We called it a "structured network request," or "SNR". We would then need to find a way to parse the request and fill in the fields so that we could build an actual NMS command to configure the system. We started by thinking about how a person would parse and understand such requests. It soon became clear that language knowledge alone was not enough, and that we would need to add network semantics to the parsing process.
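To make the SNR idea concrete, here is a minimal sketch of what one could look like for an access-control style request. The class name, the field names and the `missing_fields` helper are illustrative assumptions, not the structure we actually shipped, and it uses `dataclasses`, which requires a newer Python than the 3.6 we worked with.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessControlSNR:
    """Hypothetical SNR for an access-control request (field names are assumed)."""
    action: Optional[str] = None       # e.g. "permit" or "deny"
    protocol: Optional[str] = None     # e.g. "udp", "ftp", "voip"
    source: Optional[str] = None       # where the traffic comes from
    destination: Optional[str] = None  # where the traffic is going

    def missing_fields(self) -> List[str]:
        """Names of the fields the parser has not filled in yet."""
        return [name for name, value in vars(self).items() if value is None]


# "Permit udp traffic from 10.0.8.1 ..." with the destination not yet mapped
snr = AccessControlSNR(action="permit", protocol="udp", source="10.0.8.1")
print(snr.missing_fields())  # ['destination']
```

Each sub-domain would get its own structure of this kind, and the rest of the parser is then a matter of filling in its fields.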
Thus was born the "semantic parser," composed of the following six parts:
- A deep neural network (DNN) based sub-domain classifier
- A natural language processing (NLP) engine for part of speech (POS) tagging
- Lexical lists (LLs)
- A custom trained, DNN-based named entity recognizer (NER)
- An adaptive learning knowledge base (ALKB)
- A scoring metric
Figure 1 shows what our semantic parsing flow looked like:
Figure 1: Semantic Parser Architecture
We started by passing the request through an NLP engine, which uses a pre-trained, language-specific model to analyze a given sentence, tag the words within according to their part of speech (POS), and then build a graph that shows the dependencies between the words according to their POS. The NLP engine can also identify and tag words as people, places, numbers, etc. For example, the parsing results for "All of Dave’s mail going to salesforce.com should go via security broker Lima," can be seen in figure 2.
Figure 2: NLP Engine Analysis
To add network semantics, we built what we call "lexical lists". An LL is a collection of keywords, regular expressions, code snippets and application rules that, combined, can identify and map words into SNR fields. Each LL targets a specific field. We built our LLs to be hierarchical, so that LLs could inherit common sections from one another. We soon found, however, that standard POSIX regular expressions were not enough to capture the complexity of a language. So we enhanced the regular expressions with the ability to run validation rules on capture groups, replace the content of capture groups according to predefined rules and checks, and even run snippets of code on capture groups to take advantage of all the data provided by the NLP engine and run more complicated checks and substitutions.
Validation rules tell the LL engine how to evaluate the LL and allow Boolean conditions to be applied to the results. By default, all of an LL’s regular expressions and keyword lists are applied; validation rules override that default.
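As a rough illustration of a regular expression paired with a validation rule on its capture group, consider the sketch below. The dictionary layout, the `SOURCE_LL` name and the `apply_lexical_list` function are assumptions made for this example; our real LLs were JSON files with a richer rule set.

```python
import ipaddress
import re


def is_ip_or_network(text):
    """Validation rule: accept only IPv4/IPv6 addresses or CIDR networks."""
    try:
        ipaddress.ip_network(text, strict=False)
        return True
    except ValueError:
        return False


# Hypothetical lexical list targeting the "source" field of an SNR.
SOURCE_LL = {
    "field": "source",
    "patterns": [re.compile(r"\bfrom\s+(?P<value>\S+)", re.IGNORECASE)],
    "validate": is_ip_or_network,
}


def apply_lexical_list(ll, sentence):
    """Return (field, value) pairs whose capture group passes validation."""
    hits = []
    for pattern in ll["patterns"]:
        for match in pattern.finditer(sentence):
            value = match.group("value")
            if ll["validate"](value):
                hits.append((ll["field"], value))
    return hits


print(apply_lexical_list(SOURCE_LL, "Permit udp traffic from 10.0.8.1 to 10.1.0.0/16"))
print(apply_lexical_list(SOURCE_LL, "Do not allow traffic from Battle.net"))
```

The first request yields a `source` mapping, while the second yields none because "Battle.net" fails the address check. In a fuller setup, a different LL would be responsible for host and service names.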
The LLs worked, but were not very robust, since they required us to think of every possible way someone might phrase a request, which wasn’t feasible. However, they were an advantage for special and edge cases, where we found we needed to be able to define precedence rules for parsing and mappings. So we turned to deep learning (DL) to make our engine more robust. We set out to train our own named entity recognizer (NER), based on a deep neural network (DNN), to map a request into SNR fields.
The first problem we faced was getting training data. As any data scientist will tell you, getting good, tagged data is the most important and difficult part of building a deep learning (DL) model. Since no one has ever tried using NL for networking requests, we had no pre-existing data to use. As our entire team consisted of only three people, it would have taken us a very long time to create and tag enough data to be able to do decent training. We’re talking about millions of input samples! We could have tried to recruit other people to help, but again, we were looking for a solution to prove the feasibility of our system within a short timeframe.
Eventually, we decided to build our own data generator. We studied the typical structure of network requests in the chosen sub-domains and extracted several thousand templates that define all possible request types. Then we created lists of words for each field in the templates, while also using data from our LLs’ keyword lists. We were able to synthetically produce more than 15 quadrillion (that’s 15 × 10¹⁵, or 15,000,000,000,000,000) different tagged requests. We used the generator to randomly produce 3 million sentences for each sub-domain to train our NER on.
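The generator idea can be sketched in a few lines. The templates, word lists and function names below are made up for illustration and are tiny compared to the real ones. Each slot in a template is filled from the word list of its field, and every generated word carries the field name as its tag, with "O" marking words outside any field.

```python
import random

# Toy templates and word lists (assumptions for this sketch only).
TEMPLATES = [
    "{action} {protocol} traffic from {source} to {destination}",
    "please {action} all {protocol} packets going to {destination}",
]
FIELD_WORDS = {
    "action": ["permit", "allow"],
    "protocol": ["udp", "tcp", "ftp", "voip"],
    "source": ["10.0.8.1", "building 10", "wireless"],
    "destination": ["company.com", "bldg 10", "the DMZ"],
}


def generate(rng):
    """Produce one tagged request as a list of (word, tag) pairs."""
    tagged = []
    for piece in rng.choice(TEMPLATES).split():
        if piece.startswith("{") and piece.endswith("}"):
            field = piece[1:-1]
            # A multi-word filler contributes one tagged token per word.
            for word in rng.choice(FIELD_WORDS[field]).split():
                tagged.append((word, field))
        else:
            tagged.append((piece, "O"))
    return tagged


def corpus_size():
    """Number of slot combinations the templates can produce."""
    total = 0
    for template in TEMPLATES:
        count = 1
        for piece in template.split():
            if piece.startswith("{"):
                count *= len(FIELD_WORDS[piece[1:-1]])
        total += count
    return total


print(generate(random.Random(0)))
print(corpus_size())  # 2*4*3*3 + 2*4*3 = 96
```

Because the count is a product of word-list sizes per template, it grows very quickly as templates and lists are added, which is how a generator of this kind can reach astronomically large numbers of distinct requests.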
As it turned out, this method worked very nicely for a single sub-domain, and our trained NER correctly tagged many words that it had never seen during training. However, when training for multiple sub-domains, the NER got confused and couldn’t cope. We first needed to identify the specific sub-domain and then run the NER using a domain-specific model.
Thus, we built a sub-domain classifier, again using a DNN. We augmented the generator to also include the sub-domain classification and generated 9 million new sentences. We used those to train the classifier and the domain-specific NER models. Our classifier was able to achieve 99% accuracy across various sets of 9 million sentences. We realized that using the classifier could also be useful before running the LLs to eliminate false positives, get better SNR mappings and reduce the runtime of our semantic parser. Finally, we put everything together by first running the classifier and then the domain-specific LLs and NER model, applying our scoring metric on the results of both.
Next, we added a phase where we weeded out invalid mappings (for example, where some fields were missing or mapped twice) and filled in default values for empty fields, such as "all" when no source or destination was specified. Lastly, we carried out an evaluation step in which we used our scoring metric to choose the "best" SNR mapping for the given query.
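A simplified sketch of this weed-out, default-fill and choose-best phase follows. The required-field set, the default values and the candidate layout are assumptions, and the score is taken as a given number since the scoring metric itself is not described here.

```python
# Hypothetical rules for one sub-domain.
REQUIRED = {"action"}
DEFAULTS = {"source": "all", "destination": "all", "protocol": "all"}


def resolve(candidates):
    """Drop invalid candidates and fill defaults in the remaining ones."""
    resolved = []
    for cand in candidates:
        fields = cand["fields"]
        # A required field must be mapped exactly once.
        if any(len(fields.get(name, [])) != 1 for name in REQUIRED):
            continue
        # No field may be mapped twice.
        if any(len(values) > 1 for values in fields.values()):
            continue
        flat = {name: values[0] for name, values in fields.items() if values}
        for name, default in DEFAULTS.items():
            flat.setdefault(name, default)
        resolved.append({"fields": flat, "score": cand["score"]})
    return resolved


def best(candidates):
    """Pick the highest-scoring valid candidate, or None if none survive."""
    resolved = resolve(candidates)
    return max(resolved, key=lambda c: c["score"]) if resolved else None


candidates = [
    {"fields": {"action": ["deny"], "protocol": ["ftp"], "destination": ["wireless"]}, "score": 0.90},
    {"fields": {"action": ["deny", "permit"], "protocol": ["ftp"]}, "score": 0.95},
    {"fields": {"protocol": ["ftp"]}, "score": 0.99},
]
print(best(candidates))
```

Here the two higher-scoring candidates are rejected (one maps `action` twice, the other is missing it), so the first candidate wins and its empty `source` is filled with "all".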
Note that the semantic parser itself was built to be generic. We used sub-domain specific LLs as well as generator templates for the classifier and NER training (all in the form of JSON files) to teach it the chosen networking sub-domains.
Where Does This Fit in the Grand Scheme of Things?
At this point, it was time to start thinking about configuring an NMS according to the chosen SNR mapping. To incorporate that, we widened our perspective and came up with a high-level design as shown in figure 3.
Figure 3: High Level Design
We would start by parsing the NL request while applying network semantics, extracting the various bits of information, and mapping them into an SNR. To make sure we got everything right, we would show the user what we understood and get their approval.
Next, we would pass the SNR through a verification mechanism to confirm that all the required fields were filled in and to translate the human-understandable terms to machine terms (such as addresses, ports, etc.). Once everything was ready, we would pass the SNR through an NMS-specific module that builds the actual network request according to the supplied fields. In our case, we used an NMS that was configured via a REST API. In a bit more detail, figure 4 shows the actual flow we built.
Figure 4: SNR Processing Flow
The user speaks their request, and the speech-to-text mechanism converts it to text. The request passes through the semantic parser to give us the set of SNR candidates. Just as humans might understand the same sentence in several different ways, so might our parser, and we need to examine each option to find the best one. We pass the candidates through a resolver that eliminates bad candidates according to predefined rules on required fields or sets of fields for the specific sub-domain, and fills in default values for empty fields. The candidates are ranked using our custom scoring metric and passed through an evaluator that chooses the best match according to the given scores. Once we have our "best" SNR in hand, we pass it through a standardizer to translate all the data into machine/network terms.
At this point, we might have several SNRs, since some requests may translate into several network operations, such as setting both directions of a network flow, handling both TCP and UDP requests, handling multiple port/address ranges, etc. We then pass those SNRs through a dedicated NMS translator to convert everything to what the specific NMS expects to get, and then use an NMS-specific API runner to build the corresponding REST API calls and carry them out.
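The fan-out from one request into several network operations can be pictured with the sketch below. The dictionary keys (`protocols`, `bidirectional`) and the `expand` function are hypothetical, and the NMS-specific translation and REST calls are deliberately left out because they depend entirely on the target NMS.

```python
from itertools import product


def expand(snr):
    """Split one standardized SNR into one operation per protocol and direction."""
    directions = [(snr["source"], snr["destination"])]
    if snr.get("bidirectional"):
        # The reverse flow becomes its own operation.
        directions.append((snr["destination"], snr["source"]))
    return [
        {"action": snr["action"], "protocol": proto, "source": src, "destination": dst}
        for proto, (src, dst) in product(snr["protocols"], directions)
    ]


operations = expand({
    "action": "permit",
    "protocols": ["tcp", "udp"],
    "source": "10.0.8.1",
    "destination": "10.1.0.0/16",
    "bidirectional": True,
})
for op in operations:
    print(op)
```

With two protocols and two directions, this single SNR becomes four operations, each of which would then be handed to the NMS translator.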
This all looked good, but when trying some new real-world examples, we found that we were still missing some knowledge. The system didn’t know anything outside of the language and semantics we incorporated into it, but humans know more and have a context in which they make their requests. For example, a human technician would know what "the company site in San Jose" is, but the machine didn’t. How could we add such capabilities to our system?
Our solution was two-fold. Firstly, we decided to harvest the chosen NMS for data it already had regarding networks, VPNs, VLANs, hosts, etc. Secondly, we added an adaptive learning knowledge base (ALKB). The first time the system encounters something it can’t translate into network/NMS terms, we ask the user to do this for us. We add the knowledge into our ALKB so that we can retrieve it the next time we encounter the same term. We use an iterative process in which we review all the unknowns and let the user fill them in. After a couple of iterations, we have the full mapping and translation and can build the REST calls used to configure the system.
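The ask-once-then-remember behavior can be sketched as follows. The `KnowledgeBase` class, the `translate` function and the sample values are assumptions for illustration. A real ALKB would also persist its contents and handle more than exact term matches.

```python
class KnowledgeBase:
    """Toy adaptive knowledge base: term -> network/NMS value."""

    def __init__(self, harvested=None):
        # Seed with data harvested from the NMS (keys compared case-insensitively).
        self._known = {k.lower(): v for k, v in (harvested or {}).items()}

    def lookup(self, term):
        return self._known.get(term.lower())

    def learn(self, term, value):
        self._known[term.lower()] = value


def translate(terms, kb, ask_user):
    """Translate terms, asking the user only for ones the KB does not know."""
    translated = {}
    for term in terms:
        value = kb.lookup(term)
        if value is None:
            value = ask_user(term)
            kb.learn(term, value)  # remembered for the next request
        translated[term] = value
    return translated


asked = []


def fake_user(term):
    """Stand-in for the UI prompt; records what was asked."""
    asked.append(term)
    return "10.20.0.0/16"


kb = KnowledgeBase({"DMZ": "eth2"})
terms = ["DMZ", "the company site in San Jose"]
print(translate(terms, kb, fake_user))
print(translate(terms, kb, fake_user))
print(asked)  # the unknown term was asked about only once
```

The second call resolves everything from the knowledge base, so the user is not prompted again.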
Making Things Pretty
To make things nice and presentable, we added a simple Web UI where the user can simply speak to the machine and make their request (figures 5–6). The system presents the initial mapping and then gives the user the ability to fill in missing data, either by choosing from a list of options taken from the ALKB or NMS, or by typing/speaking their own answers (figures 7–10). Finally, we configure the NMS and give the user a link to the NMS screen where the results of the configuration can be inspected (figures 11–13). The UI also keeps track of all the requests and configurations for debugging and accountability.
Figure 5: Front page where the user can either type in or speak their request
Figure 6: The system is running the request through the semantic parser
Figure 7: The request has been parsed and the SNR mapping is shown after translation to NMS terms. Still some information is missing so the user is asked to make a choice.
Figure 8: The user makes a choice
Figure 9: Now the SNR is fully resolved
Figure 10: We can also see the list of API calls that will be carried out to fulfill the request. In this case we need a single call
Figure 11: The NMS is being configured…
Figure 12: The request was successfully carried out!
Figure 13: This is the NMS dashboard showing the new configuration that was added to fulfill the request
So How Good Were Our Results?
We needed a way to measure our success. We already had our generator that gave us gold-standard data, so all we needed to do was compare that to what our semantic parser did. We created more than 50,000 sample sentences, passed them through the system, and counted and averaged the good, bad, and missing mappings. As it turned out, our mapping accuracy was roughly 94% — a very high figure in the NLP world.
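One plausible way to compute such a figure is sketched below. The exact formula we used is not spelled out above, so the `compare` and `accuracy` functions are an assumed, field-level interpretation: each gold field counts as good, bad or missing, and accuracy is the share of good fields.

```python
def compare(gold, predicted):
    """Count good, bad and missing field mappings for one request."""
    good = bad = missing = 0
    for field, expected in gold.items():
        if field not in predicted:
            missing += 1
        elif predicted[field] == expected:
            good += 1
        else:
            bad += 1
    return good, bad, missing


def accuracy(pairs):
    """Share of correctly mapped fields over all gold fields."""
    totals = [0, 0, 0]
    for gold, predicted in pairs:
        for i, count in enumerate(compare(gold, predicted)):
            totals[i] += count
    return totals[0] / sum(totals) if sum(totals) else 0.0


pairs = [
    ({"action": "deny", "protocol": "ftp", "destination": "wireless"},
     {"action": "deny", "protocol": "ftp", "destination": "wireless"}),
    ({"action": "permit", "protocol": "udp", "source": "10.0.8.1"},
     {"action": "permit", "protocol": "tcp"}),
]
print(accuracy(pairs))  # 4 good out of 6 gold fields
```

Because the generator emits the tags along with each sentence, the gold side of every pair comes for free.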
All in all, we spent two to three months in tech investigations to learn the field of NLP and the existing tech as well as to meet with various groups within and outside our company to gather their input and wisdom. We then spent another month dissecting the problem space and planning our moves. We chose to use Python 3.6 as our language and spent another six months on actual coding and debugging to get the final result.
We published our efforts as a defensive publication.
We had some thoughts on future development, such as adding an initial configuration step where user-, company- or entity-specific knowledge could be fed into the system, so that users would not have to answer many questions during the system’s initial running period. We also wanted to integrate with user/employee management systems that could tell us which registered devices a user owns and is currently using, and to add background trackers that would continuously update the NMS configurations as things change, so that network requests keep being honored in a dynamic environment. However, since this was a technical investigation that was picked up as a project for further development by a different R&D group, we didn’t have the opportunity to pursue those ideas.