A Joint Model of Entity Linking for RFC Protocols Knowledge Graph Construction

Shoubin Li; Tao Luan

doi:10.62677/IJETAA.2401100

Authors

Shoubin Li, University of Chinese Academy of Sciences and Institute of Software, Chinese Academy of Sciences
Tao Luan, Institute of Software, Chinese Academy of Sciences

Keywords:

Request for Comments, Entity Linking, Knowledge Graph, Protocol Analysis

Abstract

Applying knowledge representation and reasoning to downstream tasks is considered a promising research direction because it enables semantic analysis of network protocols. A knowledge graph is a relatively new way of organizing knowledge, and building a protocol knowledge graph from RFCs can help researchers study and analyze network protocols more effectively. However, automatically constructing such a graph from RFCs is challenging, particularly when extracting and linking protocol entities, because RFC documents are semi-structured. In this paper, we propose a joint model that combines a fine-tuned language model with an RFC Domain Model to link entities in RFCs to categories in a protocol knowledge base. First, we design a protocol knowledge base as the schema for protocol entity linking. Second, we use heuristic methods to identify protocol entities and infer their descriptions from the contexts near their header fields. Finally, we conduct comprehensive experiments on the RFC dataset, comparing our joint model with baseline methods for protocol entity linking. Experimental results show that our model outperforms all baseline methods and achieves state-of-the-art entity-linking performance on the RFC dataset. In addition, we release a protocol knowledge graph, RFC-KG.
