Network Protocol Entity Extraction Based on Few-Shot Learning

Zhiyuan Chang; Shoubin Li

doi:10.62677/IJETAA.2401104

Authors

Zhiyuan Chang, Institute of Software, Chinese Academy of Sciences
Shoubin Li, University of Chinese Academy of Sciences, Beijing, China, and Institute of Software, Chinese Academy of Sciences, Beijing, China

Keywords:

Few-Shot Learning, Entity Extraction, Knowledge Graph, Network Protocol, RFC

Abstract

A knowledge graph is a new way of organizing knowledge, and building a protocol knowledge graph from RFCs can help researchers study and analyze network protocols. Protocol entity extraction is one of the key steps in constructing such a graph. Because RFCs (Requests for Comments) contain detailed descriptions of the basic Internet communication protocols, network protocol entities can be obtained from them. However, the format and wording of RFC documents are not uniform, so traditional rule-based information extraction methods cannot complete the extraction task. This paper therefore proposes a network protocol entity extraction method based on few-shot learning. The method uses a very small number of labeled samples to extract network protocol entities from a large number of unlabeled samples while maintaining high recognition accuracy. It works in two stages: it first mines as many potential network protocol entities as possible from the RFC documents, and then accurately re-identifies those candidates. Experiments show that, when the model is trained on 5 manually annotated RFC documents, the accuracy of network protocol entity extraction reaches 88.4%. Compared with existing methods, the proposed method is more accurate and more robust, and it is better at identifying network protocol entities that did not appear in the training set.
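The two-stage idea in the abstract (over-generate candidates, then re-identify them from a handful of labeled examples) can be illustrated with a toy sketch. This is not the paper's model: the capitalized-phrase regex, the character-trigram prototypes, and the names `mine_candidates`, `reidentify`, and the sample sentence are all assumptions made for illustration, with a simple nearest-prototype rule standing in for the learned few-shot classifier.

```python
import re
from collections import Counter
from math import sqrt

# Stage 1 (assumed heuristic): any run of capitalized words is a candidate.
CANDIDATE_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:[- ][A-Z][A-Za-z0-9]*)*\b")


def mine_candidates(text):
    """High-recall mining: return every capitalized phrase, sorted and deduplicated."""
    return sorted(set(CANDIDATE_RE.findall(text)))


def char_ngrams(phrase, n=3):
    """Bag of character n-grams of the lowercased phrase, padded with '#'."""
    padded = f"#{phrase.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prototype(examples):
    """Sum the n-gram counts of a few labeled examples into one class prototype."""
    proto = Counter()
    for example in examples:
        proto.update(char_ngrams(example))
    return proto


def reidentify(candidates, positives, negatives):
    """Stage 2: keep candidates closer to the entity prototype than to the non-entity one."""
    pos, neg = build_prototype(positives), build_prototype(negatives)
    return [
        c for c in candidates
        if cosine(char_ngrams(c), pos) > cosine(char_ngrams(c), neg)
    ]


if __name__ == "__main__":
    # Hypothetical RFC-style sentence, written for this example only.
    text = ("the Sequence Number field is 32 bits; the Acknowledgment Number "
            "field follows (see Section 3 for the Window field).")
    candidates = mine_candidates(text)
    entities = reidentify(
        candidates,
        positives=["Sequence Number", "Window"],       # few labeled entities
        negatives=["Section", "Figure", "Appendix"],   # few labeled non-entities
    )
    print(candidates)
    print(entities)
```

In this sketch the first stage deliberately admits non-entities such as "Section", and the second stage filters them out while still accepting a phrase that was never labeled ("Acknowledgment Number") because it shares trigrams with a labeled entity. A real system would replace both stages with the learned components described in the paper.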
