Academic Seminar: Deep Analysis of Massive Unstructured Logs

Title: Deep Analysis of Massive Unstructured Logs

Speaker: Feifei Li (李飞飞), Vice President of Alibaba and Chief Database Scientist at DAMO Academy

Date: Saturday, December 29, 2018

Time: 10:00 - 11:00 a.m.

Venue: Room A101, School of Data and Computer Science

Host: Associate Professor Hanjiang Lai (赖韩江)

 

Abstract:

System event logs are frequently used as a valuable resource in data-driven approaches to improving system health and stability. A typical procedure in system log analytics is to first parse unstructured logs, and then apply data analysis to the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications demand online monitoring and processing. We propose Spell, an online streaming system that uses a longest common subsequence based approach to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in a streaming fashion. We also use deep learning based methods to automatically learn useful patterns and models from the underlying log messages. We then use these models to perform online monitoring and anomaly detection. Evaluation results on large real system logs demonstrate that, even compared with the offline alternatives, our system is superior in both efficiency and effectiveness.
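
To make the longest common subsequence idea in the abstract concrete, the following is a minimal illustrative sketch, not the Spell implementation. It assumes whitespace tokenization, a linear scan over known templates, and a match threshold of half the incoming line's token count; the names `lcs`, `merge`, `StreamingLogParser`, and `WILDCARD` are hypothetical and chosen only for this example.

```python
from typing import List

WILDCARD = "*"  # hypothetical placeholder for variable fields in a template


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j] and a[i] != WILDCARD:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack through the table to recover the subsequence itself.
    out: List[str] = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and a[i - 1] != WILDCARD:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def merge(template: List[str], tokens: List[str], common: List[str]) -> List[str]:
    """Keep the common tokens; mark every skipped stretch with one wildcard."""
    merged: List[str] = []
    i = j = 0
    for tok in common:
        gap = False
        while template[i] != tok:
            i += 1
            gap = True
        while tokens[j] != tok:
            j += 1
            gap = True
        if gap:
            merged.append(WILDCARD)
        merged.append(tok)
        i += 1
        j += 1
    if i < len(template) or j < len(tokens):
        merged.append(WILDCARD)
    return merged


class StreamingLogParser:
    """Maintains a growing set of message templates, one log line at a time."""

    def __init__(self) -> None:
        self.templates: List[List[str]] = []

    def add(self, line: str) -> int:
        """Assign the line to a template (creating one if needed); return its id."""
        tokens = line.split()
        best_idx, best_common = None, []
        for idx, tpl in enumerate(self.templates):
            common = lcs(tpl, tokens)
            if len(common) > len(best_common):
                best_idx, best_common = idx, common
        # Assumed threshold: the LCS must cover at least half of the new line.
        if best_idx is not None and 2 * len(best_common) >= len(tokens):
            self.templates[best_idx] = merge(
                self.templates[best_idx], tokens, best_common
            )
            return best_idx
        self.templates.append(tokens)
        return len(self.templates) - 1
```

Feeding the parser lines such as "Temperature 41C exceeds warning threshold" and "Temperature 43C exceeds warning threshold" leaves a single template in which the differing token is replaced by the wildcard. The linear scan over templates keeps the sketch short; a production system would need a faster lookup to keep up with a high-volume stream.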

Speaker Bio:

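Feifei Li is Vice President of Alibaba and Chief Database Scientist at DAMO Academy, where he leads the DAMO Academy Database Lab as well as Alibaba Cloud's database business unit and storage technology business unit. He is an ACM Distinguished Scientist. Before joining Alibaba, he was a tenured full professor in the computer science department of the University of Utah. His research interests are database systems, the theory, design, and development of big data management systems, and security in cloud data management. He has received the U.S. National Science Foundation CAREER Award, the HP Innovation Research Program Award, the Google Faculty Award, and the Visa Faculty Research Award. His research has been recognized with the IEEE ICDE 2004 Best Paper Award, the IEEE ICDE 2014 10-Year Most Influential Paper Award, the ACM SIGMOD 2015 Best System Demonstration Award, the ACM SIGMOD 2016 Best Paper Award, and the ACM SIGMOD 2017 Research Highlight Award. He served as demonstration program chair for VLDB 2014 and SIGMOD 2018, general chair for SIGMOD 2014, program area chair for ICDE 2014, SIGMOD 2015, and SIGMOD 2019, and PhD forum chair for VLDB 2019 and ICDE 2019. He is a member of the editorial boards of IEEE TKDE, ACM TODS, and Springer DAPD, and he also serves on the selection committee for the annual SIGMOD Jim Gray Doctoral Dissertation Award.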