IEEE Access
Article . 2023 . Peer-reviewed
License: CC BY
Data sources: Crossref
IEEE Access
Article . 2023
Data sources: DOAJ
SSRN Electronic Journal
Article . 2025 . Peer-reviewed
Data sources: Crossref
https://dx.doi.org/10.60692/et...
Other literature type . 2023
Data sources: Datacite
https://dx.doi.org/10.60692/m6...
Other literature type . 2023
Data sources: Datacite
DBLP
Article
Data sources: DBLP

PatCluster: A Top-Down Log Parsing Method Based on Frequent Words

Authors: Yu Bai; Yongwei Chi; Dan Zhao

Abstract

Logs combine static message-type fields with dynamic variable fields, and the accuracy of log parsing affects the results of subsequent log analysis tasks. To address this, we introduce PatCluster, an offline log parsing method based on frequent words. The method first generates root nodes by preprocessing. It then counts word frequencies and takes the most frequent word as the segmentation condition to refine the template generated at the root node. The same step is applied recursively: pattern nodes are formed for all elements of each node, and the corresponding templates are generated, which completes the log pattern mining. Log patterns are mined from coarse to fine, an approach that relies on fewer assumptions, and the pattern fitting depth can be controlled by adjusting the termination condition. In the optimized algorithm model, we also consider the maximum extent to which the log template matches the tokens in the log message. Experimental results show that the method improves log parsing quality, achieves higher parsing accuracy than other methods, and is better suited to logs with complex structures.
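The coarse-to-fine idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the preprocessing, the tie-breaking, or the termination condition, so the sketch assumes that root nodes are formed by grouping messages by token count, that frequent words are counted per token position, and that splitting stops when the most frequent word occurs fewer than a hypothetical `min_support` times or a hypothetical `max_depth` is reached. The names `parse`, `refine`, and `template_of` are illustrative only.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"  # placeholder for a dynamic variable field


def template_of(messages):
    """Keep a token where every message agrees, otherwise emit a wildcard."""
    first = messages[0]
    return [tok if all(m[i] == tok for m in messages) else WILDCARD
            for i, tok in enumerate(first)]


def refine(messages, min_support=2, max_depth=8, depth=0):
    """Split a node on its most frequent word, then recurse on both parts."""
    template = template_of(messages)
    open_positions = [i for i, tok in enumerate(template) if tok == WILDCARD]
    if not open_positions or depth >= max_depth:
        return [template]
    # Count (position, word) pairs only where the template is still undecided.
    counts = Counter((i, m[i]) for m in messages for i in open_positions)
    # Highest count wins; ties go to the leftmost position (assumed rule).
    (pos, word), freq = max(
        counts.items(), key=lambda kv: (kv[1], -kv[0][0], kv[0][1]))
    if freq < min_support:  # assumed termination condition
        return [template]
    matching = [m for m in messages if m[pos] == word]
    rest = [m for m in messages if m[pos] != word]
    return (refine(matching, min_support, max_depth, depth + 1)
            + refine(rest, min_support, max_depth, depth + 1))


def parse(lines, min_support=2, max_depth=8):
    """Build root nodes by token count (assumed), then refine each one."""
    roots = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:
            roots[len(tokens)].append(tokens)
    return [" ".join(t)
            for group in roots.values()
            for t in refine(group, min_support, max_depth)]


logs = [
    "Connected to 10.0.0.1",
    "Connected to 10.0.0.2",
    "Connected to 10.0.0.3",
    "Disconnected from 10.0.0.1",
    "Disconnected from 10.0.0.2",
]
templates = parse(logs)
for t in templates:
    print(t)
# Connected to <*>
# Disconnected from <*>
```

Raising `min_support` or lowering `max_depth` stops the recursion earlier and yields coarser templates, which mirrors the abstract's point that the fitting depth is controlled by the termination condition.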

Keywords

FOS: Computer and information sciences, Sequential Patterns, Artificial intelligence, Pattern recognition (psychology), Segmentation, Engineering, Computer security, offline algorithm, PatCluster, Pattern matching, Automated Software Testing Techniques, Statistics, FOS: Philosophy, ethics and religion, Algorithm, Frequent Patterns, frequent words, Log Analysis and System Performance Diagnosis, Physical Sciences, Matching (statistics), Electrical engineering. Electronics. Nuclear engineering, Information Systems, Text segmentation, System Logs, Computer Networks and Communications, Word (group theory), Geometry, Structural engineering, Node (physics), Binary logarithm, Mathematical analysis, Log parsing, Data Mining Techniques and Applications, Log-log plot, Root (linguistics), FOS: Mathematics, Data mining, Preprocessor, Log Analysis, Parsing, Linguistics, Computer science, TK1-9971, Process (computing), Philosophy, Operating system, Security token, Computer Science, FOS: Languages and literature, Software, Mathematics

  • BIP! impact indicators (based on the underlying citation network)
    selected citations: 2 (derived from selected sources)
    popularity: Average (current attention in the research community)
    influence: Average (overall impact in the research community)
    impulse: Average (initial momentum directly after publication)
Open access route: gold