Wednesday, January 28, 2015

Splunk: Listen to your Data


Hunk™: Splunk Analytics for Hadoop


Hunk helps us explore, analyse and visualize data stored in Hadoop. It is fast compared with the rest of the lot. Hunk is the next evolution in big data analytics: it helps convert data stored in Hadoop clusters into strategic assets. Deriving value and insights from the huge volumes of data that organisations store has proven a challenge.

Hunk works on all major distributions of Apache Hadoop including those from Amazon Web Services, Cloudera, Hortonworks, IBM, MapR and Pivotal. 
Others giving Splunk tough competition include Logstash, Kibana, Graylog2, LogLogic, LogRhythm and Elasticsearch. Splunk provides an increasingly useful alternative to commercial log analysis tools.
Although Splunk is a wonderful log analysis tool, there are many open source alternatives and competitors. If you cannot afford the high price of Splunk, you can get free and open source log analysis tools that provide almost the same functionality. I have listed 20 free and open source alternatives to the Splunk log analysis tool below.
1. Scribe - Real time log aggregation used in Facebook
Scribe is a server for aggregating log data that is streamed in real time from clients. It is developed and maintained by Facebook. It is designed to scale to a very large number of nodes and to be robust to network and node failures. A Scribe server runs on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups.
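The "aggregate locally, forward in larger groups" idea can be sketched in a few lines of Python. This is only an illustration of batching, not Scribe's actual protocol or API: the `LocalAggregator` class, its `batch_size` parameter and the `central_store` list standing in for the central server are all hypothetical names.

```python
from typing import Callable, List


class LocalAggregator:
    """Hypothetical per-node buffer that forwards messages in batches."""

    def __init__(self, send_batch: Callable[[List[str]], None], batch_size: int = 3):
        self.send_batch = send_batch    # called with each full batch
        self.batch_size = batch_size    # assumed threshold, not a Scribe setting
        self.buffer: List[str] = []

    def log(self, message: str) -> None:
        """Buffer one message and forward the buffer once it is full."""
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one group, then empty the buffer."""
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()


# A plain list stands in for the central server in this sketch.
central_store: List[List[str]] = []

node = LocalAggregator(central_store.append, batch_size=3)
for i in range(7):
    node.log(f"event {i}")
node.flush()  # forward the leftover partial batch

print(central_store)
```

Seven messages arrive at the central store as three groups rather than seven separate sends, which is the point of aggregating on each node first.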
        
2. Logstash - Centralized log storage, indexing, and searching
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use. Logstash comes with a web interface for searching and drilling into all of your logs.
      
3. Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter
Octopussy is a log analyzer tool. It analyzes logs, generates reports and alerts the admin. It has LDAP support for maintaining the user list. It exports reports by email, FTP and SCP, can generate scheduled reports, and uses RRDtool to generate graphs.
      
4. Awstats - Advanced web, streaming, ftp and mail server statistics
AWStats is a powerful tool that generates advanced web, streaming, FTP or mail server statistics graphically. It can analyze log files from all major server tools, such as Apache, WebStar and IIS, and from many other web, proxy, WAP and streaming servers, mail servers and some FTP servers. This log analyzer works as a CGI or from the command line and shows you all the information your log contains in a few graphical web pages.
      
5. nxlog - Multi platform Log management
nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog, but it is not limited to Unix and syslog. It can collect logs from files in various formats and receive logs from the network over UDP, TCP or TLS/SSL. It supports platform-specific sources such as the Windows Eventlog, Linux kernel logs, Android logs and local syslog.
      
6. Graylog2 - Open Source Log Management
Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores them in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling for what you can do with Graylog2.
      
7. Fluentd - Data collector, Log Everything in JSON
Fluentd is an event collector system. It is a generalized version of syslogd that handles JSON objects for its log messages. It collects logs from various data sources and writes them to files, databases or other types of storage.
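To make "log everything in JSON" concrete, here is a minimal Python sketch of a structured log event. It does not use Fluentd or its client libraries; the `make_event` helper is hypothetical and only mimics the general shape of an event with a tag, a time and a record.

```python
import json
import time
from typing import Any, Dict, Optional


def make_event(tag: str, record: Dict[str, Any], timestamp: Optional[float] = None) -> str:
    """Serialize one log event as a single JSON line (hypothetical helper)."""
    event = {
        "tag": tag,                                   # where the event came from
        "time": int(time.time() if timestamp is None else timestamp),
        "record": record,                             # the structured payload
    }
    return json.dumps(event, sort_keys=True)


line = make_event("app.login", {"user": "alice", "ok": True}, timestamp=1422403200)
decoded = json.loads(line)   # any consumer can parse it back without a custom parser

print(line)
```

Because every event is a JSON object, downstream tools can filter on fields such as `record["user"]` instead of pattern-matching free text.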
      
8. Meniscus - The Python Event Logging Service
Meniscus is a Python-based system for event collection, transit and processing in the large. Its primary use case is large-scale cloud logging, but it can be used in many other scenarios, including usage reporting and API tracing. Its components include Collection, Transport, Storage, Event Processing & Enhancement, Complex Event Processing and Analytics.
      
9. lucene-log4j - Log4j file rolling appender which indexes log with Lucene
lucene-log4j solves a recurrent problem that production support teams face whenever a live incident happens: filtering production log statements to match a session/transaction/user ID. It works by extending Log4j's RollingFileAppender with Lucene indexing routines. Then, with a LuceneLogSearchServlet, you get access to your logs through a web front end.
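The idea of indexing log statements so they can be filtered by session ID can be illustrated with a tiny inverted index in Python. This is a rough sketch of the concept only; it uses neither Lucene nor Log4j, and `build_index`, `search` and the sample log lines are made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(lines: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase token to the set of line numbers containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for number, line in enumerate(lines):
        for token in re.findall(r"[A-Za-z0-9_-]+", line):
            index[token.lower()].add(number)
    return index


def search(index: Dict[str, Set[int]], lines: List[str], term: str) -> List[str]:
    """Return the lines containing the term, in their original order."""
    return [lines[n] for n in sorted(index.get(term.lower(), ()))]


log_lines = [
    "INFO session=abc123 user logged in",
    "INFO session=zzz999 user logged in",
    "ERROR session=abc123 payment failed",
]
index = build_index(log_lines)
matches = search(index, log_lines, "abc123")

print(matches)
```

Looking up a session ID becomes a dictionary access instead of a scan over the whole log file.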
    
10. Chainsaw - log viewer and analysis tool
Chainsaw is a companion application to Log4j written by members of the Log4j development community. Chainsaw can read log files formatted in Log4j's XMLLayout, receive events from remote locations and read events from a database. It can even work with JDK 1.4 logging events.

11. Logsandra - log management using Cassandra
Logsandra is a log management application written in Python that uses Cassandra as its back end. It was written as a demo for Cassandra, but it is worth a look. It also lets you create your own parser.
    
12. Clarity - Web interface for the grep
Clarity is a Splunk-like web interface for your server log files. It supports searching (using grep) as well as tailing log files in real time. It is written using an event-based architecture built on EventMachine, which allows real-time search of very large log files.
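The two operations described here, searching with a grep-style pattern and tailing the end of a log, can be sketched in Python as follows. This is not Clarity's code (Clarity is built on EventMachine); the `grep` and `tail` helpers and the sample log are assumptions made for illustration.

```python
import re
from collections import deque
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return the lines that match a regular expression, like a basic grep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def tail(lines: Iterable[str], n: int = 2) -> List[str]:
    """Return the last n lines while holding at most n lines in memory."""
    return list(deque(lines, maxlen=n))


sample_log = [
    "10:00:01 INFO server started",
    "10:00:05 ERROR disk full",
    "10:00:09 INFO request served",
    "10:00:12 ERROR timeout",
]
errors = grep(r"\bERROR\b", sample_log)
latest = tail(sample_log, n=2)

print(errors)
print(latest)
```

Both helpers accept any iterable of lines, so an open file object could be passed in place of the sample list.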
    
13. Webalizer - fast web server log file analysis
The Webalizer is a fast web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser. It handles standard Common logfile format (CLF) server logs, several variations of the NCSA Combined logfile format, wu-ftpd/proftpd xferlog (FTP) format logs, Squid proxy server native format, and W3C Extended log formats.
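For readers unfamiliar with the Common Log Format, the Python sketch below parses CLF lines and tallies hits by status code. It only illustrates the kind of analysis such a tool performs and is not how Webalizer itself is implemented; `CLF_PATTERN`, `parse_clf` and the sample lines are assumptions for this example.

```python
import re
from collections import Counter
from typing import Dict, Optional

# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, object]]:
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry: Dict[str, object] = dict(match.groupdict())
    entry["status"] = int(match.group("status"))
    size = match.group("size")
    entry["size"] = 0 if size == "-" else int(size)   # "-" means no body was sent
    return entry


sample_lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]
entries = [e for e in map(parse_clf, sample_lines) if e is not None]
hits_by_status = Counter(e["status"] for e in entries)
total_bytes = sum(e["size"] for e in entries)

print(hits_by_status, total_bytes)
```

Lines that do not match the pattern are skipped rather than raising an error, which is usually what you want when a log contains stray or truncated entries.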
    
14. Zenoss - Open Source IT Management
Zenoss Core is an open source IT monitoring product that delivers the functionality to effectively manage the configuration, health and performance of networks, servers and applications through a single, integrated software package.
    
15. OtrosLogViewer - Log parser and Viewer
OtrosLogViewer can read log files formatted in Log4j (pattern and XMLLayout) and java.util.logging. The source of events can be a local or remote file (ftp, sftp, samba, http) or sockets. It has many powerful features such as filtering, marking, formatting and adding notes. It can also format SOAP messages in logs.
     
16. Kafka - A high-throughput distributed messaging system
Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches and other user actions) is a key ingredient in many of the social features on the modern web. Because of the throughput requirements, this data is typically handled by "logging" and ad hoc log aggregation solutions. This kind of ad hoc solution is a viable way of providing logging data to Hadoop.
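The publish-subscribe model can be shown with a toy in-memory topic in Python. This sketch does not use Kafka or any Kafka client library; the `Topic` class, its `publish` and `poll` methods and the consumer names are hypothetical and only illustrate an append-only log read by independent consumers.

```python
from typing import Dict, List


class Topic:
    """Toy append-only message log with one read offset per consumer."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.offsets: Dict[str, int] = {}

    def publish(self, message: str) -> int:
        """Append a message and return its position in the log."""
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, consumer: str) -> List[str]:
        """Return the messages this consumer has not seen yet."""
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:]
        self.offsets[consumer] = len(self.messages)
        return batch


page_views = Topic()
page_views.publish("view /home")
page_views.publish("search hadoop")
first = page_views.poll("hadoop-loader")    # sees both messages
page_views.publish("view /pricing")
second = page_views.poll("hadoop-loader")   # sees only the new message
dashboard = page_views.poll("dashboard")    # a new consumer starts from the beginning

print(first, second, dashboard)
```

Each consumer keeps its own position, so an offline loader and a real-time dashboard can read the same activity stream without interfering with each other.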
    
17. Kibana - Web Interface for Logstash and ElasticSearch
Kibana is a highly scalable interface for Logstash and ElasticSearch that allows you to efficiently search, graph, analyze and otherwise make sense of a mountain of logs. Kibana will load balance against your Elasticsearch cluster. Logstash's daily rolling indices let you scale to huge datasets, while Kibana's sequential querying gets you the most relevant data quickly, with more as it becomes available.
    
18. Pylogdb - A Python-powered, column-oriented database suitable for web log analysis
pylogdb is a database suitable for web log analysis.

19. Epylog - a Syslog parser
Epylog is a syslog parser which runs periodically, looks at your logs, processes some of the entries in order to present them in a more comprehensible format, and then mails you the output. It is written specifically for large network clusters where a lot of machines (around 50 and upwards) log to the same loghost using syslog or syslog-ng.
  
20. Indihiang - IIS and Apache log analyzing tool

The Indihiang Project is a web log analyzing tool. It analyzes IIS and Apache web logs and generates real-time reports. It has a web log viewer and analyzer, and it can analyze trends from the logs. The tool also integrates with Windows Explorer, so you can load a log file into Indihiang via the context menu.