Miri Infotech is launching a product that configures the open source tool Hadoop, bundled with Elasticsearch, for a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques. It is delivered as a pre-configured, ready-to-launch AMI on Amazon EC2, based on Ubuntu 16.04, that contains Hadoop, R, RStudio, HDFS, HBase and Shiny Server. Miri Infotech also brings you Elasticsearch for Apache Hadoop (elasticsearch-hadoop), a first-class open source, standalone, small library that allows Hadoop jobs to interact with Elasticsearch. It provides support for vanilla Map/Reduce, Cascading, Pig and Hive.
Elasticsearch and some related components are described below:
Elasticsearch is a search engine that can index new documents in near real-time and make them available for querying almost immediately. It is based on Apache Lucene and allows for setting up clusters of nodes that store any number of indices in a distributed, fault-tolerant way. If a node disappears, the cluster rebalances the shards of its indices over the remaining nodes. You can configure how many primary shards make up each index and how many replicas of each shard there should be. If a primary shard goes offline, one of its replicas is promoted to primary and used to repopulate another node.
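The shard and replica configuration described above can be sketched in Python. This is a minimal illustration, not part of the product: it builds the body of an Elasticsearch create-index request using the `number_of_shards` and `number_of_replicas` index settings. The function names, the index name and the node URL in the usage note are assumptions made for the example, and `create_index` only works against a reachable Elasticsearch node.

```python
import json
import urllib.request


def index_settings(shards: int, replicas: int) -> dict:
    """Build the request body that sets primary shard and replica counts."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard has `replicas` extra copies spread over the cluster."""
    return shards * (1 + replicas)


def create_index(base_url: str, name: str, shards: int, replicas: int) -> dict:
    """Send PUT <base_url>/<name>. Hypothetical helper; needs a running node."""
    request = urllib.request.Request(
        url=f"{base_url}/{name}",
        data=json.dumps(index_settings(shards, replicas)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `create_index("http://localhost:9200", "logs", 3, 1)` would ask a local node (an assumed address) for an index with 3 primary shards and 1 replica of each, which is 6 shard copies in total. With at least one replica per shard, losing a single node still leaves a copy that can be promoted to primary.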
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into different storage destinations, such as the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and it is robust and fault tolerant, with tunable reliability mechanisms for failover and recovery.
Kibana is an open source (Apache Licensed), browser-based analytics and search interface to Logstash and other timestamped data sets stored in Elasticsearch. Kibana strives to be easy to get started with, while also being flexible and powerful.
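To give a feel for what searching timestamped data in Elasticsearch involves, the sketch below builds a time-range query in the Elasticsearch query DSL. It is only an illustration of the kind of request a dashboard might send, not Kibana's actual implementation. The `@timestamp` field name follows the Logstash convention, and the function name and default values are assumptions for the example.

```python
import json


def recent_events_query(time_field: str = "@timestamp",
                        window: str = "15m",
                        size: int = 50) -> dict:
    """Build a search body for the newest documents within a time window."""
    return {
        "size": size,
        # Date math: "now-15m" means fifteen minutes before the current time.
        "query": {"range": {time_field: {"gte": f"now-{window}", "lte": "now"}}},
        # Newest documents first.
        "sort": [{time_field: {"order": "desc"}}],
    }


# Print the body that would be sent to an index's _search endpoint.
print(json.dumps(recent_events_query(), indent=2))
```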