Monday, 30 June 2014

10 Awesome Tools (Open Source) For Your Big Data Needs

Open Source tools, Big Data, Gluster, Lucene, Sqoop, Chukwa, Terracotta, Rattle, ZooKeeper, Oozie, Apache Flume, KNIMEBig Data as well as Open Source go hand in hand. We live in an age where both the aforementioned technologies are being increasingly embraced by one and all. While Big Data holds the distinction of emerging as one of the fastest growing technologies around, the Open Source technology is meanwhile pulling the rug from underneath closed software. Here are 10 awesome Open Source tools for your Big Data needs.



1.Gluster

GlusterFS is an open source, distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients. GlusterFS clusters together storage building blocks over Infiniband RDMA or TCP/IP interconnect, aggregating disk and memory resources and managing data in a single global namespace.

2.Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

3.Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

4.Chukwa

Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.

5.Terracotta

BigMemory Max lets you create a distributed in-memory data store made up of pairs of servers, or stripes. Each stripe consists of an active server and a mirror server, which stores a copy of the active. If the active goes offline, the mirror takes over, ensuring high availability.

6.Rattle

Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

7.ZooKeeper

ZooKeeper is a centralised service for maintaining configuration information, naming, providing distributed synchronisation, and providing group services. All of these kinds of services are used in some form or another by distributed applications.

8.Oozie

Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

9.Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS.

10.KNIME

KNIME is a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open integration platform provides over 1000 modules (nodes), including those of the KNIME community and its extensive partner network.

No comments:

Post a Comment

Labels

Tutorial (129) Tech News (83) E-Books (55) Pdf (47) Hacking (46) Linux (32) Android (23) Programming (22) Tools (22) Video (21) Ethical Hacking (16) Electronics (12) Google (10) Hacked (9) Python (9) Facebook (8) Java (8) Software (8) PHP (7) Android App (6) C (6) Free Online Coureses (6) OpenSource (6) Ubuntu (6) Unix (6) Windows (6) C++ (5) Game Programming (5) Java Programming (5) Kali Linux (5) CodeKill (4) Cryptography (4) Firefox (4) JavaScript (4) Linux System Administrator (4) Mac (4) Penetration testing (4) Python Programming (4) Security (4) Top Distros (4) WhatsApp (4) CSS (3) Circuit (3) Cloud Computing (3) Game Devlopment (3) Hacking Tools (3) Malware (3) MicroController (3) Microsoft (3) Networking Tool (3) Perl (3) Source Code (3) WebSite (3) Windows 8.1 (3) C Programming (2) C Series (2) C# (2) CheetSheet (2) Computer (2) Computer Networking (2) Data Storage (2) Dual Boot (2) Eclipse (2) Edward Snowden (2) Exploit (2) Facts (2) Games On Linux (2) Google Chrome (2) HTML5 (2) Hacking Challenges (2) IDE's (2) Information Security (2) Lenovo (2) Linux Kernel (2) Malicious (2) Mobile (2) Motorola (2) Mozilla (2) MySQL (2) NoSQL (2) Raspberry Pi (2) Ruby (2) Security Tools (2) Syrian Electronic Army (2) Tricks and Tips (2) Valentine Day (2) Web Design (2) iOS (2) iPhone (2) jQwery (2) *nix (1) 2014 (1) 3D Modeling (1) Algorithm (1) Android Hacking (1) Android Pattern Lock Screen.. (1) Anonymous Mail (1) Anti-Spam (1) Apps (1) Arduino (1) Artificial Intelligance (1) Audio Software (1) BSD (1) BeAWARE (1) Bitcoin (1) Black Hat Hackers (1) BlackBerry (1) Buffer Overflow (1) C++ vs Java (1) CISO (1) Circuit Analysis (1) Circuit Design (1) Circuit Programming (1) Circuit Simulators (1) Codes (1) Crptology (1) Cryptanalysis (1) DDOS (1) Devlopers (1) Drupal (1) DuckDuckGo Search Engine (1) E-Card (1) E-Mails (1) Embedded System (1) Encryption Tools (1) Error (1) FTP (1) Famous Passwords (1) FileZilla (1) Flipkart (1) Forbes (1) Forgot Password (1) GCHQ (1) Genders (1) Gmail (1) Google Tricks and Trips (1) HTML (1) Hacking Distro (1) Hard Disk (1) Hash Encryption (1) Illegal (1) Internet (1) LAMP (1) Language Theory (1) LibreOffice (1) Linus Trovalds (1) Logic Gates (1) MATLAB (1) MOSFET (1) Mail (1) Mark Zuckerberg (1) Mathematical (1) MicroProcessor (1) Mind Mapping Tools (1) Myntra (1) NoSQL Database (1) Nobal Prize (1) Nokia (1) Object Oriented Programming (1) Office (1) Oldboot (1) Online (1) Paranoid Android (1) Passwords (1) Passwords Cracking Tools (1) PayPal (1) Perl Programming (1) Plugins (1) Prolog Programming (1) Python Basics (1) Remote (1) SEA (1) SQL Injection (1) Sans (1) Screencasts (1) Screenloggers (1) Server Load (1) Servers (1) Shell (1) Software Design (1) Software Developer (1) Software Testing (1) Sony (1) Spider.io (1) Statistical (1) Steve Jobs (1) TCP/IP (1) Timeline (1) Tor (1) Trojan (1) Ubuntu Phones (1) VAIO (1) Virus (1) Web Designers (1) Wi-Fi Hacking (1) Windows Tools (1) Windows XP (1) WordPress (1) XML (1) Yahoo (1) YouTube (1) cpp (1) eBay (1) iBanking (1)