Apache Open-Source Configurations

Installation and Configuration

Here I have installed and configured software from the Apache Software Foundation. You can follow the Cloudera and Edureka blogs for reference.

Apache Hadoop




Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
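To give a feel for the kind of job Hadoop runs, here is a minimal sketch of the MapReduce word-count pattern in plain Python. It runs on a single machine and does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce steps would run in parallel on many machines, with the input split across the nodes that store it.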
  
Click Here 
  
Hive and Sqoop both run on top of Hadoop, so install and configure Hadoop first.

Apache Hive 


Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.


Click Here

Apache Sqoop


Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.  
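As a rough sketch of what a Sqoop transfer looks like, the Python helper below assembles the argument list for a `sqoop import` run that copies one relational table into HDFS. The helper name `build_sqoop_import`, the JDBC URL, the user name, the table and the target directory are all placeholder assumptions; only the command-line flags are Sqoop's own.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir, mappers=1):
    """Return the argv list for importing one table into HDFS with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,           # JDBC URL of the source database
        "--username", username,
        "-P",                            # prompt for the password at run time
        "--table", table,                # table to copy
        "--target-dir", target_dir,      # HDFS directory for the imported files
        "--num-mappers", str(mappers),   # number of parallel map tasks
    ]


if __name__ == "__main__":
    # All values below are hypothetical examples.
    cmd = build_sqoop_import(
        "jdbc:mysql://localhost/shop", "hadoop", "orders", "/user/hadoop/orders"
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell on a machine where Sqoop and the matching JDBC driver are installed.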

Click here 

Apache Cassandra



Apache Cassandra is a free, open-source NoSQL database system. Unlike relational databases such as MySQL, MSSQL or PostgreSQL, Cassandra does not follow the relational table model; it distributes data across a cluster of nodes. It is designed to handle large amounts of data and is highly scalable. We will be installing Cassandra and its prerequisite, Oracle Java, and, if necessary, the Cassandra drivers.
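Cassandra spreads rows across the nodes of a cluster by hashing each row's partition key onto a ring of tokens. The sketch below imitates that idea in plain Python. It is a simplification under stated assumptions: MD5 stands in for Cassandra's real partitioner, each node gets a single token, there is no replication, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to a large integer (MD5 is a stand-in, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns the keys whose token falls at or before its own."""

    def __init__(self, nodes):
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key):
        """Return the first node clockwise from the key's token, wrapping around."""
        index = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", ring.node_for(key))
```

Because placement depends only on the hash of the key, any node can work out where a row lives without consulting a central master.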



Click Here 

For more updates on new software distributions and configuration queries, visit: Big Data Software


Contact us:

Meet the Developer: Nithin Mohan aka The-Terror

Mail: nithi.mohan.97@gmail.com

Comments

  1. Hi, that is awesome work. I sent you an email at nithi.mohan.97@gmail.com asking about the hardware side. Can you please advise on the processor, RAM and motherboard that will ensure smooth operation for practice purposes?

    1. Here are the recommended specifications for NameNode/JobTracker/Standby NameNode nodes. The drive count will fluctuate depending on the amount of redundancy:

      4–6 1TB hard disks in a JBOD configuration (1 for the OS, 2 for the FS image [RAID 1], 1 for Apache ZooKeeper, and 1 for Journal node)
      2 quad-/hex-/octo-core CPUs, running at least 2-2.5GHz
      64-128GB of RAM
      Bonded Gigabit Ethernet or 10Gigabit Ethernet
