Friday, May 2, 2014

Compiling and Executing Java Native Interface (JNI) Code

To use the Java Native Interface (JNI) we first write a Java class that declares native methods. Suppose SystemCheck.java is the Java file containing these native method declarations.
Keep SystemCheck.java in the package com/tp/pc/schedule/system.
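
A minimal version of such a class might look like the sketch below; the method name checkStatus and the library name systemcheck are assumptions for illustration only.

    package com.tp.pc.schedule.system;

    public class SystemCheck {

        // Load the native library (e.g. libsystemcheck.so on Linux) from the
        // directory passed via -Djava.library.path at run time.
        // The library name "systemcheck" is an assumption for this sketch.
        static {
            System.loadLibrary("systemcheck");
        }

        // Declared native and implemented in C/C++; javah generates the
        // matching header. The method name checkStatus is hypothetical.
        public native boolean checkStatus();

        public static void main(String[] args) {
            System.out.println("System OK: " + new SystemCheck().checkStatus());
        }
    }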

For compiling:
 javac com/tp/pc/schedule/system/SystemCheck.java

This will create the class file com/tp/pc/schedule/system/SystemCheck.class

Create a header file for the class:
javah -d inc com.tp.pc.schedule.system.SystemCheck
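
Before the program can run, the generated header has to be implemented in C/C++ and compiled into a shared library. On Linux this might look roughly like the following; the source file name SystemCheck.c and the library name libsystemcheck.so are assumptions for this sketch:

    gcc -shared -fPIC \
        -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
        -Iinc \
        -o libsystemcheck.so SystemCheck.c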

Execute the application, pointing java.library.path at the directory that contains the native library:
java -Djava.library.path=/path/to/native/library -jar system.jar

Sunday, April 27, 2014

GitHub: An Open Source Developer's Tool

GitHub is a code sharing and publishing service, and a social networking site for programmers. What is so special about GitHub? At its heart is Git, an open source project started by Linus Torvalds. Git, like other version control systems, manages and stores revisions of projects. Git can track Word documents and other project files as well as source code.

The difference from other version control systems like CVS and Subversion is that they are centralized, whereas Git is distributed. In a distributed version control system you copy the whole repository to your own machine, make your changes on that local copy, and then check the changes back in to the shared repository. You don't have to connect to the server every time you make a change.

GitHub is a Git repository hosting service. Git is a command-line tool, but GitHub provides a web-based graphical user interface on top of it. In addition, it provides access control and other features such as wikis and basic task management tools. The following are three key features of GitHub:
Fork: Forking is the most important feature of GitHub. It means copying another user's repository into your own account, so that you can modify, under your own account, a repository to which you do not have write access.
Pull Request: If you would like to share the changes you have made, you can send a notification called a "pull request" to the original owner.
Merge: Once the pull request has been made, the original owner can, with the click of a button, merge the changes from your repository into the original repository. A typical command-line session for this fork-and-pull workflow is sketched below.
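
To make this concrete, the workflow from the command line might look roughly like this; the user name, repository name and branch name are made up for illustration:

    # after forking the original repository in the GitHub web UI
    git clone https://github.com/yourname/some-project.git
    cd some-project
    git checkout -b my-fix           # work on a topic branch
    # ... edit files ...
    git add .
    git commit -m "Describe the change"
    git push origin my-fix           # push the branch to your fork
    # then open a pull request from yourname:my-fix on github.com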

I think this is the best way an open source project can be run.
If you want to contribute to an open source project, GitHub provides the easiest approach. Earlier, contributors had to manually download the project's source code, make their changes locally, create a list of changes called a "patch" and then e-mail the patch to the project's maintainer. The maintainer would then have to evaluate this patch, possibly sent by a total stranger, and decide whether to merge the changes.

GitHub keeps growing: each day many repositories are forked and many pull requests are merged. On 23 December 2013, GitHub announced that it had reached 10 million repositories. There is no hard limit on the size of a repository, but the guideline is that it should not exceed one gigabyte. Each push is checked for files larger than 100 MB; if any such files exist, the push is rejected.

Tuesday, April 22, 2014

Big Data: A New Buzzword

Big data is the new buzzword in the market. Big data refers to the processing of large sets of structured and unstructured data that are generated at very high speed. A rigorous analysis of this data, including the unstructured part, is a business imperative for accurate forecasts, informed decision-making and an enhanced customer experience.

Big data analytical solutions represent a technology transition on a massive global scale, one that will affect everyday interactions and decisions and shape every aspect of business, governance, social interaction, education, healthcare, telecom and, above all, climate change and water management. The key challenges are data privacy and security, real-time data flows, interoperability with other technologies, and running big data in the cloud.

The world of information technology is driven by data; IT turns raw data into meaningful information. For more than two decades the RDBMS has played a vital role in handling data, but its drawback is that it cannot scale beyond a certain point. Moreover, most RDBMSs cannot manage unstructured data such as Word documents, PDFs, XML and image files. In the recent past, with the advent of smartphones, iPads and other smart devices, the data being generated (both structured and unstructured) has become huge and is growing exponentially day by day, and there seems to be no end to the growth of its volume, velocity and variety. This poses a challenge to engineers: to analyse and process big data at the same velocity with which it is generated, while coping with its variety. Data management, in short, means controlling data volume, velocity and variety.
Today big data problems have gripped almost every sector, including retail, airlines, automotive, financial services and energy. According to McKinsey & Company, there is a shortage of 140,000 to 190,000 people with analytical expertise, and of 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.

Apache Hadoop is a technology that provides new ways of storing and processing massive volumes of structured and unstructured data. Hadoop is heading towards becoming the number one enterprise data storage platform in the near future, as it can run queries over huge data sets. The big data ecosystem includes MapReduce, HBase, Pig, Hive, YARN, ZooKeeper, Sqoop, Flume and many more. It is a must for application architects, solution architects and IT architects to delve into big data technology and leverage it to add value to the customer experience.
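
To give a sense of how processing looks in this ecosystem, here is a sketch of the standard word-count job written against the Hadoop MapReduce Java API; input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every word in its input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }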
Apache Spark is a recent addition that leverages HDFS, the distributed file system at the center of the Hadoop distributed computing infrastructure. Spark uses Resilient Distributed Datasets (RDDs) to provide a fast, in-memory distributed computing capability.
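
A small, hypothetical Spark job using the Java API shows what working with RDDs looks like; the class name and the local[*] master setting are assumptions for a standalone test run.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkSquares {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkSquares").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Build an RDD from an in-memory collection; in a real job the data
            // would typically come from HDFS via sc.textFile("hdfs://...").
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // cache() keeps the transformed dataset in memory, so repeated
            // actions on it are not recomputed -- the in-memory capability
            // mentioned above.
            JavaRDD<Integer> squares = numbers.map(x -> x * x).cache();

            int sum = squares.reduce((a, b) -> a + b);
            System.out.println("Sum of squares: " + sum);

            sc.stop();
        }
    }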