Travel of Software Developer: 12.2013

24.12.13

http://java.dzone.com/articles/infrastructure-scale-apache

http://java.dzone.com/articles/handling-big-data-hbase-part-5

23.12.13

21.12.13

http://ac31004.blogspot.com/2013/10/installing-hadoop-2-on-mac_29.html

http://apmblog.compuware.com/2013/02/19/speeding-up-a-pighbase-mapreduce-job-by-a-factor-of-15/

http://software.intel.com/en-us/articles/hadoop-and-hbase-optimization-for-read-intensive-search-applications

https://labs.ericsson.com/blog/hbase-performance-tuners

20.12.13

http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html
http://ronxin999.blog.163.com/blog/static/422179202013328105833745/

17.12.13

Install HBase/Cloudera CDH4

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_4_4.html

yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
yum install zookeeper

===============================================================================
 Package        Arch     Version                          Repository      Size
===============================================================================
Installing:
 zookeeper      noarch   3.4.5+24-1.cdh4.5.0.p0.23.el6    cloudera-cdh4   3.7 M
Installing for dependencies:
 bigtop-utils   noarch   0.6.0+186-1.cdh4.5.0.p0.23.el6   cloudera-cdh4   8.2 k

yum install zookeeper-server

Dependencies Resolved

====================================================================================
 Package                 Arch    Version                        Repository     Size
====================================================================================
Installing:
 zookeeper-server        noarch  3.4.5+24-1.cdh4.5.0.p0.23.el6  cloudera-cdh4  4.9 k
Installing for dependencies:
 foomatic                x86_64  4.0.4-1.el6_1.1                rhel-cd        251 k
 foomatic-db             noarch  4.0-7.20091126.el6             rhel-cd        980 k
 foomatic-db-filesystem  noarch  4.0-7.20091126.el6             rhel-cd        4.3 k
 foomatic-db-ppds        noarch  4.0-7.20091126.el6             rhel-cd        19 M
 pax                     x86_64  3.4-10.1.el6                   rhel-cd        69 k
 perl-CGI                x86_64  3.51-127.el6                   rhel-cd        207 k
 perl-Test-Simple        x86_64  0.92-127.el6                   rhel-cd        110 k
 redhat-lsb              x86_64  4.0-3.el6                      rhel-cd        24 k
 redhat-lsb-graphics     x86_64  4.0-3.el6                      rhel-cd        12 k
 redhat-lsb-printing     x86_64  4.0-3.el6                      rhel-cd        11 k

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_21_3.html

service zookeeper-server init

No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone

service zookeeper-server start

JMX enabled by default

Using config: /etc/zookeeper/conf/zoo.cfg

Starting zookeeper ... STARTED
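
To double-check that the server started above is actually answering, a small check like the following could be used. It is only a sketch: it sends ZooKeeper's `ruok` four-letter command and expects `imok` back. The helper name `zk_ruok`, the host `localhost` and the client port 2181 are assumptions (2181 is the usual default, but the post does not show the contents of zoo.cfg), and `nc` must be available on the machine.

```shell
# Hypothetical helper: returns 0 if a ZooKeeper server answers "imok" to "ruok".
# Host and port are assumptions; adjust them to the clientPort in zoo.cfg.
zk_ruok() {
  host=${1:-localhost}
  port=${2:-2181}
  # -w 2: give up after 2 seconds so the check cannot hang.
  reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}

if zk_ruok localhost 2181; then
  echo "ZooKeeper is answering"
else
  echo "ZooKeeper is not answering on localhost:2181"
fi
```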

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_3.html

yum install hadoop-conf-pseudo

==============================================================================================
 Package                         Arch    Version                          Repository     Size
==============================================================================================
Installing:
 hadoop-conf-pseudo              x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  8.0 k
Installing for dependencies:
 bigtop-jsvc                     x86_64  1.0.10-1.cdh4.5.0.p0.23.el6      cloudera-cdh4  27 k
 hadoop                          x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  17 M
 hadoop-hdfs                     x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  12 M
 hadoop-hdfs-datanode            x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.8 k
 hadoop-hdfs-namenode            x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.9 k
 hadoop-hdfs-secondarynamenode   x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.9 k
 hadoop-mapreduce                x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  9.9 M
 hadoop-mapreduce-historyserver  x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.9 k
 hadoop-yarn                     x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  8.5 M
 hadoop-yarn-nodemanager         x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.8 k
 hadoop-yarn-resourcemanager     x86_64  2.0.0+1518-1.cdh4.5.0.p0.24.el6  cloudera-cdh4  4.8 k
 nc                              x86_64  1.84-22.el6                      rhel-cd        57 k
 parquet                         noarch  1.2.5-1.cdh4.5.0.p0.17.el6       cloudera-cdh4  13 M
 parquet-format                  noarch  1.0.0-1.cdh4.5.0.p0.20.el6       cloudera-cdh4  489 k

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_20_2.html

yum install hbase

Starting HBase in Standalone Mode

yum install hbase-master

Need to shut down the ZK server temporarily.

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_20_5.html

Skipped the Thrift server.
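
The steps from this entry can be collected into one script. This is only a sketch under assumptions: the names `run` and `cdh4_hbase_steps` and the `APPLY` variable are made up for illustration, and the last two commands (`service zookeeper-server stop` and `service hbase-master start`) are my reading of the note about the ZK server, not commands recorded in this post. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): prints the command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

cdh4_hbase_steps() {
  # Commands recorded in the post, in the same order.
  run yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
  run yum install zookeeper
  run yum install zookeeper-server
  run service zookeeper-server init
  run service zookeeper-server start
  run yum install hadoop-conf-pseudo
  run yum install hbase
  run yum install hbase-master
  # Assumption: stop the separate ZooKeeper server before starting
  # the HBase master in standalone mode.
  run service zookeeper-server stop
  run service hbase-master start
}

cdh4_hbase_steps
```

Running it with `APPLY=1` as root would execute the commands instead of printing them.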

15.12.13

http://jinnianshilongnian.iteye.com/blog/1989330

12.12.13

HBase

http://research.google.com/archive/bigtable.html
http://blog.cloudera.com/blog/2012/06/hbase-write-path/
http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/

http://java.dzone.com/articles/how-google-does-code-review

11.12.13

http://www.programmer.com.cn/10320/

http://blog.linezing.com/2012/03/hbase-performance-optimization

http://blog.ifeng.com/article/24685221.html

http://blog.itpub.net/26686207/viewspace-746977

http://opentsdb.net/

http://blog.sina.cn/dpool/blog/s/blog_ae33b83901016azb.html

http://www.infoq.com/cn/news/2011/07/taobao-linhao-hbase

http://www.cnblogs.com/panfeng412/archive/2011/11/19/hbase-application-in-data-statistics.html

10.12.13

9.12.13

http://www.iteye.com/news/28540-5-linux-shell-commandline-website

4.12.13

What is a good software product?

We all know that we want to build good software products. But what is a good software product? Traditionally, people believe a good software product is one that matches the customer's requirements. So we spend a lot of time collecting requirements from the customer and writing contracts based on them.

But is that really so? How many customers are unhappy with a software product that meets all the requirements on paper? When that happens, some would say we didn't understand the customer's requirements well enough.

Please think again. Do customers really know what they want beforehand? They don't. That is why XP advocates developing software together with a representative from the customer side, so that the customer can give feedback to the development team and help the team build the software step by step, in the way they really want. That's a good thing, because the customer will gradually realize what they want as the product grows.

However, there are two issues with this kind of development model:

It is not the customers' natural duty to help the development team, even though they know they are going to use the system after it is finished and handed over. It is still not their job by nature.
The customer could steer the development in their own direction, making the system difficult to build, while missing many good features that could be built easily.

With those concerns, we have Scrum and a PO (product owner) who works with the development team. This solves the first issue because the PO is responsible for building the system, but it doesn't necessarily solve the second, because the PO would still drive the team in the direction they want, not the direction the development team is good at.

You may say that we surely want to build the system the PO or the customers want, not the system engineers like to build. Really? Have you ever heard that a good engineer is 1000 times more productive than a bad one? If we drag the team in a direction the engineers are not comfortable with, we put the team's productivity at risk. Another issue, and the more real one, is that some things are very difficult to build while others are very easy, and only the people who know the technical details can tell which is which. Having the PO drive the whole development could neglect those differences.

In my mind, software development is very detail-oriented. For example, choosing JMS or Kafka could make a huge difference to the system, in both architecture and user experience; a system built on HDFS and MapReduce could also differ hugely from one built on Vertica. That knowledge is far beyond what the customer or the PO could understand, even if we explain a little. So top-down business requirements could be very dangerous to a development team.

You may ask how we could then derive the real requirements from the market. Then I would ask: what are the real requirements from the market? Before Big Data solutions emerged, did we have those data mining and data analytics requirements? Yes, we did, but only after Big Data arrived did those requirements become overwhelmingly important. Why? Only a requirement that can be fulfilled is a real requirement. Otherwise, we are just chasing millions of things in the world. For example, would it be a good idea to search the Internet for pictures containing exactly the same person as in a given picture? Yes, I am sure millions of users are eager for this feature. But it is not a real requirement for a search engine, because it is not practical for a search engine.

Then, back to the topic. What is a good software product? I would say a good software product is one that the customer is willing to pay for. It may or may not meet the requirements the customer asked for, but it is definitely useful to the customer and better than the products the customer could get from other vendors, so that the customer wants to pay for yours.

Then how could we build such a good product? There are two parts in this definition:

How could we know the product is useful to the customer?
How could we have a solution that is (generally) better than those from others?

For the first question, unfortunately, we can have some hints but cannot know exactly. Those hints come from the customers or from the POs. But we don't know exactly which features would be most useful to the customers, because they don't know either. And as I said above, not everything the customers want can be implemented, while there could be something easy to build and useful to the customers that they don't realize they want until they see it.

The best way to find out what is useful to the customer is to have them try something that is possibly useful. If they like it and pay you the money, you've got it. If they don't like it, you change it. It looks simple, but it is actually not easy to make it work.

To be continued....

2.12.13

http://java.dzone.com/articles/scaling-redis-and-rabbitmq

Travel of Software Developer

24.12.13

23.12.13

HBase Source Analysis

21.12.13

20.12.13

17.12.13

Install HBase/Cloudera CDH4

15.12.13

12.12.13

HBase

11.12.13

10.12.13

Original news [Enterprise Open Source Series] Twitter: Behind Sending and Receiving a Tweet

9.12.13

4.12.13

What is a good software product?

2.12.13
