24.12.13
23.12.13
HBase Source Analysis
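- Analysis of the HRegionServer startup and shutdown process - dangyifei - Blog Channel - CSDN.NET
- HRegionServer startup analysis - pwlazy's column - Blog Channel - CSDN.NET
- HBase coprocessor source code analysis - zhaokunwu's column - Blog Channel - CSDN.NET
- HBase source code analysis 4 - Get flow and RPC principles - liuxiaochen123's column - Blog Channel - CSDN.NET
- Two important habits of mine for studying open-source project source code -- Source Code Analysis -- IT Tech Blog Learning -- Learn together, progress together!
- HBase scan source code study - 愤怒的波纹's personal space - OSChina community
- Browsing the HBase source code, part 4: RPC between HMaster and HRegionServer - Internet
- cat4paw's blog » HBase source code analysis: the RPC mechanism
- HBase source code analysis (1): client-side data ingestion - hf200012 - ITeye tech site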
21.12.13
20.12.13
17.12.13
Install HBase/Cloudera CDH4
- yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
- yum install zookeeper
===============================================================================================================================================
Package Arch Version Repository Size
===============================================================================================================================================
Installing:
zookeeper noarch 3.4.5+24-1.cdh4.5.0.p0.23.el6 cloudera-cdh4 3.7 M
Installing for dependencies:
bigtop-utils noarch 0.6.0+186-1.cdh4.5.0.p0.23.el6 cloudera-cdh4 8.2 k
- yum install zookeeper-server
Dependencies Resolved
===============================================================================================================================================
Package Arch Version Repository Size
===============================================================================================================================================
Installing:
zookeeper-server noarch 3.4.5+24-1.cdh4.5.0.p0.23.el6 cloudera-cdh4 4.9 k
Installing for dependencies:
foomatic x86_64 4.0.4-1.el6_1.1 rhel-cd 251 k
foomatic-db noarch 4.0-7.20091126.el6 rhel-cd 980 k
foomatic-db-filesystem noarch 4.0-7.20091126.el6 rhel-cd 4.3 k
foomatic-db-ppds noarch 4.0-7.20091126.el6 rhel-cd 19 M
pax x86_64 3.4-10.1.el6 rhel-cd 69 k
perl-CGI x86_64 3.51-127.el6 rhel-cd 207 k
perl-Test-Simple x86_64 0.92-127.el6 rhel-cd 110 k
redhat-lsb x86_64 4.0-3.el6 rhel-cd 24 k
redhat-lsb-graphics x86_64 4.0-3.el6 rhel-cd 12 k
redhat-lsb-printing x86_64 4.0-3.el6 rhel-cd 11 k
- service zookeeper-server init
No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone
- service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED
- yum install hadoop-conf-pseudo
===============================================================================================================================================
Package Arch Version Repository Size
===============================================================================================================================================
Installing:
hadoop-conf-pseudo x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 8.0 k
Installing for dependencies:
bigtop-jsvc x86_64 1.0.10-1.cdh4.5.0.p0.23.el6 cloudera-cdh4 27 k
hadoop x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 17 M
hadoop-hdfs x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 12 M
hadoop-hdfs-datanode x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.8 k
hadoop-hdfs-namenode x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.9 k
hadoop-hdfs-secondarynamenode x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.9 k
hadoop-mapreduce x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 9.9 M
hadoop-mapreduce-historyserver x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.9 k
hadoop-yarn x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 8.5 M
hadoop-yarn-nodemanager x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.8 k
hadoop-yarn-resourcemanager x86_64 2.0.0+1518-1.cdh4.5.0.p0.24.el6 cloudera-cdh4 4.8 k
nc x86_64 1.84-22.el6 rhel-cd 57 k
parquet noarch 1.2.5-1.cdh4.5.0.p0.17.el6 cloudera-cdh4 13 M
parquet-format noarch 1.0.0-1.cdh4.5.0.p0.20.el6 cloudera-cdh4 489 k
- yum install hbase
- yum install hbase-master
- The ZooKeeper (ZK) server needs to be shut down temporarily.
- Skip the Thrift server.
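The commands in this list can be collected into one small script. This is only a sketch: the `run` helper, the `DRY_RUN` variable and the `install_cdh4_hbase` function name are my own additions, not part of CDH. It assumes the `cloudera-cdh-4-0.x86_64.rpm` file is in the current directory and, when actually executing, that it runs as root. By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run the steps (needs root; yum will still prompt).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The steps from the notes, in the same order; stops at the first failure.
install_cdh4_hbase() {
    run yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm &&
    run yum install zookeeper &&
    run yum install zookeeper-server &&
    run service zookeeper-server init &&
    run service zookeeper-server start &&
    run yum install hadoop-conf-pseudo &&
    run yum install hbase &&
    run yum install hbase-master
    # The notes say ZooKeeper has to be shut down temporarily but not at which
    # step, so "service zookeeper-server stop" is left out here on purpose.
}

install_cdh4_hbase
```

Running it as is prints the eight commands; `DRY_RUN=0 sh script.sh` (the file name is arbitrary) executes them one after another.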
12.12.13
HBase
- http://research.google.com/archive/bigtable.html
- http://blog.cloudera.com/blog/2012/06/hbase-write-path/
- http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
11.12.13
10.12.13
4.12.13
What is a good software product?
We all know that we want to build good software products. But what is a good software product? Traditionally, people believe a good software product is one that matches the customer's requirements. So we spend a lot of time collecting requirements from the customer and writing contracts based on them.
But really? How many customers are unhappy with a software product that meets all the requirements on paper? When that happens, some would say we didn't understand the customer's requirements well enough.
Please think again. Do customers really know what they want beforehand? They don't. So in XP, people advocate developing software together with a representative from the customer side, so that the customer can give feedback to the development team and help the team eventually build the software the way they really want. That's a good thing, because the customer will gradually realize what they want as the product grows.
However, there are two issues with this kind of development model:
- It is not the customers' natural duty to help the development team, even though they know they will be using the system once it is finished and handed over. It is simply not their job.
- The customer could steer the development in their own direction, making the system difficult to build, while missing many good features that could be built easily.
With those concerns in mind, we have Scrum, with a PO (product owner) who works with the development team. That solves the first issue, because the PO is responsible for building the system, but it doesn't necessarily solve the second one, because the PO will still drive the team in the direction they want, not in the direction the development team is good at.
You may say that we surely want to build the system the PO or the customers want, not the system the engineers would like to build. Really? Have you ever heard that a good engineer is 1000 times more productive than a bad one? If we drag the team in a direction the engineers are not comfortable with, we risk the productivity of the development team. Another issue, and the more real one, is that some things are very difficult to build while others are very easy, and only the people who know the technical details can tell which is which. Having the PO drive the whole development could neglect those differences.
In my mind, software development is very detail-oriented. For example, choosing JMS or Kafka could make a huge difference to the system, in both the architecture and the user experience; a system built on HDFS and MapReduce could also be hugely different from one built on Vertica. That knowledge is far beyond what the customer or the PO can understand, even though we can explain a little. So top-down business requirements can be very dangerous to a development team.
You may ask how we can then derive the real requirements from the market. Then I would ask: what are the real requirements from the market? Before Big Data solutions emerged, did we have those data mining and data analytics requirements? Yes, we did, but only after Big Data arrived did those requirements become overwhelmingly important. Why? Only a requirement that can be fulfilled is a real requirement. Otherwise, we are just chasing millions of things in the world. For example, would it be a good idea to search the Internet for pictures containing exactly the same person as in another picture? Yes, I am sure millions of users are eager for this feature. But it is not a real requirement for a search engine, because it is not practical for a search engine.
Then, back to the topic at hand. What is a good software product? I would say a good software product is one that the customer is willing to pay for. It may or may not meet the requirements the customer asks for, but it is definitely useful to the customer and better than the products the customer could get from other vendors, so that the customer wants to pay for yours.
Then how can we build such a good product? There are two parts to this definition:
- How can we know the product is useful to the customer?
- How can we have a solution that is (generally) better than those from others?
For the first question, unfortunately, we can have some hints but cannot know exactly. Those hints come from the customers or from the POs. But we don't know exactly what the most useful features for the customers would be, because they don't know either. And as I said above, not everything the customers want can be implemented, while there may be something that is easy to build and useful to the customers that they don't realize they want until they see it.
The best way to find out what is useful to the customer is to have them try something that is possibly useful. If they like it and pay you for it, you've got it. If they don't like it, you change it. It looks simple, but it is actually not easy to make work.
To be continued...