Storm Trident示例CombinerAggregator-白红宇

Storm Trident示例CombinerAggregator

阅读量：5160 次

发布时间：2019-06-13

本文共 2674 字，大约阅读时间需要 8 分钟。

CombinerAggregator首先在每个分区上运行partitionAggregate，在每个partition内先聚合，然后运行全局重新分区(global)操作以合并同一批次的所有分区到一个单独的分区，即把前面每个partition聚合的结果，再放到一个单独的partition进行聚合。这里的网络传输与其他两个聚合器相比较少。因此，CombinerAggregator的总体性能比Aggregator和ReduceAggregator好。

省略部分代码，省略部分可参考：https://blog.csdn.net/nickta/article/details/79666918

FixedBatchSpout spout = new FixedBatchSpout(new Fields("user", "score"), 3,                    new Values("nickt1", 4),                   new Values("nickt2", 7),                    new Values("nickt3", 8),                   new Values("nickt4", 9),                    new Values("nickt5", 7),                   new Values("nickt6", 11),                   new Values("nickt7", 5)                   );           spout.setCycle(false);           TridentTopology topology = new TridentTopology();           topology.newStream("spout1", spout)                   .shuffle()                   .each(new Fields("user", "score"),new Debug("shuffle print:"))                  .parallelismHint(5)                  .aggregate(new Fields("score"), new CombinerAggregator
     
      () {                        //partition当中的每个tuple调用 1次                      public Integer init(TridentTuple tuple) {                          return tuple.getInteger(0);                      }                        //聚合结果                      //第1次调用时，val1值为zero返回的值,之后的调用为上次调用 combine的返回值                      //val2为每次init返回的值                      public Integer combine(Integer val1, Integer val2) {                          return val1+val2;                      }                        //如果partition如此没有tuple，也会调用                       public Integer zero() {                          return 0;                      }                                        }, new Fields("sum"))                  .each(new Fields("sum"),new Debug("sum print:"))                  .parallelismHint(5);

输出：

[partition0-Thread-58-b-0-executor[33 33]]> DEBUG(shuffle print:): [nickt3, 8]

[partition1-Thread-126-b-0-executor[34 34]]> DEBUG(shuffle print:): [nickt2, 7]

[partition2-Thread-60-b-0-executor[35 35]]> DEBUG(shuffle print:): [nickt1, 4]

[partition1-Thread-70-b-1-executor[39 39]]> DEBUG(sum print:): [19]

[partition4-Thread-146-b-0-executor[37 37]]> DEBUG(shuffle print:): [nickt4, 9]

[partition4-Thread-146-b-0-executor[37 37]]> DEBUG(shuffle print:): [nickt6, 11]

[partition0-Thread-58-b-0-executor[33 33]]> DEBUG(shuffle print:): [nickt5, 7]

[partition2-Thread-62-b-1-executor[40 40]]> DEBUG(sum print:): [27]

[partition0-Thread-58-b-0-executor[33 33]]> DEBUG(shuffle print:): [nickt7, 5]

[partition3-Thread-39-b-1-executor[41 41]]> DEBUG(sum print:): [5]

转载于:https://www.cnblogs.com/nickt/p/8641513.html

你可能感兴趣的文章