Physical partitioning:物理分区。Flink 提供的底层 API ,允许用户定义数据的分区规则;
Task chaining and resource groups:任务链和资源组。允许用户进行任务链和资源组的细粒度的控制。
以下分别对其主要 API 进行介绍:
2. DataStream Transformations
2.1 Map [DataStream → DataStream]
对一个 DataStream 中的每个元素都执行特定的转换操作:
DataStream<Integer>integerDataStream=env.fromElements(1,2,3,4,5);integerDataStream.map((MapFunction<Integer, Object>) value -> value *2).print();// 输出 2,4,6,8,10
String string01 = "one one one two two";
String string02 = "third third third four";
DataStream<String> stringDataStream = env.fromElements(string01, string02);
stringDataStream.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out) throws Exception {
for (String s : value.split(" ")) {
out.collect(s);
}
}
}).print();
// 输出每一个独立的单词,为节省排版,这里去掉换行,后文亦同
one one one two two third third third four
env.fromElements(1, 2, 3, 4, 5).filter(x -> x > 3).print();
DataStream<Tuple2<String, Integer>> tuple2DataStream = env.fromElements(new Tuple2<>("a", 1),
new Tuple2<>("a", 2),
new Tuple2<>("b", 3),
new Tuple2<>("b", 5));
KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = tuple2DataStream.keyBy(0);
keyedStream.reduce((ReduceFunction<Tuple2<String, Integer>>) (value1, value2) ->
new Tuple2<>(value1.f0, value1.f1 + value2.f1)).print();
// 持续进行求和计算,输出:
(a,1)
(a,3)
(b,3)
(b,8)