In [102]: rdd.map(lambda x: list(range(1, x))).collect()
Out[102]: [[1], [1, 2], [1, 2, 3]]
In [103]: rdd.flatMap(lambda x: range(1, x)).collect()
Out[103]: [1, 1, 2, 1, 2, 3]
In [104]: x = sc.parallelize([("a", ["x", "y", "z"]), ("b", ["p", "r"])])
In [106]: x.flatMapValues(lambda value: value).collect()
Out[106]: [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
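The three calls above can also be modeled on plain Python lists, which may help in seeing what each one does without a cluster. This is only a sketch of the semantics, not how Spark implements them. The helper names `local_map`, `local_flat_map`, and `local_flat_map_values` are made up for illustration, and the input `[2, 3, 4]` is an assumption chosen because it reproduces the outputs shown.

```python
from itertools import chain


def local_map(f, items):
    """One output element per input element (like rdd.map)."""
    return [f(x) for x in items]


def local_flat_map(f, items):
    """Apply f, then concatenate the resulting iterables (like rdd.flatMap)."""
    return list(chain.from_iterable(f(x) for x in items))


def local_flat_map_values(f, pairs):
    """Flatten only the value of each (key, value) pair, repeating the key."""
    return [(k, v) for k, vs in pairs for v in f(vs)]


nums = [2, 3, 4]  # assumed contents of `rdd`; not stated in the session

nested = local_map(lambda x: list(range(1, x)), nums)
flat = local_flat_map(lambda x: range(1, x), nums)
pairs = local_flat_map_values(
    lambda value: value,
    [("a", ["x", "y", "z"]), ("b", ["p", "r"])],
)

print(nested)  # [[1], [1, 2], [1, 2, 3]]
print(flat)    # [1, 1, 2, 1, 2, 3]
print(pairs)   # [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
```

The only difference between the first two helpers is the concatenation step, which is why `map` keeps one nested list per input while `flatMap` returns a single flat list.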
How can I use flatMapValues on a dataset like [("a", ["x", "y", "z"], 10), ("b", ["p", "r"], 22)] and get the result [('a', 'x', 10), ('a', 'y', 10), ('a', 'z', 10), ('b', 'p', 22), ('b', 'r', 22)]?
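One possible answer, offered as a sketch rather than the only way: flatMapValues expects (key, value) pairs, so it cannot be applied directly to three-element tuples. A plain flatMap that unpacks each record should give the requested result. The names `explode`, `run_on_spark`, and `run_with_flat_map_values` are hypothetical, and `sc` is assumed to be an active SparkContext as in the session above.

```python
def explode(record):
    """Turn (key, values, extra) into one (key, value, extra) tuple per value."""
    key, values, extra = record
    return [(key, v, extra) for v in values]


data = [("a", ["x", "y", "z"], 10), ("b", ["p", "r"], 22)]


def run_on_spark(sc):
    """flatMap version. Assumes `sc` is an active SparkContext."""
    return sc.parallelize(data).flatMap(explode).collect()


def run_with_flat_map_values(sc):
    """Alternative: re-key as ((key, extra), values) so flatMapValues applies,
    then reorder each result back into (key, value, extra)."""
    return (
        sc.parallelize(data)
        .map(lambda r: ((r[0], r[2]), r[1]))
        .flatMapValues(lambda values: values)
        .map(lambda kv: (kv[0][0], kv[1], kv[0][1]))
        .collect()
    )


# Local check without Spark: apply explode to each record and concatenate.
local_result = [row for record in data for row in explode(record)]
print(local_result)
# [('a', 'x', 10), ('a', 'y', 10), ('a', 'z', 10), ('b', 'p', 22), ('b', 'r', 22)]
```

In a session like the one above, `run_on_spark(sc)` would be the shorter route. The flatMapValues variant is mainly useful if the data is already keyed or a later step needs it keyed by (key, extra).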