public IList<String> getDistinctTags(IEnumerable<Article> articles)
{
    return articles.SelectMany(a => a.Tags).Distinct().ToList();
}
The entire LINQ "empire" (.NET 3.5) is built on top of IEnumerable<T>, which has been around since .NET 2.0. Streams seem very artificial; why not rely on Iterable<>?
"Significantly" is a bit dramatic when we are talking about the difference between a single extra method call.
One of the benefits of the Java version is it is easier to understand if you don't have a Java background but do have an FP background. With your C# example you'd need to find the documentation to find out what SelectMany does (which is probably just a helper method that abstracts a map and flatMap call)
> "Significantly" is a bit dramatic when we are talking about the difference between a single extra method call.
It's not just a single method call; it's the entire approach. This entire method chain is something that cannot be done in Java as cleanly as it is in C#. Grouping is a prime example. Compare
If you're writing a few hundred queries in a large application, then that single extra method call will start to get awfully tedious to write over and over and over again. And talk about hurting readability, too... stream()! stream() everywhere! I'd say it's pretty significant.
The main reason they build these on Stream rather than Iterable is b/c they wanted to include the `parallel()` method, which works via "spliterators" rather than plain old iterators.
In other words, in order to support a gimmick you can actually use in production in maybe a handful of use cases, they complicated the API for the use cases you hit 99% of the time. Awesome.
Actually, this is not true. What you're not noticing is that parallel() is akin to Spark: these streams are basically just map functions, and if you can distribute the closure across multiple cores/machines you get much better performance without any additional programmer intelligence.
If you think that API is complicated then I don't think programming is for you; this is a very ordinary and usual construct in programming.
In the cases where your application can benefit from parallelizing simple operations over a large data set stored in a collection, `parallel()` is fine.
It's even fine in the case where you're pulling data from a file or other low-latency sequential data source, assuming that the cost of filling a spliterator buffer is less than your cost of processing.
But there's a list of gotchas, all more dangerous than the "magic make-it-faster" button that .parallel() implies:
- For the sequential data source case, if the cost of filling the spliterator buffers is higher than the cost of processing, you're just wasting a ton of overhead trying to use parallel.
- You have to be aware that by default all uses of parallel() run on the same threadpool, which makes it a potential timebomb if someone uses it in the context of, say, a webserver where multiple requests might all individually process streams. This also means blocking operations during stream processing are very dangerous.
- Mutating an external variable goes from being fine for a sequential stream to a race condition for a parallel one.
- You can't hand out Streams that you intend to be executed sequentially, b/c your callers can just call parallel() whenever they want.
And, yes, all of these considerations make the api more complicated than one operating over plain old iterators.
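To make the mutation gotcha concrete, a small sketch (illustrative names; the unsafe line is left commented out on purpose):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelGotcha {
    // Safe: let the stream build the result; collect() handles the
    // per-thread accumulation and merge, and preserves encounter order.
    static List<Integer> squaresInParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // DON'T: .parallel().forEach(list::add) on a plain ArrayList is a
        // data race (lost updates or ArrayIndexOutOfBoundsException),
        // even though the same code is fine on a sequential stream.
        System.out.println(squaresInParallel(100_000).size()); // 100000
    }
}
```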
Your example code is not valid Java syntax so you haven't been able to do it in Java for almost 10 years. Your example is written in Groovy, one of many alternative languages for the JVM, along with Scala, JRuby, Clojure, Beanshell (which Groovy was "inspired" by), Jython (which has been around for 18 years now, and originally called "JPython"), Gosu, and more recently Kotlin and Ceylon. Java didn't have such anonymous functions (called "lambdas" in Java, and "closures" in Groovy) until Java 8.
The extra verbosity comes from the way Java 8 is trying to make functional programming more comfortable while changing the language as little as possible. Once you add all the sugar to make lines more expressive, you'd hear complaints like the ones Scala gets about having too many ways of doing the same thing.
And yes, that's the entire method definition, signature included. We could have added the return types for documentation if we felt like it.
You might still not like the style, but you can't say it's more verbose than the for loop.
Backwards compatibility and design philosophy makes sure that Java 8 doesn't go hard enough on the sugar. It's why I am not optimistic of Java's future. There's awesome features out there in newer languages, like proper pattern matching, that Java just won't be able to borrow from functional languages.
This was exactly what I was thinking when I read the code. The goal should not be to remove loops. As a language construct there is nothing wrong with loops itself. The goal should be to make more readable and maintainable code, preferably without increasing verbosity. The functional alternatives posted here are not actually code quality improvements and should not be presented as such.
Well, it depends. I think the primary improvement is that the loop is no longer a language construct; it is just another function. As a prime example I would take Clojure.
On first look: "No loops?" "These reduce functions everywhere look ugly."
On second look: "Oh, I can import a parallel reduce instead of the single-threaded one?" [1] "Somebody created a library to transparently switch between local reduce and one using Hadoop?" [2]
I understand what you're saying and you make good points but I still do not think removing a loop as a language feature is necessarily an interesting goal. Generally speaking this discussion tends to revolve around reducing verbosity or doing more with less (so, generalizing multiple language features into a single one). I'm just not sure if that's the most interesting discussion to have. The goal and primary focus should be clean, readable, maintainable code. If I buy a more maintainable and more easy to share codebase with a few extra lines of code or a more verbose expression of a loop that's a trade I'd always make (note; I'm not claiming that's necessarily a trade you need to make for every case in every language).
I guess this is why Microsoft diverged from the "standard" names like 'map', 'collect' and 'filter' in favour of more human-oriented SQL-like LINQ: 'select', 'where', 'toList'.
I have to read the underscore.js documentation every time I am using it :(.
MS did the right thing. A huge barrier to FP adoption is the unnecessarily obtuse terminology. In playing around with FP languages I often find myself having to re-look-up basic syntax because the terms are simply so obtuse that they won't "stick" in my head.
I think it comes from FP's rooting in mathematics, and math is itself unnecessarily arcane.
I disagree. The benefit of having the standard terms is they are the same in every FP library you use. If you spend the time to learn and internalize them within one, it'll be the same in every language or library that implements them. I don't understand how map, filter, collect, reduce are obtuse or arcane.
Those aren't the worst offenders. I'm referring to curry, cadr, lambda, monad, etc.
But map is confusing: it's also a data structure. "Wait, does map() create a new key-value dictionary?" Filter and collect are okay. Reduce is pretty arcane but tolerable, but the actual operation feels intuitively closer to "categorize" or "group."
My point about math was cultural. Math seems to revel in having its own peculiar and often even domain specific (even within mathematics!) terms and symbols for things.
And math has locally defined terms because one cannot give every concept a unique short meaningful memorable name. That is no different in computer science, where 'integer' can include negative numbers or not, may or may not wrap around, can be any number of bits or potentially unlimited, may include minus zero, etc.
Interesting enough, my academic background was mostly FP based, but my first heavy industry use of the terms was with the LINQ libraries. I have since had lots of experience with the more academic terms and can pretty easily translate back/forth with no problem. Except for filter. I routinely forget if it is "filter in" vs "filter out". It is easy enough to look up, but I never had that problem with LINQ's "where".
I'm not sure how those are supposed to be more human-oriented? I'll grant you "where" might be a slight improvement over "filter", but calling "map" "select"? I can't make sense of that even knowing what it's supposed to mean. If anything I'd assume it to be yet another synonym for filter.
I disagree. Perhaps I'm simply more used to seeing this style, but I feel it's more concise, more maintainable, and more easily built on than the loop-based version. Additionally, I find it to be more semantically clear as to what the expected behavior is, as opposed to reasoning about the loop, especially when filters become complex.
I didn't fully understand this concept until I read your post and found myself disagreeing with you, thinking "what is obstinate talking about? OBVIOUSLY the streams way is way more readable and far less verbose".
The thing is, I can't believe I'm thinking that because I was exactly in your place a few months ago. Since that time, however, I've become very familiar with functional styles and now the Java 8 streams way seems "almost, but not quite right" and the imperative iteration style seems "gratuitously complicated and philosophically wrong... I mean... look at all that special syntax! That mutation! The horror!"
All of this is to say that I'm not quite sure either of us is more correct than the other, but familiarity does seem to cause a profound mental shift.
It might be worth mentioning that I already am quite familiar with functional concepts and occasionally use them in my code. I find them more verbose in C++ and Java despite my familiarity with them.
I think the examples shown here don't really show how it can help. The real win for me is that if you already have a function to filter over (like "article.isJava()" or something) it becomes much clearer. It is also eye-opening to see other higher-order functions like pluck or flatMap that build off these blocks.
Edit: there is a good flatMap example there that I glossed over when first reading. It looks like Java even has serviceable syntax for passing around lambdas like that now too, which is cool.
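A sketch of that style, with a hypothetical Article.isJava() predicate like the one mentioned above (class and data invented for the example):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Article {
    private final String title;
    private final List<String> tags;
    Article(String title, List<String> tags) { this.title = title; this.tags = tags; }
    String getTitle() { return title; }
    boolean isJava() { return tags.contains("java"); } // hypothetical predicate
}

class FilterExample {
    // with named predicates, the method references read almost like prose
    static List<String> javaTitles(List<Article> articles) {
        return articles.stream()
                .filter(Article::isJava)
                .map(Article::getTitle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(javaTitles(Arrays.asList(
                new Article("Streams", Arrays.asList("java", "fp")),
                new Article("LINQ", Arrays.asList("csharp"))))); // [Streams]
    }
}
```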
I don't think the article should have framed it as "let's replace traditional looping constructs," but "let's apply a filter or query to a data set." That's how I've typically seen .NET LINQ written up, and it makes more sense to me.
I had the opposite reaction after having to work with a number of complicated reductions; in a lot of cases, a short for-loop and accumulator variable would have made the code much easier to read...
Java's one of the worst language examples of using FP collections I've seen. Even with hindsight I still find this to be uglier and unnecessarily more verbose than it needs to be.
It can even be shorter without the Optional typing, but it's more readable to have it explicit. Dart benefits from having Collection and Stream mixins, so you always get a rich API on Dart's collections.
If anyone's interested to comparing FP collections in different languages, I've ported C# 101 LINQ examples in:
- Swift https://github.com/mythz/swift-linq-examples
- Clojure https://github.com/mythz/clojure-linq-examples
- Dart https://github.com/dartist/101LinqSamples
Where makes sense if you think of it from a SQL perspective, which I think has more widespread use than the corresponding FP terms (i.e., lots of developers don't have a CompSci background, but few of them haven't used trivial SQL).
The first example is actually a poor way to do it, IMO. Even if behind the scenes, these are implemented lazily (like C# LINQ), it may not be obvious what's going on to someone else who isn't familiar with the API. I'd opt for the for() loop each time when it's something like halting when you find what you're looking for.
The for loop is actually a poor way to do it, IMO. Even if behind the scenes these are implemented via conditional gotos (like C), it may not be obvious what's going on to someone else who isn't familiar with the syntax. I'd opt for the if-goto loop each time when it's something like halting when you find what you're looking for.
To wit: You can expect others to be familiar with basic features of the language and ecosystem, or prepared to learn them.
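For what it's worth, the short-circuiting search described above stays lazy in Java 8 too; a small sketch with made-up data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindFirstExample {
    // filter() is lazy and findFirst() short-circuits: elements after
    // the first match are never examined
    static Optional<String> firstStartingWith(List<String> items, String prefix) {
        return items.stream()
                .filter(s -> s.startsWith(prefix))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("java", "fp", "linq", "streams");
        System.out.println(firstStartingWith(tags, "l").orElse("none")); // linq
    }
}
```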
In the last example's "for" loop, a Set would probably be clearer and more efficient than a List for gathering distinct elements, at least for large data sets. I haven't tried the functional Java yet, but I wonder if using Collectors.toSet() and skipping the distinct() stage would be better?
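A sketch of that idea, with the article/tag shape reduced to plain lists of tag lists for brevity (names invented for the example):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctTags {
    // Collectors.toSet() deduplicates on its own, so the distinct()
    // stage can be dropped entirely
    static Set<String> distinctTags(List<List<String>> tagLists) {
        return tagLists.stream()
                .flatMap(List::stream)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(distinctTags(Arrays.asList(
                Arrays.asList("java", "fp"),
                Arrays.asList("fp", "linq"))));
    }
}
```

The trade-off is that a Set loses encounter order, which is usually fine for "distinct elements" but worth noting if the caller expects a stable ordering.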
Any thoughts on performance differences between loops and streams?
My gut says that loops, being a more primitive concept, are likely to perform better in most situations. In addition, I just find loops easier to reason about, but that is probably purely personal.
I have not looked at the java8 constructs surrounding this.
This is largely implementation specific. For instance, the .net LINQ to object implementations are largely syntactic sugar around loops (that is they compile to the same thing). Similarly, for loops are frequently just syntactic sugar around while loops.
Have they improved the implementation recently? I have not benchmarked myself, but according to [1], they are slower than the equivalent loop-based code.
This paper rightly points out that there will be overhead of object creation and potentially virtual calls (though that will be very JIT dependent). That said, my cursory examination (and it was very cursory) of their benchmark for C# is that they are really benchmarking the time difference between the iterator of an IEnumerable (or maybe IList, don't have an ability to decompile it right now) and the highly optimized case of looping through an array (which is literally one of the most highly optimized tasks on modern commodity hardware).
On one hand, it does prove their point that in certain very specialized cases (looping through an array with no abstraction atop it), you will have significant performance penalties in the generic iterator case.
On the other, I'm not sure I would attribute this to LINQ. I'm reasonably certain (and in these cases the space "reasonably" represents could have a truck driven through it) if you were to write the same code as a foreach loop using the same iterator and generic collections you wouldn't see significant performance differences. I'm definitely confident in most "real world" uses, where you are already using generic collections and iterators, you should bias towards using the LINQ implementation (assuming you believe it is better code) until definitive performance testing proves otherwise. For instance, in the case of the sum of squares, that looks like classic loop unrolling optimizations not being applied which any indirection in the looping code can prevent.
Further, they show that there already exist optimization libraries that can eliminate much of the overhead.
I will say, I'm quite impressed by the java results on this benchmark.
Also, I didn't write any tests to prove any of this, so I could be wildly off the mark. Further, I've spent more time than I ever wanted either hand-translating, or writing macros to translate, high-level collections code into while loops. But that was in an extremely performance-sensitive environment.
It should be noted that one must assume a performance penalty when calling stream() on a Collection, as doing so creates a new Stream object. Within inner loops and for small Collections I recommend avoiding stream() altogether if performance is a concern. Furthermore, while tempting, Stream.parallel() should only be called when it is certain that the additional cost of a ForkJoinPool instance creation can be amortized over the duration of the lambda's runtime. With that said, I welcome Streams and other FP concepts to Java.
My gut feeling is that assuming streams are (in the end, but without looking) based on the loops, and Java JIT has great inlining capacity, there is really no measurable difference.
Again, one actually should measure it to conclude anything.
From my understanding, the big win that is not discussed in the article is parallelism. Stream operations can occur concurrently, potentially improving performance.
EDIT: There is more in-depth discussion of this in this thread. I missed it.
The examples feel very much like Ruby to me. In a good way. This sort of chaining of operations also feels very natural for someone thinking in terms of a chain of Unix commands piped together.
However, having that explicit "stream()" signifier is a very Java-y thing to do and appears to ask the programmer to decide how best to compile the given line. I would expect the compiler should be doing that work for us.
This demonstrates a major problem with development of Java since the Collections work (which was fantastic): the libraries suffer from over-engineering and surface far too much implementation flavor in the API.
Who cares about streams? Who cares about Optional? We just want to filter a list in a clear, terse manner. (Some people do care about streams and Optional, and I wish them well, but that's orthogonal to the question at hand.)
Consider the examples given. Here they are implemented in Gosu:
(I cheated a bit on the last one by just using a Set, but that's more appropriate and communicates the uniqueness of the elements in the collection to the API consumer.)
Beyond the dot-star flatmap operator, there isn't anything very fancy going on: just closures being passed to methods, returning familiar classes that don't require additional transformation to pass on to the rest of the world.
It's too bad, because this is certainly good enough. As Jack Nicholson said: What if this... is as good as it gets?
public final class Article {
    public final String title;
    public final String author;
    public final List<String> tags;

    public Article(String title, String author, List<String> tags) {
        this.title = title;
        this.author = author;
        this.tags = tags;
    }
}
Getters don't seem very useful on an immutable object.
If you ever want to change the implementation of Article, you'd break anyone that was using that part of your API. If you use getters, you can change your implementation without breaking the consumers of your API.
For instance, let's say that you don't want to store the author's name as a string anymore, and want to store a reference to an Author object. If you have a getAuthor() method, you can change it from a simple getter to instead call author.getName(), preserving your public-facing API.
And this is one of the major reasons why Java is regarded as verbose. There's a simple solution (have the compiler transparently rewrite references to x.foo to x.getFoo / x.setFoo and transparently add getFoo/setFoo - the JVM inlines (trivial) getters and setters anyways) and yet Java, in the interests of "transparency", doesn't allow it.
And their justification fails. Sure, currently if you read x.foo you know that no code is being executed - but you never see x.foo because everything has getters and setters. So all it does in practice is make things (even) more verbose.
> If you ever want to change the implementation of Article, you'd break anyone that was using that part of your API. If you use getters, you can change your implementation without breaking the consumers of your API.
Then it wouldn't be immutable; if I can change the implementation, I can also create a mutable version.
Edit: example
public class Article {
    private final String title;
    private final String author;
    private final List<String> tags;

    private Article(String title, String author, List<String> tags) {
        this.title = title;
        this.author = author;
        this.tags = tags;
    }

    public String getTitle() {
        return title;
    }

    public String getAuthor() {
        return author;
    }

    public List<String> getTags() {
        return tags;
    }
}

public class MutableArticle extends Article {
    private String title;
    private String author;
    private List<String> tags;

    public MutableArticle() {
        super(null, null, null);
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public List<String> getTags() {
        return tags;
    }

    public void setTags(List<String> tags) {
        this.tags = tags;
    }
}
Making a class immutable doesn't mean that the implementation is fixed. If you need to change your implementation to store data differently, the consumers of your API shouldn't need to be modified. The following modification to your first class would still be immutable:
public class Article {
    private final Author author;
    ...
    public String getAuthor() {
        return author.getName();
    }
}
Also, your first class isn't fully immutable—getTags should be implemented as follows:
/**
 * @return Unmodifiable list of tags
 */
public List<String> getTags() {
    return Collections.unmodifiableList(tags);
}
> Also, your first class isn't fully immutable—getTags should be implemented as follows:
True. It would be nice if Java had some immutable collection classes that don't expose mutator methods at all. A method that receives a List<String> made with Collections.unmodifiableList(tags) doesn't know that it is actually immutable.
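A small sketch of the point: the wrapper's mutators still exist in the static type and only fail at runtime, and the wrapper is a view rather than a copy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UnmodifiableDemo {
    public static void main(String[] args) {
        List<String> tags = new ArrayList<>(Arrays.asList("java", "fp"));
        List<String> view = Collections.unmodifiableList(tags);
        try {
            view.add("oops"); // compiles fine; the type system doesn't stop it
        } catch (UnsupportedOperationException e) {
            System.out.println("add() threw " + e.getClass().getSimpleName());
        }
        // it's only a view, not a copy: changes to the backing list show through
        tags.add("linq");
        System.out.println(view.size()); // 3
    }
}
```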
For some reason seeing your code really bothers me. Is it normal to ever use private variables in inheritance like that?
Say you had a toString() in the Article class that returned String.format("%s %s", author, title).
Now if you create a MutableArticle, ma.toString() will print nulls unless you override toString on it as well (since Article's variables are private and not inherited).
I don't code much so I don't know is it normal to see code like that?
Also less relevant, but Article's constructor should probably be public?
I agree that this design is sub-optimal. However, if a.toString() called the getters instead of using the instance variables directly, you wouldn't have problems.
The Article class is unusable as provided due to the private constructor (in fact, the MutableArticle class wouldn't even compile), but private constructors can be useful. I often use private constructors and instead provide public static methods that call the constructors. This practice is often derided, and goes into Java's perception as a verbose, arcane language, but it allows for improving an API without breaking clients built to previous versions, as well as making an API more clear (since we can give the static methods more descriptive names).
Thanks for taking the time to reply. I didn't think about having toString use the getters instead of accessing the instance variables directly. I guess in some ways that's the more proper way of doing it in the context of OOP.
I'm familiar with Java private constructors and builders/factory methods though, that was more of a minor nitpick, but again thanks for explaining on that.
So I think a fair question at this point is how often does that happen? To further inquire, how often does that happen in contexts that you don't control?
I'm sure that library/framework people need to worry about that. Most normal developers do not. For many cases Java objects like Article are just structs. They are static maps. Sure, in the example on the site there was a getTags that added some logic, but still, pretty much a struct.
Switching to public fields, for either mutable or immutable objects, would actually streamline code. You might say that public mutators are bad; encapsulation and all that. But for the most part, little is actually gained from the weight of the majority of getter/setter code.
Heck, as per the JavaBean spec (a spec for making components to create drag and drop UIs by the way), you can't even have fluent APIs where the setter returns "this". It has to be void.
API stability has value. If you're a library, you probably want to do this just to be safe. But really for most imperative Java code it's just a lot of fluff for little value.
That's also one of the differences between Scala and Java. In Scala you have referential transparency, so you easily can switch between fields/properties and methods without changing the calling code.
Constructing a collection of things is just one case where a for loop is nice. Would be nice to see some more interesting examples of how we can use streams.
No, it's the price you pay when you improperly implement generics and don't have polymorphic functions. In other words, it's the price you pay if you ignore the developments in computer science of the last 10 years, invent a crippled, terrible language in 1995 (roughly when OCaml came out), and push it onto the world with success because you are a big corporation. By the time they introduced generics, there really was no way they should have gotten it wrong, but they did... Incidentally, I think it's a failure of the Free Software movement that it did not do much to actually move towards a modern statically typed language (or family of languages) that was not burdened by corporate policy.
In Haskell the type of map is
map :: Functor f => (a -> b) -> f a -> f b
(actually, for historical reasons it's called fmap, but never mind). Here f a in Java could be something like Stream<a>, that is, something that contains things of type a.
"Static typing" is not quite where those come from – mapTo{Int,Long,Double} exist because of Java's primitive/object dichotomy. You can just write map and do the same calculation, but then your lambda will have type ? -> Long (vs. the http://docs.oracle.com/javase/8/docs/api/java/util/function/... , which provides ? -> long).
It's mostly the same semantics, but it costs an extra object for each element in your list. If the only thing you're going to do is sum those longs or serialize them over the wire or something, the extra 6 characters have a significant impact on run-time. There's lots of solutions in this space (e.g. Rust is a static language that uses a lot of "zero-cost" abstractions like tagged pointers that it can prove are safe specifically because of the static types), but Java's "make the programmer do it explicitly" isn't so bad for 1995.
It's the price to pay for implementing generics via type erasure, not for getting functional features in a statically typed language. http://stackoverflow.com/a/24421331
This isn't necessarily due to erasure. This is actually about boxing. Reified generics are one way to solve this. Another option is tagged pointers; that is how OCaml handles this[1].
Too much architecture for what? For bankings? For medical devices? For the safety-critical avionics software? For high-frequency trading? For manufacturing control? For weapon-systems? For power-plant command and control? For government ERP running on mainframes? Because Java is used for all of these things.
Yes, some bits might be too architected for your MVP web app that you're going to re-write in a year (and even that depends mostly on the libraries you're using; there are plenty of lean libraries for the web-startup crowd).
Oh, and no "yield" in Java.
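True; the nearest workaround I know of is approximating a lazily generated sequence with an infinite Stream truncated by limit() (a sketch, not a real generator):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NoYield {
    // C#'s `yield return` builds a lazy sequence; in Java 8 the closest
    // equivalent is Stream.iterate, which produces values on demand
    static List<Integer> powersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```

Unlike yield, though, this can't suspend arbitrary imperative control flow; it only works when the sequence fits the seed-plus-step shape.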