Why, when and how to return Stream from your Java API instead of a collection

Introduction

Collections are basic and commonly used data structures. From the beginning of their careers, programmers learn how to use them to receive, process and return data. As they advance in Java programming, they discover the stream() method for converting a collection into a stream and learn how to process data with some of the stream’s useful methods, like map, flatMap or reduce. They may also notice that other Java APIs return streams too, e.g. String.lines(), Matcher.results(), Files.find() and Random.ints().

If you have experience with consuming streams but haven’t produced them yet, this article is for you. I’m going to show you some scenarios where streams can be very convenient, along with examples of how to use them. I also briefly cover error handling and resource management.

The article is based on the standard Java library java.util.stream. It is related neither to reactive streams nor to other stream implementations such as Vavr. Also, I’m not going to cover advanced details of streams like parallel execution.

First, let’s briefly discuss the features that distinguish streams from collections. Although there are some similarities, the differences are significant, and you shouldn’t treat streams as just another type of collection in the library.

According to the documentation of java.util.stream, the most important features are:

No storage and possibly unbounded: collections are ready-to-use data structures, while a stream represents the ability to produce data, which usually doesn’t even exist at the point the stream is created. As the data in a stream is not stored, we can create practically infinite streams or, put more practically, we can let the consumer decide how many elements to read from the stream while the stream stays potentially infinite from the producer’s perspective (e.g. new Random().ints()).

Laziness-seeking: many operations (like filtering and mapping) are deferred when the stream is defined and performed only when a consumer starts consuming data from the stream.

Functional in nature: as you already have some experience with consuming streams, you may have noticed that each processing step, such as filter or map, creates a new stream instead of modifying the source data.

Consumable: a stream can be read only once, after which it becomes “consumed”, unlike a collection, which can be read many times.
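The three distinguishing features above can be observed with a few lines of standard java.util.stream code. The following is a minimal sketch; the class and method names (StreamFeatures, firstRandomInts, mapCallsBeforeTerminalOperation, canTraverseTwice) are hypothetical and chosen only for illustration.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class StreamFeatures {

    // Possibly unbounded: Random.ints() is an infinite stream,
    // the consumer decides how many elements to read with limit().
    public static int[] firstRandomInts(int count) {
        return new Random().ints().limit(count).toArray();
    }

    // Laziness: map() is only recorded here, nothing runs until a terminal operation.
    public static int mapCallsBeforeTerminalOperation() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> doubled = Stream.of(1, 2, 3).map(x -> {
            calls.incrementAndGet();
            return x * 2;
        });
        int callsBefore = calls.get();   // still 0: the mapping function has not run yet
        doubled.forEach(x -> { });       // the terminal operation triggers the computation
        return callsBefore;
    }

    // Consumable: a second traversal of the same stream throws IllegalStateException.
    public static boolean canTraverseTwice(Stream<?> stream) {
        stream.forEach(element -> { });
        try {
            stream.forEach(element -> { });
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstRandomInts(3).length);             // 3
        System.out.println(mapCallsBeforeTerminalOperation());      // 0
        System.out.println(canTraverseTwice(Stream.of("a", "b")));  // false
    }
}
```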

Let’s now see what problems we can solve with streams.

Processing a large volume of data

Assume we have to replicate data from an external service into our database. The volume of data to replicate can be arbitrarily large. We can’t fetch all the data, store it in a collection and then save it in the database, because of the risk of running out of heap memory. We have to process the data in batches and design an interface between the external service client and the database storage. Because a stream doesn’t store the data, it can be used to safely process the required amount of data.

Example:

In this example (and all the following ones) we are going to use static methods of the java.util.stream.Stream interface to build a stream. The most powerful and flexible way to build a stream in Java is to implement the Spliterator interface and then wrap it into a stream using the StreamSupport class. However, as we will see, the static factory methods in the Stream interface are sufficient in many cases.

Assume a simple API to fetch data from an external service that supports pagination (e.g. a REST service or a database). The API fetches at most limit items starting from the offset. Using the API iteratively, we can fetch as much data as required.

Now we can use the API to provide a stream of data and isolate the API’s consumer from the pagination API:

Where Cursor is a simple holder of the current offset.

We use the Stream.generate() method to build an infinite stream, where each element is created by the provided supplier. At this point, the stream elements are pages fetched from the REST API, represented by List<T>. An instance of the Cursor class is created for each stream to track the progress of fetched elements. The Stream.takeWhile() method is used to detect the last page, and finally, to return a stream of T instead of List<T>, we use flatMap to flatten the stream. In some scenarios, though, it could be useful to preserve the batches, e.g. to save a whole page in one transaction.

Now we can use Service.stream(size, batchSize) to retrieve an arbitrarily long stream without any knowledge of the pagination API (we decided to expose the batchSize parameter, but that’s a design decision). At any point in time, memory consumption is limited by the batch size. A consumer can process the data one by one, saving each element in the database, or batch it again (with potentially different batch sizes).
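A minimal sketch of this pagination-to-stream approach, under stated assumptions: the paginated API is modelled by a hypothetical PagedApi interface with a fetch(offset, limit) method, and the API is assumed to signal the end of the data by returning an empty page. The names PagedStreams, PagedApi and stream(api, batchSize) are illustrative and differ from the Service.stream(size, batchSize) signature mentioned above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PagedStreams {

    // Hypothetical paginated API: returns at most `limit` items starting at `offset`,
    // and an empty list once there is no more data (assumption).
    @FunctionalInterface
    public interface PagedApi<T> {
        List<T> fetch(int offset, int limit);
    }

    // Simple holder of the current offset; one instance per stream.
    private static final class Cursor {
        private int offset;
    }

    public static <T> Stream<T> stream(PagedApi<T> api, int batchSize) {
        Cursor cursor = new Cursor();
        return Stream.generate(() -> {
                    // Each element of this infinite stream is one fetched page.
                    List<T> page = api.fetch(cursor.offset, batchSize);
                    cursor.offset += page.size();
                    return page;
                })
                .takeWhile(page -> !page.isEmpty())  // an empty page marks the end of the data
                .flatMap(List::stream);              // Stream<List<T>> -> Stream<T>
    }

    public static void main(String[] args) {
        // In-memory stand-in for the external service: 10 items served in pages.
        List<Integer> remoteData = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        PagedApi<Integer> api = (offset, limit) -> remoteData.subList(
                Math.min(offset, remoteData.size()),
                Math.min(offset + limit, remoteData.size()));

        // The stream pulls one page of 3 items at a time, only as the consumer needs it.
        stream(api, 3).forEach(System.out::println);
    }
}
```

With 10 items and a batch size of 3, this sketch calls fetch five times: four non-empty pages plus the final empty one that stops takeWhile. If the real API cannot return an empty page cheaply, stopping at the first page shorter than batchSize is an alternative, at the cost of slightly more bookkeeping in the supplier. The Cursor is mutated from the supplier, so the sketch is meant for sequential streams only.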

Fast access to (incomplete) data

Assume we have a time-consuming operation that has to be performed on each element of data, and each computation takes time t. For n elements, the consumer has to wait t * n before receiving the result of the computation. That could be an issue, e.g. if a user is waiting for a table with the results. Preferably, we would like to show the first results as soon as they are computed instead of waiting for all results and filling the table at once.

Example:

Consumer:

Output:

Processing of: a

Processing of: b

…

As we can see, the result of processing the first element (“aa”) is available to the user before processing of the next element starts, yet the computation is still the stream producer’s responsibility. In other words, the consumer decides when and whether the computation should be performed, but the producer remains responsible for how to perform it.
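A minimal sketch of a producer and consumer that behave this way. The expensive operation is assumed to double a string; the name expensiveStringDoubler comes from the text, but its body here (a log line plus Thread.sleep standing in for the computation time t) and the class name LazyDoubling are hypothetical.

```java
import java.util.List;
import java.util.stream.Stream;

public class LazyDoubling {

    // Hypothetical slow operation; Thread.sleep stands in for the real computation time t.
    private static String expensiveStringDoubler(String input) {
        System.out.println("Processing of: " + input);
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return input + input;
    }

    // Producer: the computation is attached lazily, so it stays private to the API.
    public static Stream<String> doubledStrings(List<String> inputs) {
        return inputs.stream().map(LazyDoubling::expensiveStringDoubler);
    }

    // Consumer: each result is printed as soon as it is computed.
    public static void main(String[] args) {
        doubledStrings(List.of("a", "b", "c")).forEach(System.out::println);
        // Prints, one per line: Processing of: a, aa, Processing of: b, bb, Processing of: c, cc
    }
}
```

Because a sequential stream pushes each element through map and forEach before pulling the next one, the consumer sees “aa” before “b” is processed, while expensiveStringDoubler stays private.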

You may think that’s easy to achieve without a stream. You’re right; let’s take a look:

And the consumer:

We’ve achieved the same result, but at the price of encapsulation: expensiveStringDoubler has to become public and, what’s even worse, the consumer is now responsible for calling it.

But wait, we can do better:

And the consumer:

Again the same effect, but we have actually reinvented the wheel: our implementation mimics the stream’s ancestor, Iterator, and we’ve lost the advantages of the Stream API.
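For comparison, here is a sketch of what such an iterator-based API might look like. IteratorVersion, doubledStrings and doubledStringsAsStream are hypothetical names, expensiveStringDoubler has a simplified body without the slow part, and the anonymous-class diamond requires Java 9+. The second method shows that the standard Spliterators.spliteratorUnknownSize and StreamSupport.stream calls can wrap an existing iterator back into a stream, which is one way to regain the Stream API if an iterator already exists.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IteratorVersion {

    // Stand-in for expensiveStringDoubler (hypothetical body, slow part omitted).
    private static String expensiveStringDoubler(String input) {
        return input + input;
    }

    // Hand-rolled lazy API: every call to next() computes exactly one result.
    public static Iterator<String> doubledStrings(List<String> inputs) {
        Iterator<String> source = inputs.iterator();
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public String next() {
                // source.next() throws NoSuchElementException once the input is exhausted
                return expensiveStringDoubler(source.next());
            }
        };
    }

    // Wrapping the iterator brings back filter, map, limit and the rest of the Stream API.
    public static Stream<String> doubledStringsAsStream(List<String> inputs) {
        Spliterator<String> spliterator =
                Spliterators.spliteratorUnknownSize(doubledStrings(inputs), Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        // Consumer of the iterator version: the loop is written by hand.
        Iterator<String> results = doubledStrings(List.of("a", "b", "c"));
        while (results.hasNext()) {
            System.out.println(results.next());
        }
        // Consumer of the wrapped version: the usual stream operations are available again.
        doubledStringsAsStream(List.of("a", "b", "c"))
                .filter(s -> !s.equals("bb"))
                .forEach(System.out::println);
    }
}
```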

Avoid premature computation

Assume again we have a time-consuming operation to be performed on each stream element. There are situations when a consumer of the API can’t say in advance how much data is required. For example:

the user cancels data loading

an error occurs during data processing and there is no need to process the rest of the data

the consumer reads data until a condition is met, e.g. the first positive value

Thanks to the laziness of streams some computations can be avoided in such situations.

Example:

Consumer:

In the example, the consumer reads data until a value greater than 0.4 appears. The producer is not aware of this logic, yet it computes only as many items as necessary. The logic (e.g. the condition) can be changed independently on the consumer side.
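A minimal sketch of such a producer and consumer. LazyValues and values are hypothetical names, and the DoubleSupplier parameter stands in for the time-consuming computation; a counter in the consumer shows how many values were actually computed.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;
import java.util.stream.Stream;

public class LazyValues {

    // Producer: an infinite stream; each element is computed only when the consumer asks for it.
    // The DoubleSupplier stands in for the (hypothetical) time-consuming computation.
    public static Stream<Double> values(DoubleSupplier computation) {
        return Stream.generate(() -> computation.getAsDouble());
    }

    public static void main(String[] args) {
        Random random = new Random();
        AtomicInteger computed = new AtomicInteger();

        // Consumer: reads values until one is greater than 0.4;
        // the producer knows nothing about this rule.
        values(() -> {
            computed.incrementAndGet();
            return random.nextDouble();
        })
                .takeWhile(value -> value <= 0.4)
                .forEach(System.out::println);

        // Only the printed values plus the first one above 0.4 were computed.
        System.out.println("Computed values: " + computed.get());
    }
}
```

takeWhile has to see the first value above 0.4 to know where to stop, so exactly one value more than the number printed is computed; nothing after it is ever produced.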

API easy to use

There is one more reason to use streams instead of a custom API design: streams are part of the standard library and well known to many developers, so using them in our API makes it easier for other developers to use.

Additional considerations

Error handling

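Traditional error handling doesn’t work the same way with streams, because the actual processing is postponed until the consumer requests the data. An exception thrown while producing an element therefore surfaces at the consumer side, not where the stream was created.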
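A minimal sketch of the problem (DeferredErrors and parseAll are hypothetical names): the try/catch in the producer looks protective, but Integer.parseInt runs only when the consumer reads the stream, so the exception escapes the producer entirely.

```java
import java.util.List;
import java.util.stream.Stream;

public class DeferredErrors {

    // The try/catch looks protective, but parsing happens later, when the consumer reads the stream.
    public static Stream<Integer> parseAll(List<String> inputs) {
        try {
            return inputs.stream().map(Integer::parseInt);
        } catch (NumberFormatException e) {
            // Never reached: no element has been parsed yet when the stream is returned.
            return Stream.empty();
        }
    }

    public static void main(String[] args) {
        Stream<Integer> numbers = parseAll(List.of("1", "2", "oops"));  // no exception yet
        try {
            numbers.forEach(System.out::println);  // prints 1 and 2, then fails on "oops"
        } catch (NumberFormatException e) {
            System.out.println("Failed during consumption: " + e.getMessage());
        }
    }
}
```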

Resource management

Sometimes we have to use a resource to provide stream data (e.g. a session in an external service), and we want to release it when stream processing is finished. Fortunately, Stream implements the AutoCloseable interface (through BaseStream), so we can use a stream in a try-with-resources statement, which makes resource management very easy. All we have to do is register a hook on the stream with the onClose method. The hook is called automatically when the stream is closed.
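A minimal sketch of this mechanism. ResourceStreams, Session, nextValue and release are hypothetical names; Session only imitates an external resource. Passing the session into values() is for illustration; a real producer would more likely open the session itself.

```java
import java.util.Random;
import java.util.stream.Stream;

public class ResourceStreams {

    // Hypothetical resource, e.g. a session in an external service.
    public static final class Session {
        private final Random random = new Random();
        private boolean open = true;

        double nextValue() {
            return random.nextDouble();
        }

        public boolean isOpen() {
            return open;
        }

        void release() {
            open = false;
            System.out.println("Releasing resources...");
        }
    }

    // Producer: registers a close hook so the session is released together with the stream.
    public static Stream<Double> values(Session session) {
        return Stream.generate(session::nextValue).onClose(session::release);
    }

    // Consumer: try-with-resources closes the stream, and therefore runs the hook,
    // even when processing fails with an exception.
    public static void main(String[] args) {
        try (Stream<Double> stream = values(new Session())) {
            stream.limit(2).forEach(System.out::println);
            throw new RuntimeException("Data processing exception");
        }
    }
}
```

When run, this prints two random values, then “Releasing resources...”, and only then does the RuntimeException propagate out of main, because try-with-resources closes the stream before rethrowing.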

Example:

Consumer:

Output:

0.2264004802916616

0.32777949557515484

Releasing resources…

Exception in thread "main" java.lang.RuntimeException: Data processing exception

Source: https://blog.softwaremill.com/why-when-and-how-to-return-stream-from-your-java-api-instead-of-a-collection-c30e7ebc5407
