Why, when and how to return Stream from your Java API instead of a collection
Introduction
Collections are basic, commonly used data structures.
Programmers learn early in their careers how to use them to
receive, process and return data. As they advance in Java programming,
they discover the stream() method, which converts a collection into a stream, and
learn how to process data using the stream’s useful methods like map, flatMap or
reduce. They may also notice that other Java APIs return a stream too, e.g.
String.lines(), Matcher.results(), Files.find(), Random.ints().
If you have experience with consuming streams but haven’t
produced them yet, this article is for you. I’m going to show you some
scenarios where streams can be very convenient, with examples of how to use them.
I also briefly cover error handling and resource management.
The article is based on the standard Java library package
java.util.stream. It is related neither to reactive streams nor to other
stream implementations such as Vavr. Also, I’m not going to cover advanced
details of streams like parallel execution.
First, let’s briefly discuss the distinctive features of streams
compared to collections. Although there are some similarities, the differences
are significant and you shouldn’t treat streams as just another type of
collection in the library.
According to the documentation of java.util.stream, the
most important features are:
No storage and possibly unbounded: collections are
ready-to-use data structures, while a stream represents the ability to produce
data, which usually doesn’t even exist when the stream is created.
Because a stream doesn’t store its data, we can create practically infinite streams
or, more practically, let the consumer decide how many
elements to read from the stream, keeping it potentially infinite from the
producer’s perspective (e.g. new Random().ints()).
Laziness-seeking: many operations (like filtering and mapping)
are deferred when the stream is defined and performed only when a
consumer decides to consume data from the stream.
Functional in nature: as you already have some experience
with consuming streams, you may have noticed that each processing step,
like filter or map, creates a new stream instead of modifying the
source data.
Consumable: you can read a stream only once, after which it
becomes “consumed”, unlike collections, which can be read many times.
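The features above can be observed directly with a few lines of code. The following is a minimal sketch, not taken from the original article; the class and method names (StreamFeaturesDemo, firstRandomInts, mappingCallsBeforeAndAfter, secondReadFails) are made up for illustration.

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; names are illustrative only.
final class StreamFeaturesDemo {

    // Possibly unbounded: Random.ints() has no end, the consumer picks the size.
    static List<Integer> firstRandomInts(long seed, int count) {
        return new Random(seed).ints().limit(count).boxed().collect(Collectors.toList());
    }

    // Laziness: returns how often the mapper ran before and after the terminal operation.
    static int[] mappingCallsBeforeAndAfter() {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> doubled = Stream.of("a", "b", "c").map(s -> {
            calls.incrementAndGet();
            return s + s;
        });
        int before = calls.get();      // nothing has been mapped yet
        doubled.forEach(s -> { });     // the terminal operation triggers the mapping
        return new int[] {before, calls.get()};
    }

    // Consumable: a second terminal operation on the same stream is rejected.
    static boolean secondReadFails() {
        Stream<String> stream = Stream.of("a", "b");
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

Note that the mapper does not run at all until forEach is called, and that the stream cannot be traversed a second time, whereas a collection could.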
Let’s now see what problems we can solve with streams.
Processing a large volume of data
Assume we have to replicate data from an external service
into our database. The volume of data to replicate can be arbitrarily large. We
can’t fetch all the data, store it in a collection and then save it to the database,
because we risk running out of heap memory. We have to process the
data in batches and design an interface between the external service client and
the database storage. Because a stream doesn’t store the data, it can be used to
safely process the required amount of data.
Example:
In this example (and all the following ones) we use the
static methods of the java.util.stream.Stream interface to build a stream. The most
powerful and flexible way to build a stream in Java is to implement the
Spliterator interface and then wrap it in a stream using the StreamSupport class.
However, as we will see, the static factory methods in the Stream interface are
sufficient in many cases.
Assume a simple API to fetch data from an external service
that supports pagination (e.g. a REST service or a database). The API fetches at
most limit items starting from the offset. Calling the API repeatedly, we can fetch
as much data as required.
Now we can use the API to provide a stream of data and
isolate the consumer from the pagination API:
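The listings for this example are not reproduced in this copy of the article. The following is a minimal sketch of what the paginated API and the stream-producing service could look like, based on the description in this section. The names PagedApi, fetch and nextPage, as well as the exact signatures, are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Stream;

// Assumed shape of the paginated API: at most `limit` items starting at `offset`.
interface PagedApi<T> {
    List<T> fetch(int offset, int limit);
}

// Simple holder of the current offset.
final class Cursor {
    int offset;
}

final class Service<T> {
    private final PagedApi<T> api;

    Service(PagedApi<T> api) {
        this.api = api;
    }

    // Returns up to `size` elements, fetched lazily in pages of `batchSize`.
    Stream<T> stream(int size, int batchSize) {
        Cursor cursor = new Cursor(); // a fresh cursor for every stream
        // Infinite stream of pages; each page is fetched only when requested.
        Stream<List<T>> pages = Stream.generate(() -> nextPage(cursor, size, batchSize));
        return pages
                .takeWhile(page -> !page.isEmpty()) // an empty page marks the end
                .flatMap(List::stream);             // flatten pages into elements
    }

    private List<T> nextPage(Cursor cursor, int size, int batchSize) {
        int fetchSize = Math.min(size - cursor.offset, batchSize);
        if (fetchSize <= 0) {
            return List.of(); // requested size reached
        }
        List<T> page = api.fetch(cursor.offset, fetchSize);
        cursor.offset += page.size();
        return page;
    }
}
```

In this sketch only one page is held in memory at a time, and the consumer never sees offsets or limits.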
Here, Cursor is a simple holder of the current offset.
We use the Stream.generate() method to build an infinite stream,
where each element is created by the provided supplier. At this point, the
stream elements are pages fetched from the REST API, each represented by a
List<T>. A new instance of the Cursor class is created for each stream to
track how many elements have been fetched. The Stream.takeWhile() method is used
to detect the last page. Finally, to return a stream of T instead of
List<T>, we use flatMap to flatten the stream. In some
scenarios, though, it could be useful to preserve the batches, e.g. to save a
whole page in one transaction.
Now we can use Service.stream(size, batchSize) to
retrieve an arbitrarily long stream without any knowledge of the pagination API
(we decided to expose the batchSize parameter, but that’s a design decision).
Memory consumption at any point in time is limited by the batch size. A
consumer can process the data one by one, saving each element to the database, or
batch the elements again (with a potentially different batch size).
Fast access to (incomplete) data
Assume we have a time-consuming operation which has to be
performed on each element of data, and each computation takes time t. For n
elements, the consumer has to wait t * n before receiving the result of the
computation. That can be an issue, e.g. if a user is waiting for a table with
the results of the computation. Preferably, we would like to show the first
results as soon as they are computed instead of waiting for all
results and filling the table at once.
Example:
Consumer:
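Neither the producer nor the consumer listing survives in this copy, so here is a minimal sketch of both. The method name expensiveStringDoubler comes from the text below; the class name DoublerService and the method doubled are assumptions.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical producer; only expensiveStringDoubler is named in the article.
final class DoublerService {

    // Producer: returns a lazy stream and keeps the expensive step private.
    static Stream<String> doubled(List<String> input) {
        return input.stream().map(DoublerService::expensiveStringDoubler);
    }

    // Stand-in for a time-consuming computation.
    private static String expensiveStringDoubler(String value) {
        System.out.println("Processing of: " + value);
        return value + value;
    }

    // Consumer: each result is printed as soon as it has been computed.
    public static void main(String[] args) {
        doubled(List.of("a", "b", "c")).forEach(System.out::println);
    }
}
```

Because a sequential stream pushes one element through the whole pipeline before touching the next, each "Processing of" line is followed directly by its result.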
Output:
Processing of: a
aa
Processing of: b
…
As we can see, the result of processing the first
element, “aa”, is available to the user before the processing of the next
element starts, yet the computation is still the responsibility of the
stream’s producer. In other words, the consumer decides when and whether the
computation is performed, but the producer is still responsible for how
it is performed.
You may think that’s easy and you don’t need a stream. Sure,
you’re right, let’s take a look:
And the consumer:
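The stream-free variant is also missing from this copy. A sketch of what it might look like, assuming a producer that simply hands out the raw data (the names RawDataService, data and consume are hypothetical):

```java
import java.util.List;

// Hypothetical stream-free variant: the producer only hands out raw data.
final class RawDataService {

    static List<String> data() {
        return List.of("a", "b", "c");
    }

    // Has to be visible to callers now, which weakens encapsulation.
    public static String expensiveStringDoubler(String value) {
        System.out.println("Processing of: " + value);
        return value + value;
    }

    // Consumer: must know about the expensive step and call it itself.
    static void consume() {
        for (String value : data()) {
            System.out.println(expensiveStringDoubler(value));
        }
    }
}
```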
We’ve achieved the same result, but at the price of
encapsulation: expensiveStringDoubler has to become public and, even
worse, the consumer is now responsible for calling it.
But wait, we can do better:
And the consumer:
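Judging by the paragraph that follows, the improved variant returns an Iterator. A minimal sketch under that assumption (the class name IteratorDoublerService and the method doubled are hypothetical):

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical iterator-based variant: lazy, but without the Stream API.
final class IteratorDoublerService {

    // Producer: wraps the source iterator and computes each element on demand.
    static Iterator<String> doubled(List<String> input) {
        Iterator<String> source = input.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public String next() {
                return expensiveStringDoubler(source.next());
            }
        };
    }

    // Stays private: the consumer never calls it directly.
    private static String expensiveStringDoubler(String value) {
        System.out.println("Processing of: " + value);
        return value + value;
    }

    // Consumer: a plain loop; there is no map, filter or limit to build on.
    static void consume() {
        Iterator<String> results = doubled(List.of("a", "b", "c"));
        while (results.hasNext()) {
            System.out.println(results.next());
        }
    }
}
```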
Again the same effect, but we have actually reinvented the
wheel: our implementation mimics the stream’s ancestor, Iterator, and we’ve lost
the advantages of the stream API.
Avoid premature computation
Assume again we have a time-consuming operation to be
performed on each stream element. There are situations when a consumer of the
API can’t say in advance how much data is required. For example:
user canceled data loading
an error occurred during data processing and there is no
need to process the rest of the data
consumer reads data until a condition is met e.g. first
positive value
Thanks to the laziness of streams some computations can be
avoided in such situations.
Example
Consumer:
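The original listings for this example are missing here. The sketch below assumes a producer that applies an expensive step to every element and a consumer that stops at the first value above a threshold. The names LazyValues, values, expensiveComputation and firstAbove are hypothetical, and the counter exists only to make the laziness visible.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical producer that counts how many elements were actually computed.
final class LazyValues {
    private final AtomicInteger computations = new AtomicInteger();

    // Producer: knows nothing about the consumer's stop condition.
    Stream<Double> values(List<Double> raw) {
        return raw.stream().map(this::expensiveComputation);
    }

    // Stand-in for a time-consuming computation.
    private Double expensiveComputation(Double value) {
        computations.incrementAndGet();
        return value;
    }

    int computations() {
        return computations.get();
    }

    // Consumer: reads until the first value above the threshold, then stops.
    static Optional<Double> firstAbove(Stream<Double> values, double threshold) {
        return values.filter(v -> v > threshold).findFirst();
    }
}
```

For the input 0.1, 0.3, 0.7, 0.2, 0.9 and a threshold of 0.4, only the first three elements are computed; the remaining two are never touched.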
In the example, the consumer reads the data until the value
is greater than 0.4. The producer is not aware of the consumer’s logic,
but it computes only as many items as necessary. The logic (e.g. the condition)
can be changed independently on the consumer side.
API easy to use
There is one more reason to use streams instead of a custom
API design. Streams are part of the standard library and well known to many
developers. Using streams in our API makes it easier for other developers to
use the API.
Additional considerations
Error handling
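Traditional error handling doesn’t work with streams.
Because actual processing is postponed until it is required, an exception
thrown while producing an element surfaces where the consumer runs the
terminal operation, not where the stream is created.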
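To illustrate the point about postponed processing, here is a small sketch (not from the original article; DeferredFailure, parsed and collectOrEmpty are hypothetical names). Building the pipeline succeeds even for bad input, and the failure only shows up when the consumer runs a terminal operation.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example of an exception that is deferred to the consumer.
final class DeferredFailure {

    // Producer: a try/catch here would not help, nothing is parsed yet.
    static Stream<Integer> parsed(List<String> raw) {
        return raw.stream().map(Integer::parseInt);
    }

    // Consumer: the NumberFormatException is thrown inside collect().
    static List<Integer> collectOrEmpty(Stream<Integer> numbers) {
        try {
            return numbers.collect(Collectors.toList());
        } catch (NumberFormatException e) {
            return List.of();
        }
    }
}
```

One consequence for API design is that the exceptions a stream may throw during consumption are worth documenting, since the compiler cannot point the consumer to them.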
Resource management
Sometimes we have to use a resource to provide stream data
(e.g. a session in an external service) and we want to release it when the stream
processing is finished. Fortunately, Stream implements the AutoCloseable interface,
so we can use a stream in a try-with-resources statement, which makes resource
management very easy. All we have to do is register a hook on the stream
with the onClose method. The hook is called automatically when the stream
is closed.
Example
Consumer:
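The producer and consumer listings are not present in this copy. The following sketch shows one way the pieces could fit together; the names ResourceBackedStream, values and printUntilTooLarge, and the 0.4 threshold used to trigger the failure, are assumptions.

```java
import java.util.Random;
import java.util.stream.Stream;

// Hypothetical producer whose stream owns a resource that must be released.
final class ResourceBackedStream {

    // Producer: registers the release hook with onClose.
    static Stream<Double> values(Random random, Runnable release) {
        Stream<Double> source = Stream.generate(random::nextDouble);
        return source.onClose(release);
    }

    // Consumer: try-with-resources closes the stream even if processing fails.
    static void printUntilTooLarge(Stream<Double> values) {
        try (Stream<Double> stream = values) {
            stream.forEach(value -> {
                if (value > 0.4) {
                    throw new RuntimeException("Data processing exception");
                }
                System.out.println(value);
            });
        }
    }
}
```

A call such as printUntilTooLarge(values(new Random(), () -> System.out.println("Releasing resources…"))) prints some values, then runs the hook, and only then lets the exception propagate to the caller.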
Output:
0.2264004802916616
0.32777949557515484
Releasing resources…
Exception in thread “main” java.lang.RuntimeException: Data processing exception
Source: https://blog.softwaremill.com/why-when-and-how-to-return-stream-from-your-java-api-instead-of-a-collection-c30e7ebc5407