Apache Spark provides powerful abstractions that reduce the complex operations of distributed data processing to simple API calls. However, moving from small applications to larger ones that chain many operations across multiple datasets can leave a developer feeling lost.