Problem

I was generating some test data from an existing dataset using PySpark. The approach I used was:

1. Load the existing data into a DataFrame.
2. Do some random data manipulation, such as changing each timestamp to a random timestamp.
3. Repeat step 2 1,000 times.
4. Use union to join the DataFrames together.

This is the code:

As a result, I received the Java StackOverflowError below:

Solution

This is a very old-school error message that I haven't seen for a long time, so the first feeli…