Problem I was generating some test data from an existing dataset using PySpark. The approach I used was: load the existing data into a DataFrame; apply some random data manipulation, such as changing each timestamp to a random timestamp; repeat the second step 1000 times; use union to join the DataFrames together. This is the code: As a result, I received the Java StackOverflowError below: Solution This is a very old-school error message that I haven't seen for a long time, so the first feeli…
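The original code is not shown in this excerpt, so the following is only a minimal sketch of the four steps described above, not the author's implementation. The column name `timestamp`, the 2020 date range and the function names are assumptions made for illustration. It reproduces the pattern that led to the error (a long chain of `union` calls); it is not the fix.

```python
from functools import reduce


def union_all(frames):
    """Fold a list of DataFrames into one by chaining .union() calls.

    With 1000 inputs this produces a chain of 999 unions, which is the
    pattern the post reports as ending in a Java StackOverflowError.
    """
    return reduce(lambda left, right: left.union(right), frames)


def build_test_data(df, copies=1000, ts_col="timestamp"):
    """Return `copies` randomised variants of `df`, unioned together.

    Assumptions (not from the original post): the column is called
    "timestamp" and random values are drawn from the year 2020.
    """
    # Imported lazily so that union_all can be used without Spark installed.
    from pyspark.sql import functions as F

    start_epoch = 1_577_836_800        # 2020-01-01 00:00:00 UTC, in seconds
    span_seconds = 365 * 24 * 3600     # width of the random window

    variants = [
        df.withColumn(
            ts_col,
            # Random whole number of seconds after start_epoch, read as a timestamp.
            (F.lit(start_epoch) + F.floor(F.rand() * span_seconds)).cast("timestamp"),
        )
        for _ in range(copies)
    ]
    return union_all(variants)
```

Calling `build_test_data(df)` on a loaded DataFrame would follow the steps in the list; `union_all` works on any objects that expose a `union` method.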
2022-01-15
Problem I am new to the big data world, and I am trying to build a Hadoop cluster using Docker. The Spark shell did not work and failed with the error message below: Diagnose The problem looked like it couldn't connect to the IP address, so I started by testing the connection from Spark to that IP address. The ping went through without problems. Then I looked at YARN to see whether I could find any logs there, and I discovered the error messages below: It looks obvious that the job containers were killed because the …
Introduction I was trying out different rich text editors for the web. They are all good, but I found CKEditor 4 the easiest to set up. How to Implement Download Go to CKEditor Builder. Click download after you pick the plugins, skins and languages you want. Save the CKEditor files on the server. You can delete the example folder. Insert into your html file Add a textarea with id editor to your page. Load the ckeditor.js and the config.js from the downloaded files. Use JavaScript in the page to initi…
Introduction D3.js is a popular tool for data visualisation. I spent a day exploring it and built a simple mind map application. The application structure is quite simple: Back-end: Web API built with Flask in Python Front-end: D3.js As the focus is mainly on trying out D3.js, the web API doesn't come with any security implementation. In addition, the JavaScript code is not organised very well. I might improve the code in the future if I have some spare time. Key Feature of D3.js The only…
Problem I use docker-compose to build my development Hadoop cluster, and Hive is one of the components. I received an error message: Diagnose I started Hive in debug mode by using: It returned the error message: It looks like the system complained about the hostname thrift://hive-metastore.docker-hadoop_default:9083. After some searching on Google, I found that Hive does not like the character "_" in hostnames. Solution The hostname was auto-generated by Docker Compose, so I need to change the default name explicitly. D…
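To make the diagnosis concrete, here is a small sketch that checks a metastore URI against the classic hostname rules (letters, digits and hyphens only in each label), which is the rule the underscore breaks. The function name `invalid_labels` and the renamed host `docker-hadoop` are hypothetical and only used for illustration; this is not how Hive itself validates the URI.

```python
import re
from urllib.parse import urlsplit

# One hostname label: 1-63 letters, digits or hyphens, not starting or ending with "-".
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def invalid_labels(uri):
    """Return the dot-separated hostname labels in `uri` that break the rule."""
    host = urlsplit(uri).hostname or ""
    return [label for label in host.split(".") if not _LABEL.match(label)]


# The auto-generated name from the post contains "_", so it is flagged.
print(invalid_labels("thrift://hive-metastore.docker-hadoop_default:9083"))

# A hypothetical explicit name without "_" passes the same check.
print(invalid_labels("thrift://hive-metastore.docker-hadoop:9083"))
```

A check like this can be run against any generated service or network name before it ends up in a Hive configuration.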