Fullstack and Machine Learning: Movie Recommender App V1

Hello readers! Lately I have been trying to figure out a way to apply all of the skills that I have learned in the past year in one project. So after a million (drastic exaggeration) tutorials and online courses, I have finally embarked on this journey, which I would like to call “Full-stack and Machine Learning”. I am mostly excited because this is only the beginning and these projects will only go from good to awesome to amazing. For the sake of staying on track, let us call this “Project1”. Note, however, that this project will mostly focus on the backend and machine learning aspects. So without further ado, let us see what it is that Andrew has to show this time.

This project is composed of three main parts:

  • Using machine learning to output the top 50 recommended movies based on the inputted movie name.
  • Using Flask and Heroku to create an API which returns a list of 50 movie objects that are similar to the input.
  • Using React.js as a frontend framework to handle the fetched data and output it into a more “user friendly” app.

“Right, so what is this whole talk about machine learning, Andrew?” you may be asking yourself. Well, it is a simple linear algebra operation (cosine similarity) from sklearn which, along with the movie_dataset, is the heart of everything. The dataset file can be found here. Once we have the dataset file, we can go ahead and use Python to create a function which finds movies similar to the input movie. Like this:

So first of all, I am not going to take credit for this Python script because I did not actually write it. I followed a tutorial from Code Heroku on YouTube which helped me understand how it works. However, later on in the blog I will show some changes that I made to the code in order to make it work in the API. So let me go ahead and explain step by step what is going on here.

  • First we read the csv (movie_dataset) with the Python pandas library in order to make the data useful. Before I forget: the dataset file must be in the same working directory as the Python file, else the program will not know where to read the data from.
  • We then go ahead and select the features from the data that will be useful to us. This is done by creating a list of strings which are the names of the columns that we will work with. In case you are lost, this is what the csv file looks like:
  • Now we create a new column in the variable df (which stands for data frame). For each row, this column stores the combined keywords which correspond to that movie. You will see why this is useful in a second.
  • Okay, so now that we have the piece of data that we need, we need to find the similarities between the inputted movie and the rest of the movies. The way this is done is by creating a matrix that holds the similarity score for every pair of movies. Okay, so that may have been very confusing, but basically… Imagine if we had to find the similarity between the following strings: string1 = “France Colombia France” and string2 = “Colombia France Colombia”. We can say that string1 has a vector of (2 1) because it has 2*France and 1*Colombia. String2, however, has a vector of (1 2) because it has 1*France and 2*Colombia. We then find the similarity score by computing the cosine similarity of the vectors: (2*1 + 1*2) / (√5 * √5) = 4/5. The results tell us that string1 is 100% similar to itself and 80% similar to string2. The same applies for string2 to string2 and string2 to string1. If we were to picture this, it would look like this:
  • So in the previous point I covered steps 4 and 5 of the Python script. In step 6 we find the index of the movie “Avatar” in the dataset. Once we have its index, we create a list where each element is a tuple of a movie index and the cosine similarity score between that movie and the input movie.
  • In step 7 we sort this list in descending order of similarity score, which lets us output the top 50 (or whatever number you prefer) most similar movies for the movie recommender app.
  • Step 8 simply loops through the sorted list and prints the names of the movies. It was done this way for the purpose of seeing what movies the program outputs. But as I mentioned earlier, I changed the code a bit in order to output the result as an API response instead of a terminal output.
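
To make the steps above a bit more concrete, here is a minimal sketch of the same idea. It is not the script from the tutorial: it uses only the Python standard library as a stand-in for the pandas and sklearn calls, and the three movies, their titles and their feature columns are made up purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy dataset; the real project reads movie_dataset with pandas.
movies = [
    {"title": "Space Saga", "keywords": "space alien war", "genres": "action scifi"},
    {"title": "Star Battle", "keywords": "space war future", "genres": "action scifi"},
    {"title": "Love in Paris", "keywords": "romance paris", "genres": "drama romance"},
]
features = ["keywords", "genres"]  # assumed feature columns


def count_vector(text):
    """Turn a string into word counts (a tiny bag-of-words vector)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, top_n=50):
    """Return the titles of the top_n movies most similar to `title`."""
    # Combine the selected features of each movie into one string, then count words.
    vectors = [count_vector(" ".join(m[f] for f in features)) for m in movies]
    # Find the index of the input movie (raises StopIteration if it is missing).
    index = next(i for i, m in enumerate(movies) if m["title"] == title)
    # Pair every other movie index with its similarity score to the input movie.
    scores = [
        (i, cosine_similarity(vectors[index], v))
        for i, v in enumerate(vectors)
        if i != index
    ]
    # Sort by score, highest first, and keep the top_n titles.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [movies[i]["title"] for i, _ in scores[:top_n]]


# The France/Colombia example from the text:
s1 = count_vector("France Colombia France")
s2 = count_vector("Colombia France Colombia")
print(round(cosine_similarity(s1, s2), 2))  # 0.8

print(recommend("Space Saga", top_n=2))
```

With the real dataset the idea stays the same; sklearn's CountVectorizer and cosine_similarity simply do the counting and the matrix math for every pair of movies at once.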

Now that we have the function ready, let us go ahead and take a look at what I built on the backend of this small app. I created an API using Flask and deployed it with Heroku.com in order to later fetch the data from an external application. Here is what my Flask app looks like:

First of all, let me mention in all caps, I AM NOT A FLASK PRO. I am so sorry if you cringe by looking at this, but I built this API with the sole purpose of returning data in a way that would be useful to my demo app, which by the way is only on its first version, so do not expect it to be a perfect application when you go testing it. I will however accept constructive criticism from anyone who thinks that there could have been a much better and cleaner way to do this. I am always open to advice and ready to learn. Anyways, so let me explain what is going on here.

  • So first I imported the modules that I would be using. The Flask imports are what make the API creation possible; they also provide the methods I used to return objects in response to a GET request. The flask_cors import, however, is what makes fetching the data from the frontend possible. With the flask_cors module I can tell my API which domains or external applications are allowed to fetch my data.
  • The fifth line of this code imports the function which returns the top 50 recommended movies based on the input movie name. Now I am going to show you what changes I made to the function.
  • So the first thing that you may notice is different is the name of the function. That was done in order to export the function and later use it in another file. You may also notice that the first helper function (before: get_movie_from_index(index), now: get_object_from_index(index)) now returns a new object with cleaner data that is then used by the demo app. The second helper function (get_index_from_name(movie)) does the same thing as before, but I added a try and except in order to prevent the server from crashing when the movie is not found. You will notice in the demo app that when you enter a movie in lowercase, it will not return anything. This is because the function is case sensitive and will not return the recommended movies if it does not find an exact match. This is why, further down, I have an if statement that returns an object with an error message if the movie is not found. Remember that this is the first version of the app, so I still have some work to do. I may write another blog in the future with a second version of the app and maybe some added features, but for now, it serves its purpose. Lastly, you may notice that in step 8 I create a list with 50 objects, which are essentially the movie objects with useful data for each movie. I then save the list in a .json file in order to visualize the data properly and see what I am working with before I deploy the app. This is what that file would typically look like:
  • Okay, so now that we know what changes I made to the function, if you look back at the Flask file, it does three main things.
    1) The first is to return the object with the top 50 recommended movies when a GET request is made to <apiURL>/movies/<name of movie> (e.g. https://movie-recommender-appi.herokuapp.com/movies/Avatar). This will however return an object with an error message if the movie is not found ({“message”: “Movie not found. Maybe the spelling is wrong :(”}).
    2) The second returns an object with a message telling the user to input a movie ({“message”: “Search for a movie to find similar movies”}) when the user makes a GET request to https://movie-recommender-appi.herokuapp.com/movies/.
    3) The third returns a string saying “Hello World” whenever the API first loads in the homepage => https://movie-recommender-appi.herokuapp.com/
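
For anyone who wants to picture those three routes in code, here is a rough sketch of what such a Flask app could look like. It is not my actual file: the CATALOG dictionary and the get_recommendations function are hypothetical stand-ins for the imported recommender function, the movie object inside it is placeholder data, and the flask_cors setup is left out to keep the example short.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for the recommender: maps an exact movie title to a
# list of movie objects. The entry below is placeholder data, not real output.
CATALOG = {
    "Avatar": [
        {"name": "Example Movie", "genre": "Action", "rating": 7.0},
    ],
}


def get_recommendations(name):
    """Return a list of movie objects, or None if the title is not found.

    The lookup is case sensitive, just like the function described above,
    and the try/except keeps a missing title from crashing the server.
    """
    try:
        return CATALOG[name]
    except KeyError:
        return None


@app.route("/")
def home():
    # Homepage: a plain string.
    return "Hello World"


@app.route("/movies/")
def movies_prompt():
    # No movie name given: ask the user to search for one.
    return jsonify({"message": "Search for a movie to find similar movies"})


@app.route("/movies/<name>")
def similar_movies(name):
    # Movie name given: return the recommendations or an error message.
    movies = get_recommendations(name)
    if movies is None:
        return jsonify({"message": "Movie not found. Maybe the spelling is wrong :("})
    return jsonify(movies)


if __name__ == "__main__":
    app.run()
```

Running this locally and visiting /movies/Avatar would return the placeholder list, while /movies/avatar (lowercase) would return the error message because the lookup needs an exact match.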

For the frontend part, I will not go in depth with the code of the application. I will however show a video to demonstrate the data being used in the demo. If you want to take a look at the source code, you can go ahead and check it out on my GitHub. So let us not waste more time and look at the video:

Alright, so I think the video was self-explanatory. However, you may be wondering why the movie card does not have a picture of the movie. Well, that is because the dataset does not contain a picture URL for each movie, and it would take a really long time for me to add a picture for each movie manually. I did however add some useful information about each movie such as name, description, genre, duration and rating.

Overall I learned quite a lot during this project and I am excited to start a second and more awesome project. Please let me know in direct messages what I could improve about this project. I am looking forward to your opinions and feedback. Thank you for taking the time to read this! Also, if you want to take a look at the demo app, you can do so here.

Life has a valuable lesson every day.