share/hotel_questions.ipynb

2022-01-14 04:48:45 -05:00
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"hotel_questions.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyN2DvMFlTW8C58CuKlgqViY"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"code","execution_count":2,"metadata":{"id":"qZElOwJ35A8u","executionInfo":{"status":"ok","timestamp":1642092549357,"user_tz":-60,"elapsed":618,"user":{"displayName":"Pierre-Edouard PORTIER","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjrWdngYDIDAFn2GRDYdLSUlCwKObK25BfBfTXMLw=s64","userId":"05025412540823229047"}}},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import seaborn as sns\n","\n","sns.set_style(\"whitegrid\")"]},{"cell_type":"markdown","source":["# Context\n","\n","We consider a dataset for experimenting with the prediction of hotel room booking cancellations. We want to measure the prediction quality that can be achieved for cancellations. We also want to identify the most discriminating factors that make this prediction possible, in order to guide countermeasures that would reduce the profit lost to cancellations.\n","\n","# Preliminary questions\n","\n","*What are the main categories of problems that machine learning algorithms can solve? Which category or categories could the problem above belong to?*\n","\n","*How should this problem be approached? What first steps are needed to start the modeling work so that the problem is well formulated? 
What mistakes should be avoided?*\n"],"metadata":{"id":"8YLVbt3H1vqS"}},{"cell_type":"code","source":["df = pd.read_csv('https://git.sdf.org/p6e7p7/share/raw/branch/master/hotel_booking.csv')"],"metadata":{"id":"7WHT0w-F5NwL","executionInfo":{"status":"ok","timestamp":1642092561747,"user_tz":-60,"elapsed":10141,"user":{"displayName":"Pierre-Edouard PORTIER","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjrWdngYDIDAFn2GRDYdLSUlCwKObK25BfBfTXMLw=s64","userId":"05025412540823229047"}}},"execution_count":3,"outputs":[]},{"cell_type":"markdown","source":["# Feature engineering"],"metadata":{"id":"a7TgOmBuFwAK"}},{"cell_type":"code","source":["df.head()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":353},"id":"-iBiZZKkETMR","executionInfo":{"status":"ok","timestamp":1642092573757,"user_tz":-60,"elapsed":236,"user":{"displayName":"Pierre-Edouard PORTIER","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjrWdngYDIDAFn2GRDYdLSUlCwKObK25BfBfTXMLw=s64","userId":"05025412540823229047"}},"outputId":"20d3e627-1e36-4a45-9633-0d3b67803359"},"execution_count":4,"outputs":[{"output_type":"execute_result","data":{"text/html":["\n"," <div id=\"df-4cb66c70-03c2-4cbe-99fc-ae24d3041504\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>hotel</th>\n"," <th>is_canceled</th>\n"," <th>lead_time</th>\n"," <th>arrival_date_year</th>\n"," <th>arrival_date_month</th>\n"," <th>arrival_date_week_number</th>\n"," <th>arrival_date_day_of_month</th>\n"," <th>stays_in_weekend_nights</th>\n"," <th>stays_in_week_nights</th>\n"," <th>adults</th>\n"," <th>children</th>\n"," <th>babies</th>\n"," <th>meal</th>\n"," 
<th>country</th>\n"," <th>market_segment</th>\n"," <th>distribution_channel</th>\n"," <th>is_repeated_guest</th>\n"," <th>previous_cancellations</th>\n"," <th>previous_bookings_not_canceled</th>\n"," <th>reserved_room