Web Scraping Bot in Python
The web scraper extracts information about real estate listings from the Luxembourgish real estate website athome.lu.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
We start by determining the number of available pages of real estate listings. In general, each page contains 20 listings. To limit the running time, we do not extract data from all available pages but only from the first 5.
URL = 'https://www.athome.lu/en/srp/?tr=buy&sort=date_desc&q=faee1a4a&loc=L2-luxembourg'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
# Get number of pages on website
last_page = 5 # We choose to use only the first 5 pages
#last_page = int(soup.find('a',class_='page last').text) # --> total number of pages available
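The commented-out line above assumes that the pagination link is always present. A minimal sketch of a more defensive variant follows; the helper name `get_last_page` and the sample markup are made up for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up pagination markup for illustration; the real page may differ.
SAMPLE_HTML = """
<nav>
  <a class="page" href="?page=1">1</a>
  <a class="page" href="?page=2">2</a>
  <a class="page last" href="?page=87">87</a>
</nav>
"""

def get_last_page(html, default=1):
    """Return the number shown in the 'page last' link, or `default` if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector matches both classes regardless of their order
    link = soup.select_one("a.page.last")
    if link is None:
        return default
    try:
        return int(link.get_text(strip=True))
    except ValueError:
        # The link text was not a plain integer
        return default

total_pages = get_last_page(SAMPLE_HTML)
# Cap the crawl at 5 pages, as in the text above
last_page = min(5, total_pages)
```

With the live site, `page.content` would be passed instead of `SAMPLE_HTML`.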
As every listing has its own URL, we create a list containing all of these URLs.
urls = [] # list with collected urls (each url represents a listing)
i = 0 # computation progress counter
for pagenumber in range(last_page):
    URL = 'https://www.athome.lu/en/srp/?tr=buy&sort=date_desc&q=faee1a4a&loc=L2-luxembourg&page=' + str(pagenumber)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find_all('article', class_=['standard', 'silver', 'gold', 'platinum'])
    for result in results:
        urls.append('https://www.athome.lu' + result.find('link', itemprop='url')['href'])
    # Display computation progress
    i += 1
    print(str(100*i/last_page) + ' %')
    # Return number of collected urls
    if pagenumber == (last_page-1):
        print(str(len(urls)) + " URLs have been collected")
## 20.0 %
## 40.0 %
## 60.0 %
## 80.0 %
## 100.0 %
## 100 URLs have been collected
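The page URLs above are built by string concatenation. As a rough sketch of an alternative, the query string can be assembled with the standard library's `urllib.parse.urlencode`, and the collected listing URLs can be de-duplicated in case the same listing shows up on more than one results page. The helper names `build_page_url` and `unique_in_order` are hypothetical; the query parameters are copied from the URL used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.athome.lu/en/srp/"

def build_page_url(page_number, base_url=BASE_URL):
    """Return the search-results URL for one page (hypothetical helper)."""
    params = {
        "tr": "buy",
        "sort": "date_desc",
        "q": "faee1a4a",
        "loc": "L2-luxembourg",
        "page": page_number,
    }
    return base_url + "?" + urlencode(params)

def unique_in_order(items):
    """Drop duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(items))

# Same page range as the loop above
page_urls = [build_page_url(n) for n in range(5)]
```

Calling `unique_in_order(urls)` before the extraction step would avoid requesting the same listing twice.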
Then, we loop through the entire URL list while extracting all the available data for every listing.
rows = [] # one dict of features per listing
i = 0 # computation progress counter
url_non_ex = 0 # counter for non-existent urls
for url in urls:
    try:
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')
        # Find all characteristics blocks for a house
        features = soup.find_all('li', class_='feature-bloc-content-specification-content')
        names = []
        data = []
        # For every feature of the house find name of feature ('names') and the corresponding value ('data')
        for feature in features:
            names.append(feature.find('div', class_='feature-bloc-content-specification-content-name').text)
            data.append(feature.find('div', class_='feature-bloc-content-specification-content-response').text)
        # Add type of house and location to the feature list
        names.extend(['genre', 'Lieu'])
        data.append(soup.find('h1', class_='KeyInfoBlockStyle__PdpTitle-sc-1o1h56e-2 hWEtva').text.split()[0])
        address = soup.find('div', class_='block-localisation-address').text
        lieu = address[address.find('-')+2:]
        data.append(lieu)
        rows.append(dict(zip(names, data)))
        i += 1
        print(str(round(100*i/len(urls),2)) + ' %')
    except AttributeError:
        url_non_ex += 1 # number of non-existent urls
# DataFrame.append was removed in pandas 2.0, so the DataFrame is built once from the collected rows
Houses = pd.DataFrame(rows)
## 1.0 %
## 2.0 %
## 3.0 %
## ...
## 100.0 %
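In the extraction loop above, a single missing element raises an `AttributeError` and the whole listing is skipped. As one possible alternative, the sketch below parses the feature blocks in a separate function and skips only the incomplete feature. The function name `parse_features` and the sample HTML are made up for illustration; the class names are the ones used in the loop above.

```python
from bs4 import BeautifulSoup

PREFIX = "feature-bloc-content-specification-content"

# Made-up listing fragment for illustration; the third block has no value
SAMPLE_LISTING = """
<ul>
  <li class="feature-bloc-content-specification-content">
    <div class="feature-bloc-content-specification-content-name">Livable surface</div>
    <div class="feature-bloc-content-specification-content-response">94.21 m²</div>
  </li>
  <li class="feature-bloc-content-specification-content">
    <div class="feature-bloc-content-specification-content-name">Number of bedrooms</div>
    <div class="feature-bloc-content-specification-content-response">2</div>
  </li>
  <li class="feature-bloc-content-specification-content">
    <div class="feature-bloc-content-specification-content-name">Orphan label</div>
  </li>
</ul>
"""

def parse_features(html):
    """Return a {feature name: value} dict, skipping incomplete feature blocks."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for block in soup.find_all("li", class_=PREFIX):
        name = block.find("div", class_=PREFIX + "-name")
        value = block.find("div", class_=PREFIX + "-response")
        if name is None or value is None:
            continue  # skip this feature only, keep the rest of the listing
        record[name.get_text(strip=True)] = value.get_text(strip=True)
    return record

record = parse_features(SAMPLE_LISTING)
```

Keeping the parsing in its own function also makes it possible to test the logic on saved HTML without sending requests to the website.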
As an example, we display the first 20 rows of the resulting DataFrame.
Balcony | Basement | Bathroom | Closed parking space | Energy class | Laundry | Lieu | Livable surface | Living room | Number of bedrooms | Open kitchen | Pump heating | Renovation year | Sale price | Thermal insulation class | genre | Attic | Bathooms | Fitted kitchen | Garden | Gas heating | Land | Open parking space | Separate kitchen | Terrace | Lift | Pets accepted | Restroom | Shower rooms | Year of construction | Acces for mobility-impared people | Shower room | Availability | Property’s floor | Indoor parking space(s) | Number of rooms | Renovated | Monthly charges | Fireplace | Fuel heating | Parquet | Solar panels | Convertible attic | Electric heating | Wine cellar | Converted attic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4.87 m² | Yes | 1 | 2 | A | Yes | Wiltz | 94.21 m² | Yes | 2 | Yes | Yes | 2018 | 449,435 € | B | Apartment | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | Yes | / | 1 | G | / | Biwer | 120 m² | Yes | 4 | / | / | 2018 | 984,000 € | G | House | Yes | 2 | Yes | Yes | Yes | 3 ares | 2 | Yes | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | Yes | / | 3 | NC | Yes | Bissen | 210 m² | / | 4 | Yes | / | 2018 | 1,050,324 € | NC | House | Yes | 3 | / | / | Yes | 4.76 ares | / | / | 36.62 m² | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | 1 | 1 | B | Yes | Capellen | 204 m² | Yes | 4 | / | / | 2019 | 1,875,000 € | B | Detached | / | / | Yes | Yes | Yes | 5.09 ares | 3 | Yes | Yes | Yes | Yes | 1 | 2 | 2013 | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | 1 | 2 | H | Yes | Dudelange | 150 m² | / | 4 | / | / | 2015 | 1,035,000 € | I | House | Yes | / | / | / | Yes | 3.35 ares | 2 | / | / | / | Yes | / | / | 1954 | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | 1 | / | A | Yes | 07 rue des Romains - Strassen | 118.68 m² | Yes | 3 | / | / | 2018 | 1,691,250 € | A | Apartment | / | / | Yes | Yes | Yes | / | / | Yes | 70 m² | Yes | / | / | / | 2021 | Yes | 1 | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | / | A | Yes | 07 rue des Romains - Strassen | 57.89 m² | Yes | 1 | / | / | 2018 | 879,835 € | A | Apartment | / | / | Yes | / | Yes | / | / | Yes | 44 m² | Yes | / | 1 | / | 2021 | Yes | 1 | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | / | A | Yes | 07 rue des Romains - Strassen | 101.31 m² | Yes | 3 | / | / | / | 1,341,010 € | A | Apartment | / | / | Yes | / | Yes | / | / | Yes | 20.4 m² | Yes | / | 1 | / | 2021 | Yes | 1 | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
Yes | / | / | 1 | B | Yes | Capellen | 203 m² | / | 4 | / | / | 2018 | 1,875,000 € | B | House | / | 3 | / | 350 m² | / | 5.09 ares | 1 | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 1 | NC | / | Luxembourg | 14 m² | / | / | / | / | / | 76,500 € | NC | Indoor | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | To be agreed | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 3 | A | / | Wiltz | 264 m² | Yes | 4 | Yes | Yes | / | 799,900 € | B | House | / | 3 | / | Yes | / | 3.23 ares | / | / | Yes | / | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 3 | A | / | Wiltz | 263.83 m² | Yes | 4 | Yes | Yes | 2018 | 799,900 € | B | House | / | 3 | / | Yes | / | 3.23 ares | / | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | 1 | / | G | / | Ehlerange | 70 m² | Yes | 2 | / | / | 2018 | 490,000 € | F | Apartment | / | / | / | Yes | / | / | / | Yes | / | / | / | / | / | 1967 | / | / | À convenir | 2 | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 3 | A | / | Wiltz | 263.83 m² | Yes | 4 | / | Yes | / | 795,350 € | B | House | / | 3 | / | Yes | / | 3.11 ares | / | / | Yes | / | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
Yes | Yes | 1 | / | E | Yes | Rumelange | 101 m² | Yes | 2 | Yes | / | 2018 | 575,000 € | E | Duplex | / | / | Yes | / | Yes | / | 1 | / | / | Yes | / | 1 | / | 1995 | / | / | To be agreed | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 3 | A | / | Wiltz | 263 m² | Yes | 4 | / | Yes | 2015 | 782,900 € | B | House | / | 3 | / | Yes | / | 2.74 ares | / | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 3 | A | / | Wiltz | 263 m² | Yes | 4 | Yes | Yes | 2018 | 766,900 € | B | House | / | 3 | / | Yes | / | 2.56 ares | Yes | / | Yes | / | / | Yes | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
/ | / | / | 1 | NC | / | Luxembourg | / | / | / | / | / | 2018 | 885,000 € | NC | Detached | / | / | / | Yes | / | 3.93 ares | 2 | / | / | / | / | / | / | 1947 | / | / | immédiate | 0 | / | / | / | / | / | / | / | / | / | / | / | / |
Yes | / | / | 1 | NC | / | Bettembourg | 90 m² | Yes | 2 | / | / | 2018 | 1 € | NC | Apartment | / | / | Yes | / | / | / | / | / | / | Yes | / | / | / | Sur plan | Yes | / | To be agreed | 2 | / | / | / | / | / | / | / | / | / | / | / | / |
10 m² | Yes | 1 | 1 | E | Yes | Steinsel | 120 m² | / | 2 | / | / | 2018 | 895,000 € | NC | Duplex | / | / | Yes | Yes | Yes | / | 2 | / | Yes | / | / | Yes | / | / | / | / | To be agreed | / | 1 | 8 | Yes | / | / | / | / | / | / | / | / | / |
The collected data could then be used to carry out statistical analysis on the current real estate market in Luxembourg.
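Before any statistical analysis, the scraped strings would need to be converted to numbers. The following is a minimal sketch of such a cleaning step, assuming that ',' is a thousands separator and '.' the decimal point, as in the table above. The helper name `parse_number` is hypothetical.

```python
import re

def parse_number(text):
    """Extract the first number from strings such as '449,435 €' or '94.21 m²'.

    Assumes ',' separates thousands and '.' marks decimals.
    Returns None for missing values ('/') or text without digits.
    """
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Remove thousands separators before converting
    return float(match.group().replace(",", ""))

price = parse_number("449,435 €")
surface = parse_number("94.21 m²")
missing = parse_number("/")
```

With pandas, the same function could be applied to a whole column, for example `Houses['Sale price'].map(parse_number)`.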