
Security Check and Information Protection?

Did you know that many companies and websites currently employ specialized strategies to protect their data from exploitation?

For example, one classified-ads company hides its users’ phone numbers on its site unless a visitor takes a manual action to reveal them. Even so, the system we use in clients.bio extracted all of that information easily.

This is possible because we designed a tool that captures every interaction and request within any web page. That tool is the foundation of clients.bio and focuses on collecting information such as phone numbers from both the pages and their requests.
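To give a rough idea of the "collecting phone numbers from pages and requests" step, here is a minimal sketch that scans captured text (page HTML or a response body) for phone-like strings. The regular expression and the `extract_phones` helper are our own assumptions for illustration, not the code behind clients.bio. Real phone formats vary by country, and a loose pattern like this will also match other long digit runs, such as IDs, so the results need filtering.

```python
import re

# Hypothetical pattern: an optional "+", then a digit, then at least 7 more
# digits, spaces, or hyphens, ending on a digit. Tune it for real formats.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def extract_phones(text: str) -> list[str]:
    """Return unique phone-like strings found in page HTML or a response body."""
    found: list[str] = []
    for match in PHONE_RE.findall(text):
        digits = re.sub(r"[\s-]", "", match)  # normalise: drop spaces and hyphens
        if digits not in found:               # keep first-seen order, no duplicates
            found.append(digits)
    return found


if __name__ == "__main__":
    # An HTML fragment and a JSON body, as they might be captured from a page
    page = '<span class="phone">+966 55 123 4567</span>'
    api = '{"id": 7, "contact": "0551234567"}'
    print(extract_phones(page + api))
```

Running the same function over every captured request body, not just the rendered HTML, is what lets numbers that only arrive through a separate request be picked up as well.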

We have also published other examples as open source on GitHub, such as “Souq Scrape,” a tool that gathers products from sites like Souq or Amazon. Tools like these are only the starting point for data mining and information extraction across different datasets.

For instance, here is a straightforward script that fetches every product from a Souq listing:
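import csv
import json
import time

import requests
from bs4 import BeautifulSoup

# Set the URL you want to web-scrape from; the page number is appended to it
url = 'https://saudi.souq.com/sa-ar/apple/new/a-c/s/?section=2&page='
csv_columns = ['name', 'price', 'img']
products = []

for page in range(1, 1001):  # assumes page numbering starts at 1
    print('---', page, '---')
    print(url + str(page))
    r = requests.get(url + str(page), timeout=30)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Each product card is a <div> carrying all four of these classes
    items = soup.select('div.column.column-block.block-grid-large.single-item')
    if not items:
        break  # an empty page means there are no more products
    for pt in items:
        # Assumed selectors (not in the original script): adjust to the real markup
        name = pt.find(class_='itemTitle')
        price = pt.find(class_='itemPrice')
        img = pt.find('img')
        products.append({
            'name': name.get_text(strip=True) if name else '',
            'price': price.get_text(strip=True) if price else '',
            'img': img.get('src', '') if img else '',
        })
    time.sleep(1)  # pause between requests to avoid hammering the server

# Write all products once, as a valid JSON array (keeping Arabic text readable)
with open('SouqDataapple.json', 'w', encoding='utf8') as file:
    json.dump(products, file, ensure_ascii=False)

# Write the CSV with a single header row
with open('SouqDataapple.csv', 'w', encoding='utf8', newline='') as filecsv:
    writer = csv.DictWriter(filecsv, fieldnames=csv_columns)
    writer.writeheader()
    writer.writerows(products)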

The resulting JSON looks like this:

[
{"name": "سماعات ابل اير بودز اللاسلكية، ابيض - MMEF2", "price": "665.00", "img": "https://cf4.s3.souqcdn.com/item/2016/10/06/11/64/54/16/item_M_11645416_16747749.jpg"},
{"name": "ابل ايفون 6 مع فيس تايم - 32 جيجا، الجيل الرابع ال تي اي، رمادي ", "price": "1,160.34", "img": "https://cf4.s3.souqcdn.com/item/2017/03/06/22/15/33/89/item_M_22153389_29502098.jpg"},
{"name": "ابل ايفون X مع فايس تايم - 64 جيجا, الجيل الرابع ال تي اي, رمادي ", "price": "3,199.00", "img": "https://cf5.s3.souqcdn.com/item/2018/01/30/24/05/14/26/item_M_24051426_102956405.jpg"},
{"name": "ابل ايفون 8 مع فايس تايم - 64 جيجا, الجيل الرابع ال تي اي, ذهبي ", "price": "2,224.99", "img": "https://cf3.s3.souqcdn.com/item/2017/09/12/24/05/14/31/item_M_24051431_35103527.jpg"},
{"name": "ابل ايفون 8 Plus مع فايس تايم - 64 جيجا, الجيل الرابع ال تي اي, ذهبي ", "price": "2,548.99", "img": "https://cf5.s3.souqcdn.com/item/2017/09/12/24/05/14/47/item_M_24051447_35103542.jpg"},
{"name": "ابل ايفون 6 بدون فيس تايم- 32 جيجا، الجيل الرابع ال تي اي، ذهبي ", "price": "1,148.00", "img": "https://cf3.s3.souqcdn.com/item/2017/03/06/22/15/34/81/item_M_22153481_29502385.jpg"}
]
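The prices come back as text with thousands separators (for example "1,160.34"), so they need converting before any analysis, such as sorting products by price. Here is a minimal sketch, assuming the records have the shape shown above and that prices use "," for thousands and "." for decimals; the `parse_price` helper and the sample records (with placeholder names) are our own illustration.

```python
import json


def parse_price(text: str) -> float:
    """Turn a scraped price such as "1,160.34" into a number by dropping the commas."""
    return float(text.replace(",", "").strip())


# Hypothetical sample records in the same shape as the output above
# (product names replaced with short placeholders).
sample = """[
  {"name": "item 1", "price": "665.00", "img": ""},
  {"name": "item 3", "price": "3,199.00", "img": ""},
  {"name": "item 2", "price": "1,160.34", "img": ""}
]"""

products = json.loads(sample)
# Sort the products from cheapest to most expensive
products.sort(key=lambda p: parse_price(p["price"]))
for p in products:
    print(p["name"], parse_price(p["price"]))
```

With real data you would call `json.load` on `SouqDataapple.json` instead of parsing the inline sample.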

Why should you care about any of this?

There are two reasons:

  1. Gathering key information from other websites can help you develop your tech product and even learn from their practices.
  2. You can safeguard your own information and data with the solutions we offer for websites and APIs. We can work with your existing code and build on it.
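On the protection side, one common building block against bulk extraction is limiting how many requests each client can make in a given time. The sketch below is a generic token-bucket rate limiter, not a description of our own solutions; the `allow_request` hook, the client IDs, and the capacity and rate values are hypothetical.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity              # start full
        self.last: Optional[float] = None   # time of the previous check

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last is not None:
            # Refill in proportion to the elapsed time, never above capacity
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def allow_request(client_id: str, now: Optional[float] = None) -> bool:
    """Hypothetical hook: call before serving a page or API response to `client_id`."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity=5, rate=1.0)
    return buckets[client_id].allow(now)
```

A scraper that requests pages faster than one per second quickly runs out of tokens and gets refused, while a normal visitor browsing a few pages is never affected. In practice the client ID might be an IP address or an API key, and refused requests would typically receive an HTTP 429 response.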

Interested in similar results? Book your first consultation for free now! We’re here to help your business succeed.


Shawerr

© 2024 Shawerr for Technical Consultations. All rights reserved.