Some websites block access to prevent web scraping, which is easy to detect if your Python script sends many requests in a short period of time. To avoid getting banned, you can add random delays between requests. For this you can use Python’s sleep() function, which suspends (waits) execution of […]
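A minimal sketch of the random-delay idea, using only the standard library. The helper names `random_delay` and `process_with_delays`, and the default bounds of 1 to 3 seconds, are assumptions made for illustration and are not taken from the original post.

```python
import random
import time


def random_delay(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random duration within the given bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def process_with_delays(items, handler, min_seconds=1.0, max_seconds=3.0):
    """Call handler(item) for each item, pausing randomly between calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, only between consecutive ones.
            random_delay(min_seconds, max_seconds)
        results.append(handler(item))
    return results
```

In a scraper, `handler` would be whatever function sends a single request, so each request is separated from the previous one by an unpredictable pause.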
Python: Download File from URL & Save
Python can be used to download text or binary data from a URL by reading the response of urllib.request.urlopen. The downloaded data can be stored in a variable and/or saved to a local drive as a file. Below you will find examples of Python code snippets for downloading the […]
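A short sketch of both approaches with `urllib.request.urlopen`. The function names `download_to_variable` and `download_to_file` and the 30-second timeout are assumptions for illustration; the original post's own snippets may differ.

```python
import shutil
import urllib.request


def download_to_variable(url, timeout=30):
    """Read the whole response body into memory and return it as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_to_file(url, path, timeout=30):
    """Stream the response body into a local file and return its path."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(path, "wb") as out_file:
            # Copy in chunks so large files are not held in memory at once.
            shutil.copyfileobj(response, out_file)
    return path
```

For text content, the returned bytes can be decoded afterwards, for example with `.decode("utf-8")` when the server is known to send UTF-8.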
Python Requests ‘User-Agent’ – Web Scraping
A ‘User-Agent’ HTTP request header is a string that a web browser sends to a web server along with each request to identify itself. The ‘User-Agent’ string contains information about which browser is being used, which version, and on which operating system. Some websites block access from non-web browser ‘User-Agents’ to prevent web scraping, […]
Python: Module Path – List Modules & Get Locations
Let’s say you have a Python module installed on a computer, so you can “import” it, and you want to find the path to this module to check its source files. In this note I show how to list all the locally installed Python modules and how to find the paths to these […]
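One possible way to do both tasks with the standard library is sketched below. The helper names `module_path` and `list_importable_modules` are hypothetical, and the original post may use a different technique (for example `help("modules")` or a module's `__file__` attribute).

```python
import importlib.util
import pkgutil


def module_path(name):
    """Return the location a top-level module would be loaded from, or None.

    For built-in or frozen modules the value is a marker such as "built-in"
    rather than a file path, and it can be None for namespace packages.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def list_importable_modules():
    """Return the sorted names of top-level modules found on sys.path."""
    return sorted(module.name for module in pkgutil.iter_modules())
```

Calling `module_path("json")` returns the path of the package's `__init__.py` on a typical CPython installation, which is a convenient starting point for browsing the source files.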
Indexed by Google: Pages Checker on Python
To check if a webpage has been indexed by Google, you can manually search for site:<webpage> in Google Search, for example: site:https://www.shellhacks.com/indexed-by-google-pages-checker-on-python. If the page is indexed by Google, you will see the URL in the search results; otherwise you will see a message like the following: Your search – site:https://www.shellhacks.com/indexed-by-google-pages-checker-on-python – […]
PIP: Install From Private PyPi Repository
By default, pip installs packages from the public PyPI repository, but it can also be configured to install them from private repositories, such as Nexus or Artifactory. In this note I will show how to configure pip to install packages from private repositories. I will also show how to define a username and password in pip […]
Pip: Install Requirements – Exclude Specific Packages
The requirements.txt file that contains the Python application dependencies is usually generated by developers using the pip freeze > requirements.txt command. If you install dependencies with the pip install -r requirements.txt command as part of an automated pipeline or a Dockerfile and you have an issue with some of the dependencies, you may wonder […]
Pip: Show Python Package Dependencies
The dependencies of installed Python packages can be listed using the built-in pip show command. Alternatively, the dependencies can be shown as a tree structure using the pipdeptree command. In this note I will show several examples of how to list the dependencies of installed Python packages.
Pip Install – SSL Error: Certificate_Verify_Failed
If you are trying to install a Python package using the pip install command and pip fails to verify the SSL certificate, you may receive an error like the following: Could not fetch URL https://pypi.org/…/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host=’pypi.org’, port=443): Max retries exceeded with url: /…/ (Caused by SSLError(SSLCertVerificationError(1, ‘[ SSL: […]
Pip – Install Specific Version of a Package
By default, the pip install command installs the latest version of a package. However, it is often necessary to install an older version of a package to match some specific requirements. In this post I show how to install a specific version of a package using the pip install command.