Sign Up

Have an account? Sign In Now

Sign In

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask question.

Forgot Password?

Need An Account, Sign Up Here

You must login to add post.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Passionable Logo Passionable Logo
Sign InSign Up

Passionable

Passionable Navigation

  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • New Questions
  • Trending Questions
  • Must read Questions
  • Hot Questions
Home/ Questions/Q 6982
In Process
Alek Richter
  • 0
Alek RichterEnlightened
Asked: December 23, 20212021-12-23T08:15:49+00:00 2021-12-23T08:15:49+00:00

Problem HTTP error 403 in Python 3 Web Scraping

  • 0

I was trying to scrape a website for practice, but I kept on getting the HTTP Error 403 (does it think I’m a bot)?

Here is my code:

#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re

webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1').read
findrows = re.compile('<tr class="- banding(?:On|Off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')

row_array = re.findall(findrows, webpage)
links = re.finall(findlink, webpate)

print(len(row_array))

iterator = []

The error I get is:

 File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 479, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 517, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
  • 1 1 Answer
  • 2 Views
  • 0 Followers
  • 0
    • Report
  • Share
    Share
    • Share on Facebook
    • Share on Twitter
    • Share on LinkedIn
    • Share on WhatsApp

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Alek Richter Enlightened
    2021-12-23T08:16:25+00:00Added an answer on December 23, 2021 at 8:16 am

    This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it’s easily detected). Try setting a known browser user agent with:

    from urllib.request import Request, urlopen
    
    req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    

    This works for me.

    By the way, in your code you are missing the () after .read in the urlopen line, but I think that it’s a typo.

    TIP: since this is exercise, choose a different, non restrictive site. Maybe they are blocking urllib for some reason…

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report
Leave an answer

Leave an answer
Cancel reply

Browse

Sidebar

Ask A Question

Stats

  • Questions 4k
  • Answers 4k
  • Best Answers 0
  • Users 3
  • Popular
  • Answers
  • Alek Richter

    How do I add sockets to an item?

    • 2 Answers
  • Alek Richter

    Dialog throwing "Unable to add window — token null is ...

    • 2 Answers
  • Alek Richter

    Is the Glamour College Bard's Mantle of Inspiration Overpowered?

    • 2 Answers
  • Alek Richter
    Alek Richter added an answer Pandas DataFrame columns are Pandas Series when you pull them… January 13, 2022 at 2:21 pm
  • Alek Richter
    Alek Richter added an answer The handshake failure could have occurred due to various reasons:… January 13, 2022 at 2:19 pm
  • Alek Richter
    Alek Richter added an answer Mac OS X doesn't have apt-get. There is a package… January 13, 2022 at 2:18 pm

Top Members

Alek Richter

Alek Richter

  • 4k Questions
  • 1k Points
Enlightened
coinbaseprosupportnumber[Xsdwerfgtyujhnbgfderewqasdxz]

coinbaseprosupportnumber[Xsdwerfgtyujhnbgfderewqasdxz]

  • 2 Questions
  • 2 Points

Trending Tags

questin question

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • New Questions
  • Trending Questions
  • Must read Questions
  • Hot Questions

© 2021 Passionable. All Rights Reserved

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.