Category: Web scraping

Ethics of Web Scraping - Detailed Review

10 mins read Created Date: August 15, 2022   Updated Date: February 22, 2023
This article offers tips on how to use web scraping legally and ethically, and reviews the current legal situation by country along with a short summary of major web scraping cases.

If you’re a web scraper, you’ve probably already learned how web scraping can benefit you. If your website has been the target of a web scraping project, you’ve probably also been angry at the bots that consume your bandwidth and take information that helps others thrive. Because web scraping often seems to benefit only the scraper, it raises many questions. Is web scraping legal? Can we scrape for specific purposes? Is web scraping ethical even when it is legal? Does web scraping damage your business’s reputation?

In this article, we offer tips on using web scraping legally and ethically, and we review the current legal situation by country with a short summary of web scraping cases. Keep in mind that everything in this article is purely for informational purposes and should not be considered legal advice.

At Scrape.do, we are aware of the legal consequences of running an illegitimate web scraping project, and we want to help you run a legitimate one. We will also help you run a project that matches your requirements and your own ethical rules. Thanks to our Rotating IP system, you will also be able to scrape the websites you want. Contact us to get the best service at affordable prices!

History of Major Web Scraping Lawsuits

While scraping a website is generally legal on the surface, it is not something companies welcome. If a website can show that scraping by a bot has damaged its infrastructure or operations, a court may find the scraping unlawful. In this section, we have collected several well-known cases, most of which were decided in favor of the scraped websites. Before examining these cases, businesses should also know that, without an overarching law, they cannot count on getting the same results as in the cases described below.

  • eBay and Bidder’s Edge Litigation: One of the earliest publicly reported web scraping lawsuits, this case was brought by eBay against Bidder’s Edge, an online price comparison site for consumers. In 2000, the court ordered Bidder’s Edge to stop scraping eBay content. The main reason eBay won was that it showed that Bidder’s Edge’s bots placed a load on its systems and that similar sites could cause further damage to eBay’s systems.
  • Facebook and Power Ventures Litigation: Facebook sued Power Ventures at the end of 2008, mainly because Power Ventures scraped content uploaded by Facebook users. In this case, web scraping was also evaluated in terms of intellectual property, which makes it an important example in this field. Facebook won the case, and Power Ventures was ordered to pay damages.
  • LinkedIn and hiQ Labs Litigation: The most recent major web scraping dispute was between LinkedIn and hiQ Labs, a data analytics company that scraped public LinkedIn profiles to analyze professional skills. After LinkedIn sent hiQ Labs a cease-and-desist letter in 2017, hiQ Labs sued, and the district court issued a preliminary injunction in hiQ Labs’ favor, which the Ninth Circuit upheld in 2019. LinkedIn petitioned the Supreme Court, which in June 2021 sent the case back to the Ninth Circuit for reconsideration; the Ninth Circuit again sided with hiQ Labs in 2022, and the parties settled later that year. Because this is the newest major web scraping case, it could be quite influential for the future of web scraping in the US.

What are the Latest Regulations for Web Scraping by Country?

While there is still great uncertainty about whether web scraping is legal, this section goes over how the United States, the European Union, the United Kingdom, and China regulate it.

  • United States: There is no general law against web scraping in the United States, as long as the scraped data is publicly available and the scraping activity does not harm the website it is taken from. However, a specific law, the BOTS Act, has been in place since 2016. It targets people who use bots to buy large numbers of tickets at once, with the aim of thwarting black market sales.
  • European Union and the United Kingdom: The European Union aimed to bring all member states together under the Digital Single Market, where they would be subject to the same regulation, and in 2019 passed the Directive on Copyright in the Digital Single Market. Articles 3 and 4 of this directive allow lawfully accessible content to be copied for text and data mining under certain conditions, so web scraping is approached mainly from an intellectual property perspective. In addition, a web scraping project that aims to collect personal data can be illegal under the GDPR. Apart from the rules on personal data, the overall approach is similar to that of the USA.
  • China: According to English-language sources, there is no direct regulation preventing web scraping in China either. As in other countries, web scraping in China is mainly used for business purposes. Scraping and using personal data is illegal in China, as it is in European Union countries.

You Shouldn’t Disrupt the Web and Perform Denial of Service (DoS) Attacks

One of the first and most important things to consider when using a web scraper is that it usually has to query a website repeatedly and may access a large number of pages. For each page, you send a request to the server hosting the website, the server processes the request, and it sends a response back to the computer running your code. While the server is handling one of your requests, it cannot use those resources to respond to someone else trying to access the same site, so each request consumes a little of the server’s capacity.

If you send too many such requests in a very short time, you can prevent other, regular users from accessing the website while your requests are being processed. You may even cause the web server to run out of resources and crash. Attackers sometimes do exactly this on purpose, which is known as a Denial of Service (DoS) attack.

Because DoS attacks are extremely common on the internet, modern web servers include measures to prevent this kind of abuse of their resources. They are wary of large volumes of requests coming from a single computer or IP address, and their first line of defense usually consists of rejecting further requests from that IP address.

Even if you use your web scraper for legitimate purposes and have no intention of crashing a website, it can still behave in a similar way. If you don’t scrape carefully and with respect for the website, you could get your computer and IP address banned from accessing it.
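
One simple way to avoid sending a burst of requests is to enforce a minimum pause between them. The Python sketch below is only an illustration under our own assumptions: the class name Throttle and the two-second interval are hypothetical, and a suitable interval depends on the website you are scraping.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (illustrative helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a scraper, you would create one instance, for example Throttle(min_interval=2.0), and call its wait() method right before every request, so that no two requests are sent less than two seconds apart.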


You Must Respect the Intellectual Property of Others

As mentioned earlier, in some cases web scraping can be illegal, and the penalties are not to be taken lightly. Keep in mind, however, that the situation may differ from country to country. If the terms and conditions of the website you want to scrape prohibit downloading and copying its content, and you scrape it anyway, you could be in serious trouble. In practice, as explained in the previous section, web scraping is generally tolerated if you do not disturb the regular use of the website and take reasonable care. However, if you scrape and reuse copyrighted data from a website without permission from the copyright holder, you may be in breach of copyright law.

From a different perspective, web scraping is not much different from using a web browser to visit a website. In both cases you use computer software, a scraper rather than a browser, to access publicly available data on the Internet. Even so, people who want to do web scraping need to be more careful and aware of the risk, as many laws treat automated web scraping very differently from ordinary browsing.

In general, if the data you want to use is public, that is, if it is not behind a password-protected authentication system, you may be able to scrape it as long as you do not disrupt the standard use of the website. What could potentially get you into trouble is sharing the scraped data further. If you download content from one website in its entirety and post it on another website, you are infringing copyright unless this is expressly permitted.

If you want to run a legal and ethical web scraping project, you can ask the site owner to share the data, avoid downloading copies of documents that are not openly public, check your local laws, refrain from illegally sharing downloaded content, share data only if it is public, and stay away from DoS attacks on websites. Following the rules listed below will help you keep your web scraping project legal and ethical.

  • You can ask the website owner to share the data: If the data you need comes from a website owned by a particular organization, try contacting that organization and asking directly whether they can provide what you are looking for. If you’re lucky, they already hold the primary data behind their website in a structured form and will share it with you, saving you a lot of trouble.
  • You can avoid downloading copies of documents that are not openly publicly available: An academic journal publisher, for example, has very strict rules about what you can and cannot do with its database. Downloading PDFs of articles in bulk is usually forbidden and can easily get you, and perhaps your friendly university librarian, in trouble. If you need a local copy of some documents for your project, you should negotiate with the publisher.
  • You can check your local laws: While some countries allow almost any kind of web scraping and have no laws restricting it, other countries have laws that protect people’s personal information, such as email addresses and phone numbers. In those countries, collecting this information can be illegal even when it comes from public websites.
  • You can stay away from illegally sharing your downloaded content: Even when information is protected by copyright, scraping it for personal purposes may be covered by fair use. However, sharing this information with other people or institutions after obtaining it can be illegal and carry serious sanctions.
  • If the data you have obtained is public, you can share it with others: If the data you have scraped is clearly public and there are no legal barriers to sharing it, you can share it so that other people can use it. If you want, you can also share the web scraper you wrote so that other people can benefit from it.
  • You can avoid DoS attacks on websites: Not all websites and their servers are designed to withstand thousands of requests per second, so you should not send too many requests to a website in a very short time. If you’ve written a recursive scraper, one that follows the links it finds, test it on an extremely small dataset first to make sure it does what it’s supposed to do. You should also configure a delay between requests.
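
As an illustration of the last rule, the Python sketch below follows same-site links but stops after a small number of pages and pauses between requests. It is a minimal sketch, not a definitive implementation: the names LinkParser and crawl, the page cap of 5, and the one-second delay are our own assumptions, and the fetch argument stands for whatever function you use to download a page as text. It uses a queue rather than actual recursion, which makes the page limit easier to enforce.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=5, delay=1.0, sleep=time.sleep):
    """Follow same-site links from start_url, visiting at most max_pages pages.

    fetch is any callable that takes a URL and returns the page's HTML as a string.
    """
    host = urlparse(start_url).netloc
    queue = [start_url]
    seen = set()
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        if pages:
            sleep(delay)  # pause before every request after the first one
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    return pages
```

Starting with a low max_pages value lets you check on a handful of pages that the scraper collects what you expect before you raise the limit, and the delay keeps the load on the server small.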

Scrape.do is an affordable web scraper service provider that builds the best custom scraper applications for any use case. With our innovative technology, we can scrape the data for your website, or even an entire network of websites. We have over 1000 satisfied customers and use cases from companies like Amazon, Facebook, eBay, etc. You can be assured that our service is totally ethical and legal. Contact us now!

See about web scraping protection and the best solutions!


Author: Alexander James

Hey, there! As a true data-geek working in the software department, getting real-time data for the companies I work for is really important to me: it creates valuable insight. Using IP rotation, I help brands and companies regain their competitive power and get super results. I’m here to share my experiences!