"Is scraping legal" is the wrong question—like "is a knife legal". What matters is what you collect, where from, and how. The same data may be fully permissible in one context and risky in another. Below are four dimensions that need to be separated.
1. Type of data: is it personal data?
If you collect personal data, including publicly visible data such as names, email addresses, or profiles, the GDPR (RODO) applies: you need a legal basis, a defined purpose, data minimization, and you must meet the information obligation. Purely technical or product data is simpler in this respect. This is the first and most important distinction.
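The minimization point can be sketched in code. The snippet below is only an illustration, assuming a hypothetical property-listing record. The field names and the `ALLOWED_FIELDS` allowlist are invented for the example, and it covers minimization only, not legal basis or the information obligation.

```python
# Hypothetical allowlist: the only fields this (assumed) collection purpose needs.
ALLOWED_FIELDS = {"listing_id", "price", "area_m2", "city"}


def minimize(record: dict) -> dict:
    """Keep only allowlisted fields; everything else is dropped before storage."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


# Hypothetical raw record as it might come out of a parser.
raw = {
    "listing_id": "A-1001",
    "price": 650000,
    "area_m2": 54.5,
    "city": "Gdansk",
    "seller_name": "Jan Kowalski",      # personal data, not needed for the purpose
    "seller_email": "jan@example.com",  # personal data, not needed for the purpose
}

clean = minimize(raw)
```

An allowlist is used rather than a blocklist so that any new field appearing in the source is dropped by default until someone decides it is needed.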
2. Source: terms of service and database rights
A service's terms of service (ToS) may prohibit automated collection. Violating them is a contractual issue, sometimes with real consequences. Separately, sui generis database rights may apply: extracting a substantial part of a protected database, one in which its maker made a substantial investment, may constitute infringement even when the data is not personal.
3. Method: don’t disrupt the service
Technique also creates legal risk. Aggressive querying that overloads the server may be treated as disrupting the operation of the system. Best practices: respect robots.txt, limit your request rate, identify yourself in request headers, and collect only what you actually need.
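These practices can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration under stated assumptions, not a complete crawler: the `PoliteFetchPlanner` class, the `USER_AGENT` string, and the 2-second default delay are all hypothetical, and the robots.txt content is assumed to have been downloaded already.

```python
import time
import urllib.robotparser

# Hypothetical identity; a real one should point to a working contact address.
USER_AGENT = "example-collector/1.0 (+https://example.com/contact)"


def build_robots_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetchPlanner:
    """Decides whether a URL may be fetched and how long to wait first."""

    def __init__(self, parser, user_agent=USER_AGENT, default_delay=2.0,
                 clock=time.monotonic):
        self.parser = parser
        self.user_agent = user_agent
        # Prefer the site's own Crawl-delay when it declares one.
        declared = parser.crawl_delay(user_agent)
        self.delay = float(declared) if declared is not None else default_delay
        self.clock = clock
        self.last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def seconds_to_wait(self) -> float:
        """Remaining pause before the next request may be sent."""
        if self.last_request is None:
            return 0.0
        elapsed = self.clock() - self.last_request
        return max(0.0, self.delay - elapsed)

    def mark_request(self) -> None:
        """Record that a request was just sent."""
        self.last_request = self.clock()

    def headers(self) -> dict:
        """Headers that identify the collector to the site operator."""
        return {"User-Agent": self.user_agent}
```

In use, a collector would skip any URL for which `allowed()` returns `False`, sleep for `seconds_to_wait()`, send the request with `headers()`, and then call `mark_request()`. Following robots.txt is a courtesy signal to the operator and does not by itself settle the ToS or database-rights questions described above.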
4. Prefer official channels
Before resorting to scraping, check whether an API or open data source exists. Many institutions publish data officially (e.g., public registries, open data portals), and such a source is more stable both legally and technically. We apply this principle in PropTech, where we use dane.gov.pl instead of working around third-party services.
Collecting and structuring data within these boundaries is part of our data services—we design collection to be legally compliant from the start, not after the fact.
FAQ
Is scraping public data legal?
Publicly available data can generally be collected, but "public" doesn't mean "unrestricted". Personal data falls under the GDPR, databases may be protected, and a service's ToS may prohibit automation. Legality depends on context.
Can I collect emails and contact details from websites?
These are personal data, so the GDPR applies: you need a legal basis, a defined purpose, and you must meet the information obligation. The fact that an address is publicly visible doesn't automatically grant the right to collect and use it.
How do I collect data safely?
Start with official APIs and open data, respect robots.txt and rate limits, avoid collecting personal data without a legal basis, and don't overload services. Build compliance into the collection process from the start rather than adding it after a breach.