Compliance detection via web scraping

Competition and Markets Authority (CMA), UK – web scraping for detecting infringements and compliance

The UK CMA has built a data analytics platform, hosted on Amazon Web Services (AWS), which uses an implementation of JupyterHub to sort and analyse large amounts of data relating to both competition and consumer protection issues.

The CMA has used web scraping[2] to monitor and detect a range of consumer law infringements on websites. Machine learning, with human oversight and checks, is employed to analyse the data and assess where there is a problem. The DaTA unit (see Annex 2) has analysed data collected via web scraping to look for patterns in online reviews that suggest fake or misleading practices. The unit also built its own tool to examine price data and detect where there is a suspicion that retailers and manufacturers are keeping prices at a fixed level.
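The kind of price screen mentioned above can be illustrated with a minimal sketch. The report does not describe how the CMA's tool works, so everything here is an assumption for illustration only: the function name `flag_fixed_prices`, the thresholds, the sample product and retailer names, and the simple rule itself (flag a product when several retailers show exactly the same price across every scraped observation). A flag of this kind would only indicate where a human reviewer might look more closely.

```python
from collections import defaultdict


def flag_fixed_prices(observations, min_retailers=3, min_observations=6):
    """Return products whose scraped price never varies across retailers.

    `observations` is an iterable of (product, retailer, price_in_pence)
    tuples. The rule and the thresholds are hypothetical, not the CMA's.
    """
    prices = defaultdict(list)
    retailers = defaultdict(set)
    for product, retailer, price in observations:
        prices[product].append(price)
        retailers[product].add(retailer)

    flagged = []
    for product, values in prices.items():
        enough_retailers = len(retailers[product]) >= min_retailers
        enough_data = len(values) >= min_observations
        no_variation = min(values) == max(values)
        if enough_retailers and enough_data and no_variation:
            flagged.append(product)
    return sorted(flagged)


# Hypothetical scraped data: prices in pence, two scrapes per retailer.
SAMPLE = [
    ("kettle-x", "shop-a", 4999), ("kettle-x", "shop-a", 4999),
    ("kettle-x", "shop-b", 4999), ("kettle-x", "shop-b", 4999),
    ("kettle-x", "shop-c", 4999), ("kettle-x", "shop-c", 4999),
    ("toaster-y", "shop-a", 2999), ("toaster-y", "shop-a", 2799),
    ("toaster-y", "shop-b", 3099), ("toaster-y", "shop-b", 2999),
    ("toaster-y", "shop-c", 2899), ("toaster-y", "shop-c", 2999),
]

print(flag_fixed_prices(SAMPLE))
```

Identical prices can have innocent explanations, so a screen like this is best read as a prompt for further analysis rather than as evidence of an infringement.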

Web scraping has also been used to check that companies are adhering to remedies or guidance stipulated by the CMA as a result of market investigations. For example, following an investigation into payday lending, lenders were required to put a link to a price comparison website on their sites. The CMA created code to scrape lenders’ websites to check that they were compliant. In a similar vein, the check on whether commercial relationships in social media endorsements were adequately disclosed used scraping to automate the assessment of compliance with guidance.
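A compliance check of the payday lending kind could be sketched as follows. This is not the CMA's code, which the report does not reproduce. It assumes the page HTML has already been downloaded, and the names `LinkCollector` and `links_to_comparison_site`, the sample pages, and the domain `compare-loans.example` are all hypothetical. The sketch uses only the Python standard library to collect every link on a page and test whether any of them points at an approved price comparison domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_comparison_site(html, allowed_domains):
    """Return True if any link points at an allowed domain or a subdomain of it."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        # Relative links such as "/apply" have no hostname and are skipped.
        host = (urlparse(href).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            return True
    return False


# Hypothetical page snapshots and comparison-site domain.
ALLOWED = {"compare-loans.example"}
COMPLIANT_PAGE = (
    '<html><body><a href="/apply">Apply now</a>'
    '<a href="https://www.compare-loans.example/short-term">Compare loans</a>'
    "</body></html>"
)
NON_COMPLIANT_PAGE = '<html><body><a href="/apply">Apply now</a></body></html>'

print(links_to_comparison_site(COMPLIANT_PAGE, ALLOWED))
print(links_to_comparison_site(NON_COMPLIANT_PAGE, ALLOWED))
```

Matching on the hostname rather than on the raw link text avoids false positives from URLs that merely mention the comparison site in a path or query string. A real check would also need to consider whether the link is visible and prominent, which this sketch does not attempt.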