Big Data and the Law: How to successfully navigate the data minefield

Posted by Simon Stokes on
Simon Stokes, a partner in our commercial team specialising in digital rights law, looks at the big data revolution and why businesses need to focus on their IP and data privacy strategies if they are to take full advantage of it.  In October Simon is taking part in a LexisNexis webinar on Big Data and IP.

What is “Big Data” and why does it matter?

Some people have called personal information the “oil” of the 21st Century.  Just as oil fuelled the twentieth century industrial and transport revolutions, the new oil of information fuels the new economy.  And central to the future growth of the new economy is “Big Data.” 

Big Data is a buzzword and means many things to many people.  One way to think of it is as the accumulation and use of vast and complex information databases across multiple servers, on a scale that can't be handled by a single server or PC.  For example, all of the following involve big data:

  • a major supermarket using its vast customer database to profile and target its customers with special offers
  • social networks data mining user generated content (UGC) (such as photos, messages etc), health/fitness data from wearable devices and user traffic data to target advertising to users
  • internet search engines analysing browsing habits
  • a global bank creating a financial model taking data from stock market indices, GDP/economic data, trading data, consumer data (RPI/inflation etc)
  • RFID data that allows organisations to track, trace and manage their assets, stock and inventory
  • patient data combined into a national database which can be used for research as well as healthcare purposes

Big data is often characterised by the three V’s:

  • Volume – there’s just masses of data.  As of 2009 the global data volume was estimated to be 800 exabytes (that’s 800 billion gigabytes) and of course now the figure will be much greater.  The New York Stock Exchange reportedly captures 1 terabyte (that’s 1000 gigabytes) of trade information each trading session.
  • Variety – data is combined from a variety of sources – weather data, online traffic data, UGC, location/geo data, RFID data, point of sale data and so on
  • Velocity – data is being generated at speed, collected at speed and processed and analysed at speed.  It has been estimated that 90% of data in the world was created in the last two years.

Some people also add a fourth “V” – Veracity – there may be lots of data but how accurate is it, what’s its quality, and can it be trusted when making business decisions?

Big data allows organisations using it to derive information that was not necessarily apparent or intended in the source information.  For example, take the communications data from millions of phone calls: analyse this and it's possible to discover a huge combination of factors relating to the nature of communications, and user relationships and behaviours.

Big data needs large computing resources – thousands of servers for example, and fast connectivity and processing.  But its value is immense: it allows data to be mined for trends and used for predictive analysis, and it allows new products and services to be created, such as behavioural advertising.  Businesses which grasp the big data revolution will be more flexible, responsive and innovative.  Indeed businesses which don’t harness big data will find themselves slipping behind.

Big data poses some big legal challenges.  Address these properly and businesses will find big data becomes an important, if not their most important, asset.  Neglect to address them and businesses will find themselves risking regulatory sanctions (fines, enforcement action), class action lawsuits, competitors stealing a march on them, a reduction in the value of their assets and a fall in their share price.

There are two big legal issues around big data.  First, protecting it – that’s a matter of intellectual property law.  Second, fairly and lawfully collecting, processing and using it – that’s a matter of data protection/data privacy law.

Big Data as Intellectual Property

In English law judges have long held that information as such isn’t “property”.  But that doesn’t mean data can’t be protected as intellectual property (IP).  This can be done in six ways:

  • By keeping the data (and the proprietary algorithms and code used to mine the data) and any database formats confidential
  • Through copyright protecting elements of the data and databases themselves if they satisfy the test for copyright (they need to be original works of authorship -  i.e. they constitute the author’s own intellectual creation - this may be a problem for certain classes of data and databases which are simple in structure)
  • Whether or not copyright protection is available the investment spent in obtaining, verifying and/or presenting the data can be protected by the “sui generis” EU database right under the Database Directive – although the database maker needs to be in the EU/EEA to benefit from this right
  • Through contract – ensuring that you only grant access on contractual terms which protect and license your IP.  A good example of this is the recent European Court of Justice (ECJ) case Ryanair v PR Aviation.  Here data which PR Aviation scraped from Ryanair’s website was held by the Dutch courts not to be protected by copyright or database right, but the ECJ held that use of the data could still be restricted by contract, through Ryanair’s website terms of use.  In the UK at least it is unclear to what extent website terms of use will be considered a contract by the courts, so data owners will be in a much stronger position if clickwrap licence terms are used from the outset.
  • Through patents where you have devised a technical innovation which has a technical effect (for example a means of compressing or storing data which makes a server run more efficiently)
  • Through trade marks and domain names where big data is used as part of a branded service

To benefit from this protection you need to have policies and procedures in place to protect your IP.  This means ensuring you address the data lifecycle by answering three broad questions:

  • How is the data originated and sourced? How is it created and by whom?  Are they your employees or third parties (including contractors or consultants)?  Are you licensing in data – on what terms?  Can you reuse it?  Who owns the products of any reuse?
  • Who is building your database(s)?  What rights do you have? Is any database software used and how is it licensed?  Have you kept records of your investment?
  • How do you make your data available? Do your standard terms of business protect your IP?  Do you have binding licence terms?  If you make your data and databases available do you include copyright and database rights notices?  Remember that users have certain fair dealing rights under copyright law (e.g. in respect of data mining).  Remember too that if you are a trader supplying data to consumers you are going to need to comply with the Consumer Rights Act 2015.

Big Data and Data Privacy – designing privacy into Big Data

The challenges of big data to data privacy

EU (and UK) data privacy laws bite on the use of big data to the extent the data in question is “personal data”.  This means data containing information such that:

  • A living individual (the data subject) is identified in this information (e.g. they are named or otherwise identified (e.g. date of birth, sex, address)), or
  • A living individual (the data subject), while not identified, is described in this information in a way which makes it possible to find out who the data subject is by conducting further research – i.e. they are identifiable.  The test here is whether it is likely that reasonable means for identification will be available and administered by foreseeable users of the information, and this includes third party recipients (Recital 26 of the EU Data Protection Directive).  This identification can be direct or indirect.  An example would be a car number plate or a telephone number – this data by itself doesn’t expressly identify the car owner or phone subscriber but information linking this data to individuals is potentially readily available.


The second limb – that the data subject is identifiable – poses a big issue for big data users.  It is often assumed that if data is “anonymised” then it ceases to be personal data and so data privacy laws can be ignored.  That is of course in one sense correct but it is surprisingly difficult to anonymise personal data as all identifying elements have to be eliminated and no element may be left in the information which by exercising reasonable effort would serve to re-identify the person concerned.  Also the very existence of big data – combining data from databases of a wide variety of information – can make it more likely that taken together an individual can be identified from the data which the big data user (data controller) has.
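
To make the combination risk a little more concrete, the short Python sketch below counts how many records share each combination of indirectly identifying attributes.  It is only an illustration, not a legal test of identifiability: the field names (postcode_area, birth_year, sex), the sample records and the function name unique_combinations are all hypothetical and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical attributes that do not name anyone on their own
# but may single out a person when taken together.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")


def unique_combinations(records, fields=QUASI_IDENTIFIERS):
    """Return the attribute combinations that occur in exactly one record."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return [combo for combo, n in counts.items() if n == 1]


# Made-up sample data with no names or addresses in it.
records = [
    {"postcode_area": "EC1", "birth_year": 1980, "sex": "F", "purchase": "A"},
    {"postcode_area": "EC1", "birth_year": 1980, "sex": "F", "purchase": "B"},
    {"postcode_area": "SW3", "birth_year": 1975, "sex": "M", "purchase": "C"},
]

risky = unique_combinations(records)
print(risky)  # [('SW3', 1975, 'M')]
```

In this made-up sample the third record is the only one with its combination of attributes, so anyone holding another dataset that links those attributes to a name could potentially single that person out, even though the record itself contains no name.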

Anonymisation may also destroy the value of the data, or the data may not be usable at all once anonymised: it may need to be linked back to an individual even though those processing the data don’t need to know the actual identity of the individual.  So whilst some big data may truly be anonymous (e.g. statistical data derived from interrogating the data, such as the % of a population with heart disease), other data may not be and indeed may need to retain identifiers.

Pseudonymisation and privacy by design

One solution here is to create “pseudonymised data”.  Personal data typically contains identifiers such as a name, date of birth, sex and address.  These identifiers can be replaced by a pseudonym, and “pseudonymisation” is achieved by the encryption of these identifiers in personal data.  For example, assuming all other data privacy requirements were satisfied, a hospital might want to provide patient data to researchers.  This could be pseudonymised so that it is impossible or very difficult for the researcher, or anyone else for that matter, to identify the patient from the data.  Pseudonymisation is an example of a privacy enhancing technology and is an important element in implementing privacy by design, which is crucial to making big data work under privacy laws.  Privacy by design means that data protection is built into big data databases and their data from the outset.
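
As a rough illustration of the hospital example, the Python sketch below replaces the direct identifiers in a record with a single pseudonym.  Note the assumptions: it uses a keyed hash (HMAC-SHA256) rather than encryption as such, and the field names, the sample patient record, the key and the function name pseudonymise are all hypothetical.  It is a minimal sketch, not a complete or compliant implementation.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers to remove from each record.
IDENTIFIER_FIELDS = ("name", "date_of_birth", "sex", "address")


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with its identifiers replaced by a pseudonym."""
    # Join the identifier values in a fixed order so the same person
    # always maps to the same pseudonym under the same key.
    joined = "|".join(str(record[field]) for field in IDENTIFIER_FIELDS)
    pseudonym = hmac.new(secret_key, joined.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep only the non-identifying fields, then attach the pseudonym.
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    cleaned["pseudonym"] = pseudonym
    return cleaned


# Made-up patient record; the key would be held by the hospital only.
patient = {
    "name": "Jane Example",
    "date_of_birth": "1980-01-31",
    "sex": "F",
    "address": "1 Sample Street",
    "diagnosis": "asthma",
}
key = b"example-key-held-by-the-hospital"

shared = pseudonymise(patient, key)
print(shared)
```

Because the same key always yields the same pseudonym for the same person, records can still be linked to one another for research, while the researcher, who does not hold the key, sees neither the identifiers nor a way to recompute them.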

Indeed privacy by design is an express requirement of the proposed EU General Data Protection Regulation (Article 23) which is expected to be adopted in the near future.

Steps to be data privacy compliant

Those creating, collecting and using big data need to pay attention to the following questions:

  • Is the data either not personal data at all (e.g. weather data) or is it anonymous?  Do not assume the data is anonymous even if on its face it appears to be – you need to apply the identifiability test under data privacy law. 
  • If the data you have is personal data, are you data privacy compliant?  Was it fairly collected, are you using it for lawful purposes, is it kept secure and up to date, and so on?  If you are supplying third parties with the data or exporting it offshore, are you compliant with the laws surrounding the transfer and export of personal data?  Transparency (making clear to users how their personal data will be used), fair processing and using the data for the purposes for which it was collected ("the purpose limitation principle"), and consent underpin data privacy law.  For example, Google found itself investigated by national data protection authorities over concerns about how it collects and processes personal data, and in January 2015 Google undertook to make improvements to its 2012 privacy policy to meet the concerns of the UK’s Information Commissioner.
  • Are you building privacy by design into your databases and data collection and use strategies?

Answering these deceptively simple questions will often not be easy.  But it will be much easier if the business/organisation concerned has implemented information governance and management policies and procedures and is abreast of current EU developments such as the proposed General Data Protection Regulation.


Big data raises big legal issues.  Businesses which take seriously the protection of their IP and the management of their information, including data privacy compliance, will be ahead of the pack in a world where “Chief Information Architects” and “data scientists” within companies will be expected to connect data analytics (big data) with a company’s operations to create “operational intelligence”, and where some of the world’s largest and/or fastest growing companies have big data at the very core of their business.


About the Author

Leading the firm's technology practice in London, Simon specialises in information technology law, including outsourcing, cloud services, protecting software IP and licensing of market leading data analytics software.

Simon Stokes
020 7814 5482
