Our data blog
Posts about live events information, its quirks and other data related to our work.
- City Mobility - Decisions based on events data
- The ever painful truth about CSV
- The growth of the Edinburgh Festival Fringe
- The impact of events on consumer demand
- The uses of live events data
- The State of Arts criticism across the Edinburgh Festivals – some data
- Starting out, and arriving at an event taxonomy
City Mobility - Decisions based on events data
8 October 2019
A guest posting by Joshua Ryan-Saha, Data-Driven Innovation Lead – Tourism & Festivals, The University of Edinburgh
If you've been in Edinburgh during August, you've witnessed a city transformed. The population swells and the party begins. Taken together, the seven main August festivals represent an event close to the size of the Olympic Games. It is a gargantuan effort by everyone involved with running the city to make it work. The Olympics are every four years. This takes place every year.
However, as most of us who live in Edinburgh know, when dealing with this many people, things just won't work as well as we'd like. Our commutes may be that little bit longer. We might have to stand on the train from Glasgow. We might take a stroll in the bus lane to sneak past the crowds. Edinburgh's festivals are the envy of the world, but for many of Edinburgh's citizens life can be just a little more difficult at that time of year.
What can make it work better?
The Edinburgh Futures Institute, The List, EPCC and Transport for Edinburgh have established a pilot to see if we can bring together event and transport datasets to better understand festival-related traffic and congestion during August. With funding from Transport Scotland, Scottish Enterprise and the University of Edinburgh's Data-Driven Innovation Programme, the pilot will look at how we could design a data 'product' that transport providers, city authorities and festivals could collectively explore to help them make decisions when planning next year's festivals.
- Should they increase the number of bus services, and if so when could they do it without contributing to congestion?
- Can we use events data to anticipate a need for more Just Eat bicycles at Bristo Square?
- Will more trams on a Friday night help Tattoo patrons get back home?
- Should this road be closed, or that one, and when?
Data-driven optimisation is difficult due to the complexity and variety of data around mobility. Trajectories of individuals in motion, a multimodal transportation network, and the various types, sizes and unknown popularity of event venues all contribute to mobility problems. Data doesn't hold all of the answers, but we think that shared data, shared insights and shared decisions across the varied organisations involved with making the festivals work might make some marginal gains.
This is where The List's data set is a vital element. With the UK's most comprehensive set of live events data, including every show at every Edinburgh festival, they know where, and crucially when, people are likely to be coming in and going out of shows. Combined with predictions of audience size based on historical analysis, the data begins to create a picture of demand across the city at any given time in any given location.
This data ingredient can help the city’s transport providers make more intelligent decisions around supply of additional transport, identify potential bottlenecks and also indicate where previous oversupply of capacity has taken place. In short it will help transport companies provide efficiencies across their networks and improve services for the benefit of all locals and tourists. Smart cities can be an elusive goal, but projects of this nature, with data sharing, and accretion of marginal improvements, can contribute to helping Edinburgh along that road.
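As a rough illustration of the "picture of demand" idea, the sketch below sums the expected audience leaving each area in 15-minute windows. The field names (`area`, `start`, `duration_min`, `expected_audience`), the bucket size and the sample figures are all hypothetical; they are not taken from The List's data or from the pilot.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def egress_demand(performances, bucket_minutes=15):
    """Sum the expected audience leaving each area in each time bucket."""
    demand = defaultdict(int)
    for perf in performances:
        end = perf["start"] + timedelta(minutes=perf["duration_min"])
        # Round the end time down to the start of its bucket
        # (assumes bucket_minutes divides 60).
        bucket = end.replace(minute=end.minute - end.minute % bucket_minutes,
                             second=0, microsecond=0)
        demand[(perf["area"], bucket)] += perf["expected_audience"]
    return dict(demand)

# Hypothetical sample data, not real festival figures.
shows = [
    {"area": "Bristo Square", "start": datetime(2019, 8, 9, 20, 0),
     "duration_min": 60, "expected_audience": 300},
    {"area": "Bristo Square", "start": datetime(2019, 8, 9, 20, 10),
     "duration_min": 60, "expected_audience": 150},
    {"area": "Castle Esplanade", "start": datetime(2019, 8, 9, 21, 0),
     "duration_min": 90, "expected_audience": 8000},
]
demand = egress_demand(shows)
```

A transport planner could then compare each bucket against the capacity scheduled for that area and time, which is the kind of shared view the pilot is exploring.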
The ever painful truth about CSV
20 September 2019
In the last 5 years, we have gathered information related to over 7,000,000 performances of around 700,000 live events held at one of over 80,000 venues, so while we claim certain domain experience, we still face challenging problems related to certain types of data.
Over the years we have significantly increased the processing of structured feeds of data. Typically, these come from a range of box office systems, theatre chains, festival organisers and music promoters. As recently as 2010 manual data entry provided 90% of our live events data, but now over 90% of our event records come from structured feeds. Each week, we load files related to thousands of venues, acquiring all the new information and merging the overlapping data coming from more than one source.
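To give a flavour of what merging overlapping data from more than one source can involve, here is a minimal sketch. It assumes each record is a dictionary and that two records describe the same performance when venue, normalised title and start time match; the field names and the "earlier feed wins" priority rule are illustrative assumptions, not a description of our production process.

```python
def merge_feeds(*feeds):
    """Merge records from several feeds; earlier feeds take priority."""
    merged = {}
    for feed in feeds:
        for rec in feed:
            # Hypothetical matching key: venue, normalised title, start time.
            key = (rec["venue_id"], rec["title"].strip().casefold(), rec["start"])
            current = merged.setdefault(key, {})
            for field, value in rec.items():
                # Keep the first non-empty value seen for each field.
                if value not in (None, "") and field not in current:
                    current[field] = value
    return list(merged.values())

# Hypothetical sample records from two sources.
box_office = [{"venue_id": 1, "title": "Hamlet", "start": "2019-08-05T19:30",
               "price_min": 12.0, "description": ""}]
promoter = [{"venue_id": 1, "title": " hamlet ", "start": "2019-08-05T19:30",
             "price_min": 10.0, "description": "A new staging"}]
events = merge_feeds(box_office, promoter)
```

Here the box office feed supplies the title and price, while the promoter feed fills in the description that the box office left empty.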
But nearly 10% of our events, or around 24,000 in 2018, arrive in other ways. Our website provides an online event submission form which we have steadily developed over the years, adding formatting tips, venue look ups and other guidance, and we do end up with data in our structure. Unlike data from feeds, where we can be sure of the authority of the source, this data requires checks before we can pass it for publication and distribution. The online form is a robust way of gathering the most fragmented data.
Within this final 10%, alongside the website submissions, we also work with organisers, such as festival marketing teams, whose data management practice is largely based on comma-separated value (CSV) files. These remain difficult but important listings for us to gather. They represent several thousand events in the course of a year.
To date, we have provided a service for event submitters to send us CSV files above a minimum quantity. We tried to address file format variation and consistency issues with a CSV template which we make available. However, event organisers often have a variety of intended uses for the sheet, or may not see we provide a template until they contact us with a file in their own format.
Arriving at the view that CSV files are not a fruitful solution for the interchange of data is hardly a revelation, and this is not restricted to events data. Over the last decade quite a few companies have attempted to create smart CSV data mapping and management services. A notable example is Freebase Gridworks, which became Google Refine and then OpenRefine. None has succeeded in solving the central problem, which is that CSV files are unstructured: the data can only be mapped once it has been cleaned, and by then the hard work is already done.
In our online form we implement online ‘smarts’ to help input.
In the form we:
- ensure people only enter a single numeric data point under minimum price and provide a free text box for the exceptions
- look up the venue, and ensure a close duplicate is not created
- easily allow for repeating performances without manual entry of each one
- provide an integrated capability to add images
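Three of these checks can be sketched in a few lines. The function names, the 0.85 similarity cutoff and the sample venues below are hypothetical choices for illustration, and the standard-library `difflib` matcher stands in for whatever venue look-up a real form would use.

```python
import difflib
import math
from datetime import date, timedelta

def parse_min_price(text):
    """Accept a single non-negative number; reject ranges and free text."""
    try:
        value = float(text.strip().lstrip("£"))
    except ValueError:
        return None
    return value if math.isfinite(value) and value >= 0 else None

def find_close_venue(name, known_venues, cutoff=0.85):
    """Return an existing venue whose name is a near match, if any."""
    lookup = {v.casefold(): v for v in known_venues}
    hits = difflib.get_close_matches(name.casefold(), list(lookup),
                                     n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

def expand_run(first, last, time, skip_weekdays=()):
    """List every performance in a run, skipping given weekdays (Monday=0)."""
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return [(d, time) for d in days if d.weekday() not in skip_weekdays]

venues = ["Traverse Theatre", "Summerhall"]
run = expand_run(date(2019, 8, 2), date(2019, 8, 5), "19:30", skip_weekdays={0})
```

A value such as "10-12" fails the price check and would be sent to the free text box, while "Traverse Theater" is matched to the existing "Traverse Theatre" record instead of creating a near duplicate.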
We recently analysed the work involved in a sample of the CSV files we have received, considering the work of the provider, our manual data editing and the finalisation of the CSV file load process in each case. CSV files often consume 2-4 hours of effort across the parties, often for just 40 events.
Our conclusion is that there is nothing better than structured data and we want to encourage event submitters to make this their format of choice. We will supply event promoters with their own data, submitted via our form, back to them in a structured format for them to use with other third parties. This will be a no charge additional element of our listings service for event organisers.
We have also concluded that we should restrict our use of CSV to files with more than 60 events, directing submitters with fewer events to the form with the functionality we have described.
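The figures above translate into a simple routing rule and a per-event cost. The sketch below uses only the numbers quoted in this post (2-4 hours for a 40-event file, and a threshold of more than 60 events); the function and constant names are made up for illustration.

```python
CSV_MIN_EVENTS = 60  # CSV is reserved for files with more than 60 events

def submission_route(event_count):
    """Pick the submission channel for a given number of events."""
    return "csv" if event_count > CSV_MIN_EVENTS else "form"

def minutes_per_event(hours, events):
    """Effort per event, in minutes, for a file that took `hours` to process."""
    return hours * 60 / events

low = minutes_per_event(2, 40)   # lower bound for a 40-event file
high = minutes_per_event(4, 40)  # upper bound for a 40-event file
```

On those figures a 40-event CSV file costs between 3 and 6 minutes of combined effort per event, which is the overhead the threshold is meant to avoid for smaller submissions.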
The growth of the Edinburgh Festival Fringe
3 September 2019
Now, Edinburgh in August is far more than the Edinburgh Festival Fringe, as the name makes clear. (Each year some unfortunate PR or BBC Front Row presenter, who should know better, gets a flea in the ear for calling it the Edinburgh Fringe Festival: beware.)
But the Fringe is a big festival in its own right, and to their credit the Fringe Society publish an annual review. The numbers that follow come all but exclusively from them. In advance, it would be sensible to provide the caveat that some definitions may have changed over the years, making individual data points unreliable for comparison, but the trends are clear.
The available data starts and ends in different years so this article uses the following notation (2019:2010) to indicate the latest available year and the base year.
Confirming the growth
- 67% increase in participants (2019:2010)
- 71% increase in arts industry accreditations (2019:2011)
- 65% increase in ticket sales (2019:2010)
- 39% increase in Fringe Society revenue (2018:2012)
- 43% increase in tickets purchased by Edinburgh residents to 856,000 (2019:2018)
Media accreditations have an 8 year average of 1018 (2019:2011) and in 2019 there were 1004. This quite strongly refutes the claim that there are fewer media in Edinburgh, though the changes in title representation, and whether these critics are paid, remain valid questions.
The number of participants per show has been very steady for 9 years (2018:2010), with a noticeable uptick in 2018, so it will be interesting to get confirmation of 2019 data. Shows do frequently split costs between participants, so the steadily increasing expense of bringing a show can be mitigated by increasing your troupe. It is also certainly possible that shows are registering more participants for the benefit of getting access to tickets for other Fringe shows, as an inducement to share the costs.
Fringe Central events have been very steady at just over 100 a year over the 9 years, though the professional development expertise on offer for participants (based just on feedback) has got much richer.
Data on the number of shows that are premieres is available only up to 2016, with about half of shows marked as such. It would be speculation, but it is possible that the term ‘premiere’, in the face of enthusiastic public relations efforts, suffered a definitional issue, and so the number was no longer worth counting.
It is a sign of the times that for a few years in the period 2012-2014 web site traffic was deemed important, and then briefly app installs, and even the hot air of social reach. No effort was expended on analysis of this data.
Our 2018 article on reviews from 2010 to 2017 will be extended to cover 2018 and 2019 as well. The top headline will be that the decline in total numbers of reviews has been arrested, but analysis is worthwhile and not all is well.
Other than the numbers
The decades-long pattern of clickbait articles bemoaning the state of the Fringe continued into 2019. The concentration of events in and around George Square also continued, but it has to be said that most consider it great for audiences. It is also notable that if you walk many streets just 1500m away, you are hard pressed to see any sign that there is a festival on at all. The lives of many in parts of the city are largely untouched, until they encounter the transport struggles of the central area of Edinburgh.
Neither the Edinburgh International Festival nor the Edinburgh International Book Festival was able to give a number for tickets purchased by Edinburgh residents, but it seems likely that across all the festivals over 1,000,000 tickets were bought by the city’s citizens, suggesting strong local support, though purchasers are overly concentrated in some areas. Efforts to widen the geographic spread of attendance across the city are highly valuable, though no more and no less than the promotion of arts participation across the UK on a year-round basis.
The Spirit of the Fringe
The critics of Edinburgh in August quite frequently suggest the absence of the spirit of the festival fringe, but 2019 saw the blossoming of an initiative driven entirely by individual endeavour applied to solving an identified problem.
With Jessica Brough in the leading role, The Fringe of Colour created a list of qualifying shows, gave them visibility, got some tickets donated* (with participants still getting their revenue) and ensured the tickets reached the hands of attendees so that these shows had stronger audiences. And, of course, it then organised a party to celebrate the initiative.
Could the spirit be any more vigorous than that? Hardly, though I dare say it might welcome a sponsor for 2020.
There are more problems to solve: if you identify one, please just get to it.
*Credit to Assembly, Gilded Balloon, Scottish Storytelling Centre, Summerhall, Traverse, Underbelly and Pleasance.
The impact of events on consumer demand
11 April 2019
As we outlined in our earlier article, our live events data has always helped address a range of needs. This post looks at some of the key problems it addresses.
The leading three are:
- Yield/revenue management – helping hotel groups and travel companies, especially airlines and rail operators, predict future demand more accurately through improved visibility of demand shocks.
- Competitiveness – providing valuable information to help destination marketing and media companies attract visitors and users, and positioning taxis in more strategic areas than their competitors.
- Services – enabling smart cities, police forces and concierge services to deliver improved efficiencies and cutting-edge capabilities, as well as analytic capabilities for sectors such as insurance.
Yield management is an especially active sector. The airline industry in particular has invested heavily in yield management systems but to date has not had access to good quality future events data, relying on past booking records and patterns.
We have a case study showing how a 120-room hotel outside Cardiff, Wales, sold out its inventory in just 20 minutes without pricing its rooms effectively, as it lacked information on an Ed Sheeran concert at the Millennium Stadium. Its yield management system, which was based on the price of competitor rooms, historic bookings and remaining rooms and updated every four hours, was unable to respond in time. This led to the hotel losing out on £9,500 of revenue that day. Our data would have revealed that two further concert dates were almost certain to follow, which they did. The total loss for that one hotel was £28,500 in a single week.
Whilst this related to a large, high-profile, high-visibility event, medium and small events can often combine to provide similar consumer demand shocks which our data can predict.
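The weekly figure is consistent with the same shortfall recurring on each of three concert nights. The sketch below works that through; the assumption of an identical loss each night, and the per-room figure derived from it, are our illustrative reading rather than details recorded in the case study.

```python
ROOMS = 120                 # hotel size from the case study
LOSS_PER_NIGHT_GBP = 9_500  # revenue lost on the first concert night

# Assumption: the first concert night plus two further dates,
# each with the same shortfall.
concert_nights = 1 + 2
weekly_loss = LOSS_PER_NIGHT_GBP * concert_nights

# Assumption: every room was sold on the night, so the loss is
# spread evenly across all rooms.
loss_per_room = round(LOSS_PER_NIGHT_GBP / ROOMS, 2)
```

On these assumptions the under-pricing works out at roughly £79 per room per night, which gives a sense of the price movement an events-aware yield system would need to capture.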
Our other posts touch on the manner in which we collect core live events data, which results in the most comprehensive data set on offer.
But it is not just live events data that is important in the work of predicting consumer demand - a range of other types of event information are needed including weather, public and local holidays, news events and travel disruption information.
Furthermore, the provision of such a rich data set alone still leaves the airline, or any yield management user, with the task of integrating the information with their own historical and live booking information.
The List’s traditional what’s on data and services are showing great breadth of application. Consumer demand prediction is just one.
The uses of live events data
9 November 2018
The List has always, to some extent, been a data business. Back in 1985 we manually copied live event listings from printed venue programmes in order to publish them. We wanted to help readers of our fortnightly magazine, as we put it back in those days, Get A Life. Even then we were assembling data. Ten years ago, we collected around 3,000 event listings for Edinburgh and Glasgow.
These days, of course, the world has changed. We now use AI and semantic techniques in our processes for assembling the UK’s most comprehensive set of live events data – around 40,000 event listings and an average of 500,000 performances.
And as that world has changed, so has the number of uses to which we can put our data.
Our very first information services clients were media companies. We used to supply local newspapers with print ready copy of curated lists of events – ten of the best clubs or the best gigs to go to that week. Over time many of those print contracts became digital ones.
We also work with many destination marketing organisations, such as PeopleMakeGlasgow.com, supplying them with around 20,000 performances of live events at any one time which shows off the deep and rich cultural offering that the city provides. The data provides cities, regions and countries with competitive advantage in a fiercely contested global tourism industry (where experiential tourism is on the rise).
But we now conceive of a much wider range of services.
Looking at accommodation as a single vertical, live events data can provide business intelligence around future demand – essential for accurate yield management. It can also provide services around what to see and do close to the hotel for an authentic visitor experience, delivered by email pre-visit or even by Alexa voice-based concierges in real time. As hotels try to wrest their audience back from online travel agents, there is a need to improve their own CRM, and an understanding of what the guest has experienced contributes towards improved efficiency of future messages. After all – if you liked the craft beer festival you experienced in Manchester, wouldn’t you also like to hear about the six best craft beer festivals coming up across the UK (that just happen to be close to a hotel with a personal discount in place)?
As our cities become smarter, so their need for data increases.
Our live events data will be used in two forthcoming projects in our home city of Edinburgh. Firstly, in refuse collection, our past data can be tallied with bin capacity monitors to identify which events cause the most rubbish. The results can be used to predict in advance where refuse collection services are required, leading to more accurately planned rotas and more efficient collection.
Secondly, we plan to work with Transport for Edinburgh looking at the effect that the Edinburgh Festival, the world’s largest arts festival, with collectively over 4,000 events and more than 100,000 performances in a month, has on the city’s transport network. Again, the results can be used to predict transport usage more accurately, avoid bottlenecks and make sure that tourists and locals alike have a more pleasant experience in August.
We are also planning a hackathon with the emergency services to help better determine where forces should be placed on a Friday and Saturday night based around the movement of people and how many doctors are required by the NHS of an evening.
These are just some of the ideas we are working on. We are seeing the development community come up with plenty more that we never even conceived of, which is why our API is a free to play sandbox up to a certain volume of requests.
The List in fact is endless. More strategic positioning of taxis based on events that are just about to finish. Correct staffing and food levels for restaurants based on increased localised demand. SatNav advice that takes future traffic into account based on specific events.
In addition to all the business intelligence that live events data can provide, we can also help to inspire people to go and visit an event – to travel by rail or air and go see something amazing. Ticketmaster still state that the biggest reason someone doesn’t go and see a show is that they didn’t know about it (Mobile World Congress, 2016).
The syndication of our data far and wide by a multitude of businesses and organisations will help to solve this problem. Which all ties rather nicely back to 1985, when the business set out with an ethos to provide the best arts and entertainment information to the widest number of people possible. At one time that was just Edinburgh and Glasgow with a magazine.
In the future, we hope to do this for the world.
The State of Arts criticism across the Edinburgh Festivals – some data
19 September 2018
Edinburgh, Edinbrru phonetically, Edinborough in a North American transatlantic drawl, or Auld Reekie or Old Smoky in times gone by, hosts the world’s largest set of festivals, taking place all round the year. They come to a peak in late July and August with a flush of 7, 8 or 9 festivals depending on how you count.
Across the core August festivals, namely the Edinburgh International Festival, Edinburgh International Book Festival, Edinburgh Art Festival and Edinburgh Festival Fringe and then a set of overlapping free festivals, there are nearly 4,000 shows with 50,000 performances at one of 400 venues.
And all in a city of just half a million. With the same number of visitors on top, the city is bursting with people.
With so many shows the role of the critic becomes as important in this context as for any Broadway or West End show.
And with so many new shows and so many new performers, many can't get the audience for which they hope. So to get reviewed at all is an ambition, to get reviewed by well-known titles more so, and a good review is itself the dream. Not only may a good review fill the show in Edinburgh, it can also do so for subsequent runs over the months and years to come.
So, it does matter to the audience and the performers if there is a decline in the extent of reviewing at the Edinburgh Festivals.
The List have aggregated reviews across a range of titles since 2009, and publish this data.
For a presentation in August 2018 organised by the Edinburgh Festival Fringe, entitled The State of Arts Criticism, we took a consistent set of titles that undertake Edinburgh Festival reviewing and looked at the decline from 2012 to 2017. Our data shows a significant decline in the set, from 5134 reviews in 2012 to 3169 in 2017.
The dotted lines represent titles which, following enquiry with them, we understand do not pay their reviewers.
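For readers who prefer the decline as a percentage, it can be worked out directly from the two review counts quoted above. The helper name is ours; the same calculation applies to any pair of base-year and latest-year figures.

```python
def percent_change(base, latest):
    """Percentage change from a base-year value to a latest-year value."""
    return (latest - base) / base * 100

# Review counts for the consistent set of titles: 2012 (base) and 2017 (latest).
decline = percent_change(5134, 3169)
print(f"{decline:.1f}%")
```

That is a fall of a little over 38% in the number of reviews across the set in five years.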
Alongside this decline, we also examined the rigour with which journalists approached their task.
With so many shows, it would be healthy and wise for reviewers and their editors to select shows that are expected to be the more interesting. But the outcome should nonetheless be a balance of poor shows and good shows. And the star rating system can illustrate the balance that is sought by displaying star ratings that conform to the normal curve.
So, it is interesting to see that titles that employ paid critics, in aggregate, deliver a result that conforms more closely to the statistician’s ideal. A perfect normal curve would have a peak at 2.5 stars and a broad spread. So the further to the right the peak is, the greater the star rating inflation. (The Scotsman just looks an outlier as reviews under 3 stars are usually only printed.)
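One simple way to put a number on star rating inflation is to measure how far the most common rating sits to the right of the 2.5-star midpoint. The sketch below does that; the two lists of ratings are invented purely to show the calculation and are not drawn from our review data.

```python
from collections import Counter
from statistics import mean

IDEAL_PEAK = 2.5  # midpoint the post treats as the peak of an ideal curve

def rating_profile(stars):
    """Summarise a list of star ratings: peak, mean and inflation."""
    counts = Counter(stars)
    # Most common rating; ties are broken in favour of the higher rating.
    peak = max(counts, key=lambda s: (counts[s], s))
    return {"peak": peak, "mean": mean(stars), "inflation": peak - IDEAL_PEAK}

# Hypothetical ratings for illustration only.
balanced = rating_profile([1, 2, 2, 3, 3, 3, 3, 4, 4, 5])
generous = rating_profile([3, 4, 4, 4, 4, 5, 5, 5, 4, 3])
```

In this made-up example the first title peaks at 3 stars, half a star above the midpoint, while the second peaks at 4 stars and so shows three times the inflation.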
Comparing critics’ and users’ ratings shows that the latter mostly undertake the task of reviewing as a promotional activity.
Our final chart shows that The List published more reviews online in 2018 than any other title.
We have more data covering other titles and individual writers, which can show some interesting trends and which we hope may be the subject of a later post.
Starting out, and arriving at an event taxonomy
31 August 2018
The List started collecting live event listings in 1985. We possess the UK’s leading set of live events data with up to 600,000 future performances of 45,000 events at one of 85,000 venues across the length and breadth of the UK.
The events are spread across a range of event types and a while ago we held an engineering away day during which we developed a taxonomy of event types, which I describe in this post.
Before embarking on the post, and by way of background, I wanted to quickly outline our history as a listings collector and give an overview of our means for sourcing this highly fragmented data from a very large number of venues. Over the decades we have worked our way steadily up a hierarchy of methods for sourcing listings:
- 1980s - postal mail, card index files
- 1990s - fax, email and manual input to database
- 2000s - web sites, CSV and first loading
- 2010s - web forms, file transfer
Now our preferred and dominant input is structured files from box offices and other primary sources.
We receive and process 200 or more regular feeds, many daily, be that from cinema chains like Cineworld, theatre groups like Ambassadors Theatre Group (ATG) or an individual arts centre, such as Stockton Arts Centre (ARC).
This activity is supplemented by data we receive on our web form, with around 40,000 submissions in a year, which allows us to accrete a big range of smaller events from organisations that don’t have the ability to supply the structured data. All such submissions are subject to human oversight.
We target over 99.99% accuracy in core data of date, place and time.
So, looking across the range of events information we receive, we see events falling into 9 categories which we outline in the table below:
| Event type | Category | Examples | Deterministic features | Possible overlap into |
|---|---|---|---|---|
| Consumer | 1 | Sale at Harrods, craft fair, car boot sale, celebrity appearance | Commercial, transactional | Entertainment |
| Business and professional | 2 | AGM for BP, sessions of parliament, public lecture by Dir of Institute of Chartered Accountants | Open to public, possibly some qualification to attend, free or moderate entry price | Community |
| Community | 3 | Public Info Notices of planning meetings, blood donor locations, marches, church fetes | Of interest to narrow geographic area, or well defined community of interest | Business |
| Conference, expo | 4 | Crufts, Wedding World, Edinburgh International TV Festival | Trade show with stands, speaker sessions, sometimes significant entry cost | Consumer, business |
| Entertainment | 5 | The original and primary List categories including theatre, music, dance, visual art, sport, etc. | | Consumer |
| Education, workshops, courses and classes | 6 | Rock Choirs, Yoga classes | Led by a teacher | Activities |
| Activities, physical recreation and participatory sport | 7 | Wildlife Trust walks, London Marathon | Participatory | Education |
| Attractions | 8 | Legoland, Albert Memorial, Green Park, British Library | A range of places of interest that may be open 365×24, have daily or seasonal opening dates and times. May have timed entry tickets. | |
| Restricted (generally, not for publication) | 9 | Private views, an employee only Christmas club night, press previews and photoshoots | Invitation private views, employees only, personal events | Business and professional |
| Other | 0 | Public and local holidays, weather and news events including marches, strikes and terror incidents | Anything not covered above | |
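For developers, the taxonomy maps naturally onto an enumeration. The sketch below is one hypothetical encoding: the member names, the overlap mapping (which reads "Business" as the Business and professional category, and "Consumer" as the overlap for Entertainment) and the helper functions are our illustration, not the structure exposed by our API.

```python
from enum import IntEnum

class EventCategory(IntEnum):
    """Event categories, numbered as in the table above."""
    OTHER = 0
    CONSUMER = 1
    BUSINESS_PROFESSIONAL = 2
    COMMUNITY = 3
    CONFERENCE_EXPO = 4
    ENTERTAINMENT = 5
    EDUCATION = 6
    ACTIVITIES = 7
    ATTRACTIONS = 8
    RESTRICTED = 9

C = EventCategory
# Assumed reading of the "Possible overlap into" column.
OVERLAPS = {
    C.CONSUMER: {C.ENTERTAINMENT},
    C.BUSINESS_PROFESSIONAL: {C.COMMUNITY},
    C.COMMUNITY: {C.BUSINESS_PROFESSIONAL},
    C.CONFERENCE_EXPO: {C.CONSUMER, C.BUSINESS_PROFESSIONAL},
    C.ENTERTAINMENT: {C.CONSUMER},
    C.EDUCATION: {C.ACTIVITIES},
    C.ACTIVITIES: {C.EDUCATION},
    C.RESTRICTED: {C.BUSINESS_PROFESSIONAL},
}

def candidate_categories(primary):
    """The primary category plus any categories it may overlap into."""
    return {primary} | OVERLAPS.get(primary, set())

def is_publishable(category):
    """Restricted events are generally not for publication."""
    return category is not EventCategory.RESTRICTED
```

Keeping the numeric codes from the table means a stored category number can be turned back into a named category with `EventCategory(4)`, and the overlap set gives a starting point when an event could reasonably sit in more than one place.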
By background, The List’s consumer publishing activities have by necessity been focused on events open to the general public, and given our remit to reach a broad audience, we have not focused on Category 4 events.
Given our strong commitment to a business model under which all listings are free, we also need to manage some cases of those promoting Category 1 activities, who endeavour to position them as events and gain listings, when really promotion is what they seek for their commercial activities. But we have a soft spot for car boot sales, so Category 1 events are certainly accepted.
As the delivery of listings via our API expands through a widening set of channels, including smart cities, hotel groups and transport operators, well beyond our earlier client base dominated by the media sector, the value in including events from all the categories above, and especially Category 4, increases.
We are fortunate to have had a 30+ year history of gathering, enriching, managing, organising and distributing live events listings in a standard event structure, but we remain very open to adding to our understanding of this data space.