Executive summary
After establishing a scalable ingestion framework for music usage data, Stage needed to solve the next major challenge in the royalty workflow: accurately matching usage records from streaming platforms, broadcasters, and other sources against registered musical works.
The scale and complexity of the problem were significant. Music societies and publishers processed batches containing up to 290 million usage records while working with catalogues of more than 42 million registered works. Inconsistent metadata, missing identifiers, spelling variations, and differing naming conventions made reliable automated matching difficult, while manual review created major operational bottlenecks.
Helmes helped Stage build a scalable matching platform that combines automation with analyst oversight. The solution standardises incoming metadata, generates likely match candidates using multiple matching strategies, scores potential matches using configurable business rules, and routes uncertain cases to analysts through a dedicated review interface.
The platform automated more than 100 million match operations in under three hours, achieved 97.7% match accuracy, and reduced manual review workloads by over 90%. Tasks that previously required days or weeks of manual effort could now be completed within hours, helping music societies and publishers process royalty data faster, reduce unmatched usage records, and improve the accuracy and completeness of royalty distributions.
Beyond operational efficiency, the solution gave Stage and its clients a more transparent and adaptable matching process, allowing matching rules and scoring logic to evolve alongside changing data patterns and business requirements.
Meet the customer

Stage Enterprises is a London-based company on a mission to change the lives of music makers, labels, publishers and entire communities by fixing the broken royalty pipelines in the music industry.
In 2018, the company partnered with Helmes (under the Telesoftas brand) to build a music data platform that uses the latest technologies to exchange and process music data and provide fair pay to authors.
Together, we tackled the music industry’s biggest data challenges by building a robust data ingestion pipeline that handles an ever-growing range of data sources and scales seamlessly with usage growth.
The challenge
Music societies and publishers needed to match huge batches of music usage data, sometimes containing up to 290 million usage records, against catalogues with more than 42 million registered musical works.
The scale alone made brute-force matching impossible: comparing every usage record with every registered work would mean up to 1.2 × 10¹⁶ pairwise comparisons.
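To make that figure concrete, here is a back-of-the-envelope calculation. The record and catalogue counts come from the project; the comparison rate is a purely hypothetical assumption chosen for illustration, not a measured value.

```python
# Figures quoted in this case study
USAGE_RECORDS = 290_000_000
REGISTERED_WORKS = 42_000_000

# Hypothetical throughput, assumed only for illustration
ASSUMED_COMPARISONS_PER_SECOND = 1_000_000_000

# Brute force means comparing every usage record with every work
pairwise_comparisons = USAGE_RECORDS * REGISTERED_WORKS
days = pairwise_comparisons / ASSUMED_COMPARISONS_PER_SECOND / 86_400

print(f"{pairwise_comparisons:.2e} pairwise comparisons")
print(f"about {days:.0f} days at the assumed rate")
```

Even at a billion comparisons per second, a single batch would take months, which is why the candidate set has to be narrowed before any detailed comparison takes place.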
At the same time, data quality issues — such as inconsistent formatting, spelling variations, and missing identifiers — made accurate automated matching extremely difficult.
As a result, unmatched usage data often led to delayed or lost royalty payments for authors.
Our approach
To solve the problem, we first analysed both usage and works data to identify reliable matching signals, such as ISRCs, artist names, and recurring metadata patterns.
We then broke the problem into modular phases: data cleaning, candidate generation, probabilistic scoring, and human review.
The approach focused on precision and processing performance, combining intelligent automation with manual fallback tools so that users could still validate uncertain matches and refine matching logic over time.
The whole development process included the following steps:
- Discovery: Analysis of existing match rates and bottlenecks
- Data audit: Identifying formats, errors, and inconsistencies
- Prototype: Building a scoring engine with limited samples
- Implementation: Scalable matching framework, integrated UI for human review
- Testing and tuning: Real usage datasets, precision/recall tuning
- Deployment: Cloud-based rollout and operational handover
The solution
The solution enables music societies to process far larger volumes of music activity in less time, accelerating royalty processing and reducing unmatched records.
The platform performs high-volume matching of track usage data against a large musical works database through a multi-stage process:
1. Data extraction and normalisation
Incoming track information is cleaned and standardised to improve consistency across sources.
This includes formatting titles and artist names consistently, removing unnecessary punctuation, handling featured artists, and accounting for common naming variations.
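As an illustration of this kind of clean-up, the following is a minimal sketch rather than the platform's actual normalisation rules. The function names, the list of "featuring" markers, and the choice to strip accents are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical pattern for featured-artist credits such as "(feat. X & Y)"
_FEAT = re.compile(
    r"[\(\[]?\s*\b(?:featuring|feat|ft)\b\.?\s+([^\)\]]+)[\)\]]?",
    re.IGNORECASE,
)


def normalise_text(value: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = value.lower().replace("&", " and ")
    value = re.sub(r"[^\w\s]", " ", value)
    return re.sub(r"\s+", " ", value).strip()


def split_featured(title: str) -> tuple[str, list[str]]:
    """Separate a featured-artist credit from the main title."""
    match = _FEAT.search(title)
    if not match:
        return title, []
    main = title[:match.start()] + title[match.end():]
    # Assumption: featured artists are separated by "," or "&"
    featured = [a for a in re.split(r"\s*[,&]\s*", match.group(1)) if a.strip()]
    return main, featured


main, featured = split_featured("Halo (feat. Jay-Z & Beyoncé)")
print(normalise_text(main), [normalise_text(a) for a in featured])
```

Keeping the featured artists as a separate field, instead of discarding them, leaves that information available to later matching stages.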
2. Identifying likely matches
The platform narrows down potential matches using several comparison methods, including exact and fuzzy ISRC matching, track titles, artist names, migration tables, and related metadata.
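One common way to narrow candidates without comparing every pair is to index the catalogue by lookup keys. The sketch below assumes works and usage records are plain dictionaries with hypothetical field names (`work_id`, `title`, `isrcs`, `isrc`). It covers only format-tolerant ISRC lookup and exact title lookup, and does not attempt the fuzzy strategies or migration tables mentioned above.

```python
from collections import defaultdict


def normalise_isrc(isrc):
    """Upper-case and drop separators; ISRCs are 12 alphanumeric characters."""
    if not isrc:
        return None
    cleaned = "".join(ch for ch in isrc.upper() if ch.isalnum())
    return cleaned if len(cleaned) == 12 else None


def build_index(works):
    """Index work IDs by normalised ISRC and by lower-cased title."""
    by_isrc, by_title = defaultdict(set), defaultdict(set)
    for work in works:
        for isrc in work.get("isrcs", []):
            key = normalise_isrc(isrc)
            if key:
                by_isrc[key].add(work["work_id"])
        by_title[work["title"].lower().strip()].add(work["work_id"])
    return by_isrc, by_title


def candidates(usage, by_isrc, by_title):
    """Return the IDs of works that share an ISRC or a title with the usage record."""
    found = set()
    key = normalise_isrc(usage.get("isrc"))
    if key:
        found |= by_isrc.get(key, set())
    found |= by_title.get(usage["title"].lower().strip(), set())
    return found


works = [
    {"work_id": "W1", "title": "Halo", "isrcs": ["US-RC1-76-07839"]},
    {"work_id": "W2", "title": "Hello", "isrcs": []},
]
by_isrc, by_title = build_index(works)
print(candidates({"isrc": "usrc17607839", "title": "Unknown"}, by_isrc, by_title))
```

Each usage record then costs a handful of dictionary lookups instead of a scan over the whole catalogue, and only the returned candidates go on to scoring.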
3. Match scoring and validation
Each potential match is evaluated using configurable scoring rules that take into account factors such as title similarity, recording identifiers, and co-writer information.
High-confidence matches can be approved automatically, while uncertain cases are flagged for analyst review.
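The scoring and routing step can be pictured as a weighted sum compared against thresholds. The sketch below is illustrative only: the weights, both thresholds, the field names, and the separate "no_match" outcome are assumptions, and the title similarity uses Python's standard `difflib`, not the platform's own measure.

```python
from difflib import SequenceMatcher

# Hypothetical, configurable business rules
WEIGHTS = {"title": 0.5, "isrc": 0.3, "writers": 0.2}
AUTO_APPROVE_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.5


def score(usage, work, weights=WEIGHTS):
    """Combine title similarity, ISRC agreement and co-writer overlap into one score."""
    title_sim = SequenceMatcher(None, usage["title"], work["title"]).ratio()
    isrc_hit = 1.0 if usage.get("isrc") and usage["isrc"] in work.get("isrcs", ()) else 0.0
    usage_writers = set(usage.get("writers", ()))
    work_writers = set(work.get("writers", ()))
    union = usage_writers | work_writers
    writer_overlap = len(usage_writers & work_writers) / len(union) if union else 0.0
    return (
        weights["title"] * title_sim
        + weights["isrc"] * isrc_hit
        + weights["writers"] * writer_overlap
    )


def route(match_score):
    """Decide what happens to a candidate match based on its score."""
    if match_score >= AUTO_APPROVE_THRESHOLD:
        return "auto_approve"
    if match_score >= REVIEW_THRESHOLD:
        return "analyst_review"
    return "no_match"


usage = {"title": "halo", "isrc": "USRC17607839", "writers": ["a. writer"]}
work = {"title": "halo", "isrcs": ["USRC17607839"], "writers": ["a. writer"]}
print(route(score(usage, work)))
```

Because the weights and thresholds are plain configuration, they can be adjusted as data patterns change without touching the scoring code itself.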
4. Analyst review and approval
Analysts can review lower-confidence matches through a dedicated interface, compare metadata, and confirm, reject, or adjust results when needed.
The results
The platform automated more than 100 million match operations in under three hours while achieving 97.7% match accuracy and reducing manual review workloads by more than 90%. Its flexible scoring model also allows matching rules to evolve over time as new data patterns and business requirements emerge.
The solution significantly improved both match rates and processing speed for music societies. Tasks that previously took days or weeks of manual work could now be completed within hours, while the volume of unmatched records dropped dramatically.
As a result, teams were able to spend less time processing routine matches and more time resolving exceptions and handling complex cases. At the same time, songwriters and rights holders benefited from faster and more complete royalty payments.
Technologies
AWS was used to provide scalable cloud infrastructure for processing large volumes of music usage and catalogue data efficiently. Snowflake enabled fast querying and transformation of large datasets and supported the complex matching and scoring calculations required by the platform.
Building data pipelines
We helped Stage design a solution that balanced automation with control, achieving high performance without sacrificing accuracy. Our team’s deep understanding of music metadata, matching logic, and high-scale data engineering allowed us to solve a problem where many off-the-shelf tools would fail.
As the client’s digitalisation partner, we don’t just implement generic matching engines. We tailor our solutions to music industry realities: ISRC inconsistencies, performer aliases, local catalogue quirks, and co-ownership models. Additionally, we empower users with transparent tools and continuous improvement loops, not black boxes.
Work with Helmes
Helmes helps organisations automate complex data workflows where large-scale matching, validation, and processing are critical to operations. Alongside the music industry, we have delivered custom software solutions for telecom, healthcare, finance, retail, mobility, and public-sector organisations across Europe.
If you are looking for a partner for data-intensive or digitalisation projects, we’d be happy to talk.
