Automating Royalty Workflows with Stage: Part 1 – Data Ingestion for Music Royalties Pipeline

Nerijus Jankauskas
Since 2018
57.1 trillion streams/downloads processed
89 billion song plays tracked
52 media outlets (data collection)

Helmes helped London-based Stage to automate complex royalty workflows, starting with usage ingestion from an ever-growing set of the music industry’s data sources – the foundation for accurate and timely payments to authors.

Executive summary

Stage is a London-based company on a mission to fix the broken royalty pipelines in the music industry. 

Since 2018, we have helped the company build a data platform that uses the latest technologies to exchange and process music data and provide fair pay to authors.

As the first step in the process, we created a flexible ingestion framework that automates the ingestion of fragmented music usage data from an ever-growing set of sources like YouTube, Spotify, iTunes, Deezer, Pathe, and radio logs.

The platform supports both automated pipelines via APIs and manual uploads, scaling effortlessly with usage growth. Once data is retrieved, it is tagged, validated, organized, and displayed via a user interface where analysts can review and correct it if necessary.

The solution reduced time spent on data cleaning and organization by 60–80%, while human errors were minimized through automated validations. As a result, publishers and artists could receive more timely and accurate payouts.

Meet the customer

Company
Stage Enterprises

Stage Enterprises is a London-based company on a mission to change the lives of music makers, labels, publishers and entire communities by fixing the broken royalty pipelines in the music industry.

In 2018, the company partnered with Helmes (under the Telesoftas brand) to build a music data platform that utilizes the latest technologies to exchange and process music data and provide fair pay to authors.

Together, we tackled the music industry’s biggest data challenges by building a robust data ingestion pipeline that handles an ever-growing range of data sources and scales seamlessly with usage growth.

The challenge

Music societies and publishers were facing increasing volumes of fragmented and inconsistent music usage data from sources such as digital service providers (DSPs, e.g., YouTube, Spotify, Apple Music) and radio logs.

The data arrived in different formats and at varying intervals, requiring manual labor to standardize, validate, and organize it before it could be used for royalty matching and payments. The lack of automation created inefficiencies, errors, and delays in royalty distribution. In certain cases – especially with online DSPs – it was impossible to review usage without automation or abstraction.

The solution

We helped Stage create a flexible, scalable ingestion framework that supports both automated pipelines and manual uploads. 
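
One way to picture a framework that treats automated pipelines and manual uploads alike is a shared source contract that every feed implements. The following is a minimal TypeScript sketch under that assumption; all names (`UsageRecord`, `UsageSource`, `ApiSource`, `CsvUploadSource`, `ingestAll`) are hypothetical and do not come from the Stage platform.

```typescript
// Hypothetical shapes for illustration only; not the platform's actual model.
interface UsageRecord {
  source: string;      // e.g. "demo-dsp", "radio-log"
  trackTitle: string;
  plays: number;
  periodStart: string; // ISO date, e.g. "2024-01-01"
}

// Every source, automated or manual, implements the same contract.
interface UsageSource {
  readonly name: string;
  fetch(): Promise<UsageRecord[]>;
}

// Assumed row shape returned by an API client supplied by the caller.
type ApiRow = { title: string; count: number; from: string };

// Automated source: wraps an API client and maps its rows to UsageRecord.
class ApiSource implements UsageSource {
  constructor(
    readonly name: string,
    private readonly client: () => Promise<ApiRow[]>,
  ) {}

  async fetch(): Promise<UsageRecord[]> {
    const rows = await this.client();
    return rows.map((r) => ({
      source: this.name,
      trackTitle: r.title,
      plays: r.count,
      periodStart: r.from,
    }));
  }
}

// Manual source: parses an uploaded CSV with the header "title,plays,periodStart".
// Deliberately naive: it does not handle quoted fields or embedded commas.
class CsvUploadSource implements UsageSource {
  constructor(readonly name: string, private readonly csv: string) {}

  async fetch(): Promise<UsageRecord[]> {
    const lines = this.csv.trim().split(/\r?\n/).slice(1); // skip header row
    return lines.map((line) => {
      const cells = line.split(",").map((c) => c.trim());
      return {
        source: this.name,
        trackTitle: cells[0] ?? "",
        plays: Number(cells[1]),
        periodStart: cells[2] ?? "",
      };
    });
  }
}

// Pull from all sources in parallel and merge into one list of records.
async function ingestAll(sources: UsageSource[]): Promise<UsageRecord[]> {
  const batches = await Promise.all(sources.map((s) => s.fetch()));
  return batches.reduce<UsageRecord[]>((all, batch) => all.concat(batch), []);
}
```

With a contract like this, adding a new data source means writing one more class that implements `UsageSource`; the downstream tagging and validation steps do not need to know whether a record came from an API or an upload.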

The solution automates the ingestion of music usage data from multiple sources. Once data is retrieved, it is:

  • Tagged using predefined metadata rules (e.g., platform, time period, usage type).
  • Validated against configurable schemas (e.g., required fields, date formats, numeric validations).
  • Organized into logical blocks based on user-defined dimensions like platform, quarter, or country.
  • Displayed via a user interface where analysts can review, correct, or approve incoming data.
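
The tagging, validation, and organization steps above can be sketched in TypeScript as three small functions. This is an illustration under stated assumptions, not the platform's implementation: the types (`RawRow`, `TaggedRow`, `FieldRule`), the function names, the `usageDate` field, and the quarter-based tag are all hypothetical.

```typescript
// All names below are illustrative assumptions, not the platform's actual API.
type RawRow = Record<string, string>;

interface TaggedRow {
  row: RawRow;
  tags: { platform: string; quarter: string };
}

interface FieldRule {
  field: string;
  required: boolean;
  kind: "string" | "isoDate" | "nonNegativeNumber";
}

// Tagging: derive metadata from the source platform and the usage date.
function tagRow(row: RawRow, platform: string): TaggedRow {
  const date = row["usageDate"] ?? "";
  const year = date.slice(0, 4);
  const month = Number(date.slice(5, 7));
  const quarter =
    month >= 1 && month <= 12 ? `${year}-Q${Math.ceil(month / 3)}` : "unknown";
  return { row, tags: { platform, quarter } };
}

// Validation: apply each rule and collect human-readable errors.
function validateRow(row: RawRow, schema: FieldRule[]): string[] {
  const errors: string[] = [];
  for (const rule of schema) {
    const value = (row[rule.field] ?? "").trim();
    if (value === "") {
      if (rule.required) errors.push(`${rule.field}: missing`);
      continue;
    }
    // Checks the shape only; it does not verify that the date exists.
    if (rule.kind === "isoDate" && !/^\d{4}-\d{2}-\d{2}$/.test(value)) {
      errors.push(`${rule.field}: expected YYYY-MM-DD`);
    }
    if (rule.kind === "nonNegativeNumber") {
      const n = Number(value);
      if (!Number.isFinite(n) || n < 0) {
        errors.push(`${rule.field}: expected a non-negative number`);
      }
    }
  }
  return errors;
}

// Organization: group tagged rows into blocks keyed by the chosen dimensions.
function groupIntoBlocks(
  rows: TaggedRow[],
  dims: Array<keyof TaggedRow["tags"]>,
): Map<string, TaggedRow[]> {
  const blocks = new Map<string, TaggedRow[]>();
  for (const r of rows) {
    const key = dims.map((d) => r.tags[d]).join("|");
    const block = blocks.get(key) ?? [];
    block.push(r);
    blocks.set(key, block);
  }
  return blocks;
}
```

Rows that return a non-empty error list from `validateRow` would be the natural candidates to surface in a review interface, while rows without errors can flow into their blocks directly.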

Benefits

The solution provides multiple benefits to music societies, publishers, and authors.

Music societies can now ingest larger volumes of data with fewer resources, as time spent on data validation and organization was reduced by 60–80%.

Monthly ingestion time dropped from weeks to days, while human errors were minimized through automated validations.

Data transparency and review improved significantly, making the entire royalty pipeline more predictable and auditable.

As a result, publishers and artists could receive faster and more accurate payouts.

  • 57.1 trillion streams/downloads processed (+30.5T)
  • 89 billion song plays tracked (+42B)
  • Data from 52 media outlets processed (19 radio, 33 TV)

The process

We began solving the problem by mapping all usage data sources and understanding their technical and contractual differences. 

We communicated closely with end users to define key requirements for usability and oversight, and held several focus group workshops to identify tagging logic, validation rules, and user review processes.

The whole process consisted of the following steps:

  • Analysis: Source mapping, rule definitions, data behavior analysis
  • Prototype: Ingestion logic with limited sources
  • Development: Scalable backend, UI for human review
  • Testing: Real-world ingestion cases, validation testing
  • Deployment & Support: Ongoing monitoring, tuning, and feedback loops

We used the following tech stack to build the solution:

  • Cloud Infrastructure: AWS services, including S3, CDK, Lambda, Step Functions, EC2, RDS/Aurora, DynamoDB, SQS, and SNS, were used for scalable and reliable data storage, orchestration, and processing.
  • Data Layer: For managing and analyzing data, Snowflake was used for its performance in big data processing and seamless integration. PostgreSQL was used for handling match-related data and ensuring data consistency.
  • Programming Language: TypeScript was chosen for its strong typing and compatibility with both frontend and backend development, enhancing the developer experience and reducing errors.
  • Frontend: React was used for building dynamic user interfaces, ensuring responsiveness and smooth user interactions.
  • API Layer: GraphQL was used for querying and manipulating data in a flexible and efficient manner, providing the frontend team with the exact data they need without over-fetching.
  • Data Abstraction Layer: Depot was used as an alternative to dbt for building and managing data transformations. It provided efficient workflows for creating, managing, and running data models.

Helmes’s expertise in building data pipelines

We brought to the project deep experience in complex data pipelines and strong domain knowledge in music rights management. Our cross-functional team combined technical, analytical, and UX skills, enabling us to deliver a solution that works for both machines and people.

As a digitalization partner, we don’t offer a rigid, out-of-the-box product – instead, we co-create tailored solutions with our clients, balancing automation and human oversight. We also focus on extensibility – clients can evolve and scale their ingestion logic as the industry and regulations change.

Flexible data systems reduce manual workload, eliminate errors, and ensure auditable data flow in industries where granular usage data needs to be ingested, grouped and validated – such as streaming services, ad performance tracking, or telecom billing. If you are looking for a partner to address data or other digitalization challenges, we’d be happy to hear from you.
Nerijus Jankauskas
Partner

Interested to learn more? Please contact us for any inquiries, and you’re also most welcome to get in touch with Nerijus Jankauskas directly to take the discussion even further.
