Lack of Compute-Heavy Infrastructure Required to Adopt AI Meaningfully - Ritvvij Parrikh

Many media companies, including some larger ones, lack the compute-heavy infrastructure needed to use AI effectively for audience engagement (the demand side).

Why it matters: Without investing in audience-facing AI, you can’t build competitive differentiation and kickstart growth.

Assumed Audience: These working notes are for forward-thinking digital media business leaders eager to leverage artificial intelligence. Read about the founding thought on Media Platform Flywheel.

Here’s an interesting letter on X.com on how any AI without clean data is a waste of time.


Where does it fit in the Media Platform Flywheel?

Companies without compute-heavy infrastructure can still use AI on the supply side, like using OpenAI APIs to generate content within their CMS. However, this usage mainly leads to cost optimization rather than growth.


Who is likely well-placed: Digital media companies often adopt best practices from offline businesses, such as divisional organizational structures, reliance on business rules, and editorial control of distribution.

  • Companies that operate like technology companies, such as Google News and Bloomberg, are unlikely to face these challenges. Google News is a BigTech product, and Bloomberg is a FinTech product with a news arm. The content and data published by Bloomberg are integrated into stock trading algorithms, necessitating robust data engineering.
  • For others, building compute-heavy infrastructure demands sustained commitment to technical development, financial investment, and cultural change management. The transformation typically takes 2-5 years and often involves substantial rebuilding.


14-question cheatsheet to evaluate if your newsroom has compute-heavy infrastructure

Front-end

1. Do you have a simple front-end interface? Algorithm-driven platforms like search and social media feature simple, straightforward lists to collect unbiased and comprehensive clickstream data about audiences and their interactions with content and ads. Do your interface and data collection method align with this approach?

2. Can you identify users across your entire media portfolio? Google uses a single login across all its products and millions of other websites, allowing it to collect first-party data. Have you implemented a similar Single Sign-On (SSO) framework?

3. Can your front-end serve a personalized experience? Most news firms cannot. Their websites are cached on global CDNs, and less than 1% of their traffic reaches their servers. In contrast, algorithm-driven platforms can control the user experience for each individual user.

4. What percentage of your traffic is direct and logged-in? If most of your traffic arrives through search (SEO), you are unlikely to be able to personalize at scale.
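
As a rough illustration of question 4, the share of direct, logged-in traffic can be computed from session records. This is a minimal sketch: the `Session` type, the `source` labels, and the sample data are hypothetical and would need to be mapped to whatever your analytics export actually provides.

```python
from dataclasses import dataclass


@dataclass
class Session:
    source: str      # hypothetical labels: "direct", "search", "social", ...
    logged_in: bool  # True if the visitor was signed in during the session


def direct_logged_in_share(sessions: list[Session]) -> float:
    """Return the percentage of sessions that are both direct and logged-in."""
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions if s.source == "direct" and s.logged_in)
    return 100.0 * hits / len(sessions)


# Made-up sample: five sessions, only one is direct AND logged-in.
sessions = [
    Session("search", False),
    Session("search", False),
    Session("direct", True),
    Session("social", False),
    Session("direct", False),
]

share = direct_logged_in_share(sessions)
print(f"{share:.1f}% of sessions are direct and logged-in")
```

Tracking this number over time, rather than as a one-off, gives a sense of how much of the audience could realistically receive a personalized experience.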

Data Engineering

5. Do you operate data pipelines? Industries like FinTech build and operate robust data pipelines for mission-critical or revenue-critical operations. Digital news firms typically rely on free tools like Google Analytics, or paid ones like Chartbeat, Smartocto, or CleverTap.

6. How real-time are these pipelines? Few firms invest in real-time data pipelines. The most accurate version of today’s Google Analytics data via BigQuery is available only the next morning, which rules out intra-day optimizations.

7. How does your real-time data pipeline handle failure? When the pipeline fails, do you receive a notification? Are there regular data checks to see if the pipeline has failed? Are there systems to reinstate data lost due to pipeline failure?

8. How comprehensive is your data warehouse? Along with clickstream data, do you have other readily accessible data? This includes CMS information, subscription data, and ad network data.

9. Cost vs. Backup: How far back in time does the team store data in the warehouse? Is there an option to move older data to lower-cost cold storage? Have you built summary tables for granular data? Do you have alerts and playback options for when the summarization process fails?

10. Is the data warehouse queryable? Are there sufficient servers for exploratory data analysis? Is it feasible to obtain results promptly, or does it take significant processing time?
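
The failure-handling questions in point 7 can be made concrete with two small checks: one that flags a stale pipeline, and one that lists the hourly partitions that never arrived so they can be backfilled. This is only a sketch under assumptions: hourly partitions, a 30-minute lag threshold, and the function names are all illustrative choices, not part of any particular tool.

```python
from datetime import datetime, timedelta


def is_stale(last_event_at: datetime, now: datetime, max_lag: timedelta) -> bool:
    """True if the newest event in the warehouse is older than the allowed lag."""
    return now - last_event_at > max_lag


def missing_hours(loaded: set[datetime], start: datetime, end: datetime) -> list[datetime]:
    """List every hourly partition in [start, end) that has no data loaded."""
    gaps = []
    hour = start
    while hour < end:
        if hour not in loaded:
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps


# Made-up state: it is noon, the last event landed at 11:10,
# and the 10:00 partition never arrived.
now = datetime(2024, 1, 1, 12, 0)
last_event_at = datetime(2024, 1, 1, 11, 10)
loaded = {datetime(2024, 1, 1, h) for h in (8, 9, 11)}

stale = is_stale(last_event_at, now, max_lag=timedelta(minutes=30))
gaps = missing_hours(loaded, datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 12))

if stale or gaps:
    # A real system would notify the on-call person here instead of printing.
    print(f"ALERT stale={stale} gaps={[g.isoformat() for g in gaps]}")
```

The list of gaps doubles as a backfill work list: each missing hour is a partition to reload from the source once the pipeline is healthy again.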

Data Reliability

11. Do you have a data team that operates independently? An independent data team can consolidate technical nuances, but data collection should be a shared organizational priority.

  • Technical nuances include defining data dictionaries, data cleaning, data warehousing, and data audits.
  • Data collection is shared because business or domain-level nuances are known by the people who run the business: editors, product managers, business managers, and so on.

12. Does your team report numbers through automated dashboards or in spreadsheets? Digital newsrooms rely on systems like Google Analytics, yet teams often still clean and present the data by hand.

  • These teams manually clean and massage data for reporting, generally presenting blended ratios aggregated over long periods.
  • Hence, even though you might be making decisions through data-driven reports, this data isn’t ready for AI because it regularly needs cleaning and massaging.

13. How do you know the data is accurate? Do you have automated data checks running on the warehouse? If checks fail, do you receive alerts?
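
One way to picture the automated checks in question 13 is a small set of rules run against warehouse rows, with any failure turned into an alert. The sketch below is illustrative only: the column names (`article_id`, `date`, `pageviews`), the three rules, and the sample rows are assumptions, and a production setup would run equivalent checks inside the warehouse or a data-quality tool.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], list[str]]


def no_missing_article_ids(rows: list[Row]) -> list[str]:
    """Every row must carry a non-empty article_id."""
    return [f"row {i}: missing article_id"
            for i, r in enumerate(rows) if not r.get("article_id")]


def no_negative_pageviews(rows: list[Row]) -> list[str]:
    """Pageview counts can never be below zero."""
    return [f"row {i}: negative pageviews"
            for i, r in enumerate(rows) if r.get("pageviews", 0) < 0]


def no_duplicate_keys(rows: list[Row]) -> list[str]:
    """Each (article_id, date) pair should appear at most once."""
    seen: set[tuple[Any, Any]] = set()
    problems = []
    for i, r in enumerate(rows):
        key = (r.get("article_id"), r.get("date"))
        if key in seen:
            problems.append(f"row {i}: duplicate key {key}")
        seen.add(key)
    return problems


CHECKS: list[Check] = [no_missing_article_ids, no_negative_pageviews, no_duplicate_keys]


def run_checks(rows: list[Row], checks: list[Check]) -> list[str]:
    """Run every check and collect all failure messages."""
    failures: list[str] = []
    for check in checks:
        failures.extend(check(rows))
    return failures


# Made-up sample with one problem of each kind.
rows = [
    {"article_id": "a1", "date": "2024-01-01", "pageviews": 120},
    {"article_id": "", "date": "2024-01-01", "pageviews": 40},
    {"article_id": "a2", "date": "2024-01-01", "pageviews": -5},
    {"article_id": "a1", "date": "2024-01-01", "pageviews": 120},
]

failures = run_checks(rows, CHECKS)
for message in failures:
    # A real system would send these to an alerting channel.
    print("ALERT", message)
```

Keeping each rule as a small named function makes it easier for editors and product managers to read what is being checked, which supports the democratization point in the next section.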

Democratized

14. How many of the previous aspects are democratized across the newsroom? Are editors and product managers able to check dashboards and assess data quality issues themselves?