Introduction
Last year I started an open source project simply to fill my weekends and the useless, boring university lectures.
It was a simple library for extracting data from any website with just a few lines of code. It's based on LangChain, and you bring your own API keys from any LLM provider. A couple of weeks later the project blew up and started to reach thousands of GitHub stars and downloads.
Real people, LinkedIn, Twitter and newsletters were all talking about it. Given all that attention, I decided to start a company with a cofounder.
After a couple of months I decided to fundraise, and I really struggled with it. But in the end we still closed a pre-seed round.
Right now the company has 4 members: me (CTO), Lorenzo (CEO), Vikrant and Mohammad (both founding engineers).
The story nobody talks about
Historical context
Open source became more and more mainstream with the rise of Google. They positioned themselves as champions of the open source movement, sponsoring projects and making their tools publicly available. But the reality is darker than you might imagine.
The real reason big G started doing this is that, back in the day, relying on proprietary technology meant vendor lock-in. By open sourcing Android, Chromium, and Kubernetes, they didn't just gain goodwill; they created massive ecosystems where their services became the natural choice. You're "free" to use their tools, but you're locked into Google's infrastructure anyway.
Another example is React/React Native, created by Meta. They didn't open source it to be nice to developers. They did it because every developer using React becomes familiar with Meta's philosophy and architecture. When these developers start companies or move into leadership, they naturally choose Meta's tools and services. It's brand loyalty disguised as community contribution.
When OSS is used for competitive advantage
Look at Hugging Face. They built the largest open repository of machine learning models, creating a network effect where researchers and engineers default to their platform. Now they have unmatched data about what models work, what configurations are popular, and real-time insights into AI research trends. They're not just hosting—they're observing the entire ML ecosystem's behavior.
Microsoft case study
Microsoft is the perfect case for this evolution. They started as a closed source operating system and productivity suite company, aggressively defending their intellectual property. Then came the GitHub acquisition in 2018. This was the turning point.
Now, Microsoft is the biggest company doing open source at scale. But here's the play: GitHub isn't just a repository for storing your code. From Microsoft's perspective, it's the world's largest dataset of human programming behavior.
Every commit, every pull request, every issue represents training data for ML models. Copilot generates billions in potential value, trained on billions of hours of public code. They essentially got the world to crowdsource their AI training dataset for free, while maintaining the platform.
Our case: how we turned the OSS lib into a paid product
When our library went viral, we faced a classic problem: how do you monetize something that's free and public?
We realized that the open source library wasn't the business; it was the gateway. Our library solved a specific problem in a cool way, and thousands of developers downloaded it. But downloads are not revenue.
The key insight was understanding that the open source library and our paid product are two completely different things that solve the same problem. The library is the free, self-hosted solution for developers who want to build and maintain it themselves. The product is our managed, hosted solution for companies who don't want that overhead.
They both do the same thing, but they're not competing—they're complementary. The open source library becomes the funnel. Developers try it, love it, use it on side projects. Then when they want to use it at work, at scale, or in production with SLAs and support, they realize they need something more robust than managing it themselves. That's where our product comes in.
This gives us incredible advantages: first, inbound lead generation—developers who already trust our code and understand the problem we're solving. Second, we're not asking them to learn something new or switch tools, just to use a different delivery model. Third, we get real-world feedback on what features developers actually need and how they use the library in the wild.
The beauty of this model is that improving the open source library IS our marketing. Every feature we add, every bug we fix, every integration we support brings more people into the funnel who eventually become customers.
Conclusions
Here's what I learned:
If you are a big company, it's easier to create open stuff because you can afford the R&D upfront and you have existing revenue streams to fund it. Open source becomes a strategic moat—it locks developers into your ecosystem while you maintain control of the premium layer above it.
If you are a startup that has just closed its first round, you have two real options:
- Use open source as a funnel: Release something genuinely useful, build community trust, and convert users to paid customers. This requires patience and a clear monetization layer above your free offering. It's slower but builds lasting relationships with your users.
- Start as closed source and open source later: Bootstrap your business on proprietary code, reach product-market fit, and once you're profitable and established, open source non-core components to build goodwill and hire better engineers. This is faster to revenue but requires you to survive the early lean years.
We chose option 1. It was the harder path, but now our open source project is our best asset—not because it makes money directly, but because it continuously fills our sales funnel with qualified leads who already understand our vision.
The real lesson? Open source itself isn't the business model. It's either a strategic play for tech giants or a lead generation machine for startups. Understanding which one you are, and acting accordingly, is what separates success from burnout.
