
New data on open source: Reinventing the wheel every day

New data from open source projects tells the story of a simple JavaScript function. One small piece of code, often a single line, was reinvented more than 100 times and duplicated over 1,000 times across GitHub’s top 10,000 JavaScript repositories. This is only a symptom of a much deeper problem.

Imagine having to build new wheels every time you wanted to drive a car. People would probably still be riding horses to work. Some might call that elegant, but it would be a terrible waste of time and effort. New data shows this is exactly what is happening in software development in 2017. If you are a developer, you may be reinventing even the smallest pieces of functionality across repositories and microservices every day.

Code components are the fundamental building blocks of any application and the atomic building blocks of our technological future. Different functionalities can and should be reused across different applications, repositories, and projects. In practice, this rarely happens. Instead, developers often reinvent or duplicate the same code over and over again. The overhead of creating and maintaining hundreds of tiny repositories and micro-packages simply isn’t practical.

To see how far the phenomenon goes, we took a close look at the open source code hosted on GitHub.

The story of “isString”

We used a semantic code identification technology to analyze the top 10,000 JavaScript repositories on GitHub. Our scanners counted how many times developers had reinvented one simple piece of functionality: checking whether a variable is a string. Normally, this takes 1-4 lines of code. Here are the results:

[Screenshot: isString scan results across the top 10,000 JavaScript repositories]

This simple functionality had been written in more than 100 different ways across only 10,000 repositories, and the top 10 implementations were duplicated over 1,000 times. Given that GitHub hosts some 55 million repositories, the same function has likely been duplicated millions of times. Here are a few examples from top open source projects:

[Screenshot: isString implementations from top open source projects]
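The difference between "100 different ways" and "1,000 duplicates" can be illustrated with a small tally. The sketch below is a simplified, text-based stand-in for the semantic scan described above, not the scanner that was actually used. It assumes a hypothetical list of isString snippets (illustrative data, not scan results), strips whitespace, and counts how many distinct implementations there are and how many extra copies each one has. A real semantic scan would also group snippets that look different but behave the same, which plain text normalization cannot do.

```javascript
// Hypothetical sample of isString implementations, as they might be
// collected from different repositories (illustrative data only).
const snippets = [
  "function isString(x) { return typeof x === 'string'; }",
  "function isString(x){return typeof x==='string';}",
  "function isString(value) { return typeof value === 'string'; }",
  "function isString(x) { return Object.prototype.toString.call(x) === '[object String]'; }",
  "function isString(x) { return typeof x === 'string' || x instanceof String; }",
  "function isString(x) { return typeof x === 'string'; }",
];

// Strip all whitespace to build a comparison key, so formatting
// differences do not count as separate implementations.
// Renamed parameters (x vs. value) still produce different keys.
function normalize(source) {
  return source.replace(/\s+/g, "");
}

// Count how many times each normalized implementation appears.
function tallyImplementations(sources) {
  const counts = new Map();
  for (const source of sources) {
    const key = normalize(source);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = tallyImplementations(snippets);
const distinct = counts.size;

// Every copy beyond the first occurrence of an implementation is a duplicate.
let duplicates = 0;
for (const n of counts.values()) duplicates += n - 1;

console.log(`distinct implementations: ${distinct}`); // → distinct implementations: 4
console.log(`duplicated copies: ${duplicates}`); // → duplicated copies: 2
```

The first two snippets differ only in whitespace, so they collapse into the same key. The `value` variant is logically identical but still counts as distinct, and that kind of gap is why a semantic approach is needed at the scale described above.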

Although change is necessary for evolution, these numbers are bad news for everyone, for two main reasons:

First, constantly reinventing small pieces of code takes time and effort. It is not only wasteful; it actually holds back innovation, because reinvention competes for the same time and resources that could have been invested in building new things.

Second, duplicated code is harmful. Fixing a bug that has been copied to dozens of places is hard and time-consuming, and the fix itself is likely to break things. The larger the code base and the more repositories you have, the worse it becomes.
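Duplicated implementations also drift apart over time. As an illustration, the three common isString variants below (written for this example, not taken from any particular project) disagree on a String wrapper object created with `new String(...)`. Suppose a project decided that wrapper objects should count as strings. Every copied variant would then need the same fix, and any copy that was missed would keep the old behavior.

```javascript
// Three common ways to check for a string (illustrative variants).
const byTypeof = (x) => typeof x === "string";
const byToString = (x) => Object.prototype.toString.call(x) === "[object String]";
const byTypeofOrInstance = (x) => typeof x === "string" || x instanceof String;

const variants = { byTypeof, byToString, byTypeofOrInstance };

// Inputs on which one might expect all variants to agree.
const inputs = {
  primitive: "hello",
  wrapper: new String("hello"),
  number: 42,
  nullValue: null,
};

// Print each variant's answer for each input so disagreements stand out.
for (const [inputName, value] of Object.entries(inputs)) {
  const answers = Object.entries(variants).map(
    ([name, fn]) => `${name}=${fn(value)}`
  );
  console.log(`${inputName}: ${answers.join(", ")}`);
}
```

Only the `wrapper` row disagrees. `typeof new String("hello")` is `"object"`, so `byTypeof` returns false, while the other two variants return true. Code that relies on one behavior silently breaks wherever a different copy is in use.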

Why is it happening?

The obvious solution would be to make code components reusable across repositories. Much has been said about code reusability. Renowned community members write about designing reusable pieces of code. Others debate and struggle to split small components into their own repositories and packages. Most agree that three major problems prevent us from building an arsenal of hundreds of small reusable components:

  1. Creation Overhead: Creating a new repository and package for every small component would take a lifetime. There is simply too much configuration overhead to make this practical at scale.
  2. Maintenance: Maintaining dozens or hundreds of tiny repositories and packages is no joke, and neither is modifying them, which means going through multiple demanding steps every time (cloning, linking, debugging, etc.). This may well end up costing more time and effort than it saves.
  3. Discoverability: Packages are hard to find. No one can say for sure what’s really out there, or what to trust and use (we all remember the left-pad story). Organizing hundreds of micro-packages and quickly finding the right one is no easy task.

The bottom line: very few people create and maintain such an arsenal of micro-packages.

Write code once, use it anywhere

So, how can we change things? A good place to start would be addressing these three problems: making reusable components quick to create, simple to maintain, and easy to find.

To do exactly that, a new open source project called Bit was recently released on GitHub. Bit is a virtualized code component repository. It enables developers to build a set of reusable components and use them anywhere they are needed.


In a way that may sound similar to (though it differs from) what Docker did for virtualization, Bit adds a virtualized layer of abstraction. It allows developers to create reusable components with almost no overhead and use them as a dynamic API. This means your application uses only the code it actually needs.

Bit solves all three of the problems mentioned above using a virtual repository called a “Scope.” A Scope allows you to create and model components without the overhead we know today. Developers can then find and use them with a unique NLP-based semantic search engine. Scopes are distributed, which brings advantages similar to those of a distributed Git repository: they can be created anywhere, and even connected to form a distributed network. A contained and reusable environment helps each component run and build anywhere. Scopes also help when collaborating as a team.

And in conclusion…

Code duplication (or reinvention) is a serious problem, and the data drawn from GitHub shows how widespread it really is. This happens mainly because there has been no practical way to create and grow a set of reusable components. Open source projects such as Bit can help solve this problem, saving valuable time and effort.

Bit is language-agnostic by design and uses special drivers to work with different languages. In the not-so-distant future, we could all work with virtual code bases, composing small pieces of code together to build anything, much as the Unix philosophy describes. Meanwhile, using Bit or finding new ways to reuse atomic components would be a good place to start.

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.

Joni Sar is on the www.bitsrc.io team, working to build great open-source things with and for the community. Feel free to get in touch.
