New data from open source projects reveals the story of a simple JavaScript function. One line of code was reinvented over 100 times and duplicated over 1,000 times across GitHub’s top 10,000 repositories. This is only a symptom of a much deeper problem.
Imagine every time you wanted to drive a car, you had to build new wheels. People would probably still be riding horses to work. Elegant, some might say, but a terrible waste of time and effort. New data shows this is exactly what is happening in 2017. If you are a developer, you might be reinventing the smallest of functionalities across repositories and microservices every day.
Code components are the fundamental building blocks of any application, and the atomic building blocks of our technological future. Different functionalities can and should be reused across different applications, repositories, and projects. In practice, this rarely happens. Instead, people often reinvent or duplicate the same code over and over again, because the overhead of creating and maintaining hundreds of tiny repositories and micro-packages simply isn’t practical.
To see how far the phenomenon goes, we took a deep look into the open source code hosted on GitHub.
The story of “isString”
We used a semantic code identification technology to analyze the top 10,000 JavaScript repositories on GitHub. Our scanners looked for how many times people reinvented one simple functionality: checking if a variable is a string. Normally, this can be done with 1-4 lines of code. Here are the results:
This simple functionality was written in more than 100 different ways across just 10,000 repositories. The top 10 implementations were duplicated over 1,000 times. Given that GitHub hosts 55 million repositories, the same function has likely been duplicated millions of times. Here are a few examples from top open source projects:
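The snippets below are not taken from any specific project. They are a minimal sketch of three common ways the same check tends to be written, and the function names are hypothetical, chosen only for illustration.

```javascript
// Illustrative variants only; the names are hypothetical and not taken
// from any specific repository.

// Variant 1: a plain typeof check, the shortest form.
function isStringTypeof(value) {
  return typeof value === 'string';
}

// Variant 2: compare the internal class tag reported by
// Object.prototype.toString.
function isStringTag(value) {
  return Object.prototype.toString.call(value) === '[object String]';
}

// Variant 3: typeof combined with an instanceof check.
function isStringInstance(value) {
  return typeof value === 'string' || value instanceof String;
}

// For everyday inputs like these, all three variants give the same answer.
const samples = ['hello', '', 42, null, undefined, ['a'], { length: 1 }];
for (const sample of samples) {
  console.log(
    sample,
    isStringTypeof(sample),
    isStringTag(sample),
    isStringInstance(sample)
  );
}
```

Each variant is only a line or two long, which is part of why it is so tempting to rewrite the check in place instead of reusing an existing one.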
Although it is true that change is necessary for evolution, these numbers mean bad news for everyone, for two main reasons:
First, constantly reinventing small pieces of code takes time and effort. Not only is it wasteful, but it actually holds back innovation. Reinvention competes for the same time and resources that could have been better invested in building new things.
Second, code duplications are bad. Trying to fix a bug duplicated across dozens of places is hard and takes large amounts of time, and is also likely to break stuff. The larger the code base and the more repositories you have, the worse it becomes.
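As a rough illustration of how duplicated copies drift apart, consider the sketch below. The `repoA` and `repoB` objects are hypothetical stand-ins for two codebases that each carry their own copy of the helper. It assumes the team decided that `String` objects should also count as strings, which is a design choice and not a universal rule.

```javascript
// Hypothetical copies of the same helper living in two places.
const repoA = {
  // This copy was patched to also accept String objects.
  isString: (value) => typeof value === 'string' || value instanceof String,
};

const repoB = {
  // This copy never received the patch.
  isString: (value) => typeof value === 'string',
};

// A String object created with `new` has typeof 'object', not 'string'.
const boxed = new String('hello');

console.log(repoA.isString(boxed)); // true
console.log(repoB.isString(boxed)); // false

// Both copies still agree on primitive strings, so the drift is easy to miss.
console.log(repoA.isString('hello'), repoB.isString('hello')); // true true
```

With a single shared component, the fix would be made once and picked up everywhere it is used.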
Why is it happening?
The obvious solution would be to make code components reusable across repositories. Much has been said about code reusability. Renowned community members post about designing reusable pieces of code. Others debate and struggle to force small components into their own repositories and packages. Most agree that three major problems prevent us from building an arsenal of hundreds of small reusable components:
- Creation Overhead: Creating a new repository and a package for every small component will take a lifetime. There is simply too much configuration overhead required to make this process practical at scale.
- Maintenance: Maintaining dozens or hundreds of tiny repositories and packages is no joke. Neither is going through multiple demanding steps (cloning, linking, debugging, etc.) every time a small package needs to be modified. This may very well end up taking more time and effort than it saves.
- Discoverability: Packages are hard to find. No one can say for sure what’s really out there, or what to trust and use (we all remember the left-pad story). Organizing hundreds of micro-packages and quickly finding the right one to use is no easy task.
The bottom line: very few people create and maintain such an arsenal of micro-packages.
Write code once, use it anywhere
So, how can we change things? A good place to start would be dealing with the three problems: making reusable components quick to create, simple to maintain and easy to find.
To do exactly that, a new open source project called Bit was recently released on GitHub. Bit is a virtualized code component repository. It enables developers to build a set of reusable components and use them anywhere they are needed.
Bit addresses all three problems mentioned above using a virtual repository called a “Scope.” A Scope allows you to create and model components without the overhead we know today. Developers can then find and use them with an NLP-based semantic search engine. Scopes are distributed, which gives them advantages similar to those of a distributed Git repository. They can be created anywhere, and even connected to form a distributed network. A contained and reusable environment helps each component build and run anywhere. Scopes also help when collaborating as a team.
And in conclusion…
Code duplication (or reinvention) is a serious problem, and the data drawn from GitHub shows how widespread it really is. This is happening mainly because there isn’t a practical alternative that makes it possible to create a growing set of reusable components. Open source projects such as Bit can help solve this problem, saving valuable time and effort.
Bit is language agnostic by design, and uses special drivers to work with different languages. In the not so distant future, we could all work with virtual code bases composing pieces of code together to build anything (as described in the Unix philosophy). Meanwhile, using Bit or finding new ways to reuse atomic components would be a good place to start.