If the Greek king Sisyphus were a developer writing open source code in 2016, he would feel right at home.
Sisyphus’s famous punishment, handed down by the gods, was to be forced to roll an immense boulder up a hill – only to watch it roll back down after reaching the top, repeating this action for eternity. Almost without noticing, the world’s developer community took upon itself a similar punishment over the past few years. But now the boulder keeps getting bigger.
The U.S. Library of Congress holds around 24 million catalogued books. By many measures, it is the largest pool of written human knowledge ever assembled, the product of thousands of years of history.
GitHub was founded in 2008. It now holds over 35 million libraries, or repositories, containing dozens of trillions of lines of code. Studies show that this amount is growing exponentially, doubling in size roughly every 14 months. Open source code is without a doubt the cutting edge of today’s programming technology and one of the greatest, most powerful and most advanced stockpiles of knowledge ever assembled. Shiny new frameworks are industry benchmarks and brilliant open source builders are rockstars.
So, how come 90%-98% of all open source code is thrown away after 12 months?
The code is in the details
Here are some startling numbers: a year after the day they were first written, over 90% of repos will never be touched and never be used again.
They become inactive and obsolete, lost in the sands of time. In its 2015 survey, Stack Overflow found that the average developer spends roughly seven hours a week programming outside of work. GitHub reports having over 12 million users working on open source projects. Humanity is throwing away millions of working hours spent by millions of bright people.
The crazy part? No one seems to be asking “why?” Why is the vast majority of written open source code buried and forgotten? Why are we writing the same code over and over again every single day, when that code almost certainly already exists somewhere in the open source world, just waiting to be used?
This is happening mainly because people treat repositories as exactly that: storage. Everyone knows AngularJS, jQuery or React, but very few people know more than ten open source packages. And because people don’t know of, or don’t use, a package as a whole, no one uses the code inside it either. A package written in 2015 might not be useful to someone as a whole, but it may contain exactly the function they need for something else. The most useful parts are not always entire packages; sometimes they are the pieces of code within them.
Let’s say someone is looking for a JavaScript function to shuffle the elements of an array, or another function to create a random string of characters. Those small pieces of code exist hundreds of times over across open source projects. But no one knows they exist, and even if they did, no one knows how to find them. So vast amounts of valuable knowledge are discarded or forgotten, just because they are not accessible. This is insane, and it’s bad for everyone.
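To make the example concrete, here is a minimal sketch of the two kinds of small functions described above. The names `shuffleArray` and `randomString` are illustrative and not taken from any particular package; the shuffle uses the well-known Fisher-Yates algorithm, which is one common approach among several.

```javascript
// Fisher-Yates shuffle: returns a new array with the same elements in random order.
function shuffleArray(items) {
  const result = items.slice(); // copy, so the caller's array is left untouched
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // random index in [0, i]
    [result[i], result[j]] = [result[j], result[i]]; // swap positions i and j
  }
  return result;
}

// Builds a random string of the requested length from the given alphabet.
// Math.random() is not cryptographically secure, so this is not suitable
// for passwords or tokens.
function randomString(length, alphabet = "abcdefghijklmnopqrstuvwxyz0123456789") {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}
```

Each of these is only a handful of lines, which is exactly why versions of them end up rewritten inside so many unrelated projects.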
Organize all code and make it easily accessible
So how do we solve this mess? Easy to answer, tough to do – you need to do three things:
- Organize all open source code by functional pieces: Functions, Libraries, etc.
- Build a model to represent what each and every one of those pieces actually does, as in “what is its functionality?”
- Create an easy and simple way to search and find those pieces of code.
This is why we built Cocycles. It is all of the above, but is also a work in progress. Its algorithms process a huge amount of open source code, reading through it and understanding the functionality of every different line, function or other functional unit. It then allows people to search for that code using plain English.
In the example mentioned above, a user only needs to type “shuffle array” or “create random string” to be presented with a variety of open source implementations, documentation, usage examples and more. It will even offer to generate a useful snippet that already contains all dependencies and sub-functions.
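As a rough illustration of what searching code by function can look like, here is a toy sketch. It is not how Cocycles works (this article does not describe its algorithms); it simply matches the words of a plain-English query against hand-written descriptions. The `catalog` entries and the `tokenize` and `searchSnippets` helpers are all hypothetical.

```javascript
// Hypothetical catalog: each entry describes one function-level piece of code.
const catalog = [
  { name: "shuffleArray", description: "shuffle the elements of an array in random order" },
  { name: "randomString", description: "create a random string of characters" },
  { name: "debounce", description: "delay calling a function until input stops" },
];

// Lowercases the text and splits it into alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Scores each entry by how many query words appear in its name or description,
// then returns the names of matching entries, best match first.
function searchSnippets(query, entries) {
  const queryWords = tokenize(query);
  return entries
    .map((entry) => {
      const words = new Set(tokenize(entry.name + " " + entry.description));
      const score = queryWords.filter((word) => words.has(word)).length;
      return { entry, score };
    })
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((result) => result.entry.name);
}

console.log(searchSnippets("shuffle array", catalog));        // [ 'shuffleArray' ]
console.log(searchSnippets("create random string", catalog)); // [ 'randomString', 'shuffleArray' ]
```

Plain word matching like this depends on someone having written a good description for every snippet, which is the gap that a model of what each piece of code actually does would have to fill.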
In the future, years from now, AI software might be able to use this to find and learn new code by itself, allowing it to evolve, change and grow on its own. For now, Cocycles supports only JavaScript. It is a free and open technology built to make sure the exponential growth in open source code is met by a growing ability to share and use that code.
The author is head of growth at Cocycles.