Universal Code Search Combinations

More potential searches than there are stars in the universe.

Source code search is not like searching for text on the web; it is more complex and has many more nuances. When we set out to create a Universal Code Search engine, we knew there were many permutations that we had to account for. So we began speculating about just how many search permutations that would be.

First, let’s cover what Universal Code Search is. Search today is not just “Grep and go.” The modern era of big code requires code search to cover many dimensions: repositories, file formats, programming languages, etc. Universal Code Search ties together information from many tools, from repositories on your code host to dependency relationships among your projects and application runtime information, to give developers a single place to explore and understand all code faster.

A Universal Code Search must satisfy the following requirements:

  • Search through repositories on any major code host.
  • Search across all commits, making both the current (up-to-date) and past codebases discoverable.
  • Support all programming languages.
  • Support code intelligence for all major languages.

To quantify how expansive Universal Code Search is, the best metric is the number of search combinations. 

Developers can search across more than 18 dimensions of code to navigate, explore and understand code.

Repositories

Sourcegraph can search across any repository, spanning all Git providers in addition to other source code version control systems. GitHub hosts over 100 million repositories; assuming that figure represents a 17.31% market share, we can estimate there are 577,700,751 searchable repositories (100 million divided by 0.1731).
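The repository estimate is a simple scale-up, and it can be sketched in a few lines. This is a minimal sketch: the function name estimate_total_repos is made up for illustration, and the inputs are the article's own assumptions (100 million GitHub repositories, 17.31% market share).

```python
def estimate_total_repos(repos_on_host: int, market_share: float) -> int:
    """Scale one code host's repository count up to the whole market.

    market_share is a fraction (0.1731 for 17.31%).
    """
    if not 0 < market_share <= 1:
        raise ValueError("market_share must be a fraction in (0, 1]")
    return round(repos_on_host / market_share)


# Assumptions taken from the article; change them to match your own data.
github_repos = 100_000_000
github_share = 0.1731

total_repos = estimate_total_repos(github_repos, github_share)
print(f"{total_repos:,}")  # 577,700,751
```

A different host count or market share changes the estimate proportionally.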

Commits

Sourcegraph can search any repository, and any commit within the repository. For instance, if the average repository has five developers each contributing five commits a day, 300 days per year, over five years, Sourcegraph would be able to search all 37,500 commits using repo:<REPONAME>@<COMMIT>.

Branches

As with commits, Sourcegraph can search all of a repository’s branches. Using the same methodology, we can assume that each of the five developers works on five feature branches per month over five years. This would lead to 1,500 branches on the repository that could be searched using repo:<REPONAME>@<BRANCHNAME>.
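The commit and branch estimates use the same back-of-the-envelope multiplication. The sketch below reproduces both figures and formats the revision filter shown above. The function names and the example repository and branch names are hypothetical; the numbers are the article's assumptions.

```python
def total_commits(developers: int, commits_per_day: int,
                  days_per_year: int, years: int) -> int:
    """Commits accumulated by a team committing at a steady daily rate."""
    return developers * commits_per_day * days_per_year * years


def total_branches(developers: int, branches_per_month: int, years: int) -> int:
    """Feature branches accumulated at a steady monthly rate."""
    return developers * branches_per_month * 12 * years


def revision_filter(repo: str, revision: str) -> str:
    """Format the repo:<REPONAME>@<REVISION> filter for a commit or branch."""
    return f"repo:{repo}@{revision}"


commits = total_commits(developers=5, commits_per_day=5, days_per_year=300, years=5)
branches = total_branches(developers=5, branches_per_month=5, years=5)

print(f"{commits:,}")   # 37,500
print(f"{branches:,}")  # 1,500
# Hypothetical repository and branch names, for illustration only.
print(revision_filter("example/project", "feature-x"))  # repo:example/project@feature-x
```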

File Type

Sourcegraph supports all programming languages and developer files, in addition to other text files. Between text and developer files, we conservatively estimate this number to be 1,155 total file types supported.

Pattern Type

Sourcegraph allows your search to be interpreted three ways: literally, as a regular expression (regex) or as a structural search pattern. Structural search lets you match richer syntax patterns specifically in code and structured data formats like JSON.

Search Type

Four search types are permitted: file, repo, diff and commit. Whether you need to search over all your code at a given point in time, or over changes to code or commit messages (diffs or commits), Sourcegraph gives you access to all depths, past and present, of your codebase.

Repo Forks and Archives

Include repository forks in your search, or don’t. Include repository archives, or don’t. These are the four options for searching forks and archives among your codebase in Sourcegraph search. (The filter values are yes, no and only.)

Case

Sourcegraph search supports both case-insensitive and case-sensitive search. And when performing a diff or commit search, there are five more filters you can use.
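The discrete choices described so far (pattern type, search type, forks and archives, case) multiply together. As a rough sketch, they can be enumerated with itertools.product. The filter keywords below (patterntype:, type:, fork:, archived:, case:) are our reading of Sourcegraph's query syntax and should be treated as assumptions, and forks and archives are limited to yes/no to match the four options counted above.

```python
from itertools import product

# Discrete dimensions as described in the article.
PATTERN_TYPES = ("literal", "regexp", "structural")
SEARCH_TYPES = ("file", "repo", "diff", "commit")
FORK_OPTIONS = ("yes", "no")
ARCHIVE_OPTIONS = ("yes", "no")
CASE_OPTIONS = ("yes", "no")

# Every combination of the five choices above.
combos = list(product(PATTERN_TYPES, SEARCH_TYPES,
                      FORK_OPTIONS, ARCHIVE_OPTIONS, CASE_OPTIONS))


def to_filters(combo: tuple) -> str:
    """Render one combination as a string of search filters (assumed syntax)."""
    pattern_type, search_type, fork, archived, case = combo
    return (f"patterntype:{pattern_type} type:{search_type} "
            f"fork:{fork} archived:{archived} case:{case}")


print(len(combos))            # 96
print(to_filters(combos[0]))  # patterntype:literal type:file fork:yes archived:yes case:yes
```

Even before repositories, commits, branches and file types enter the picture, these five small choices already yield 96 distinct ways to run the same search string.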

Developer

Search for the author of the code, or for the committer. These are the two options when performing a diff or commit search.

Time

Sourcegraph supports searching both before and after a time frame, such as before: “last Thursday,” after: “6 weeks ago” or after: “November 1, 2019.”

Message

You can also restrict diff and commit searches to commits whose message contains a particular string.
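The five diff and commit filters (author, committer, before, after, message) can be combined in one query. Here is a minimal sketch of a query builder; the function commit_search_query is hypothetical, and the exact keyword spellings and quoting rules are assumptions that should be checked against the Sourcegraph query reference.

```python
from typing import Optional


def commit_search_query(pattern: str,
                        search_type: str = "commit",
                        author: Optional[str] = None,
                        committer: Optional[str] = None,
                        before: Optional[str] = None,
                        after: Optional[str] = None,
                        message: Optional[str] = None) -> str:
    """Assemble a diff or commit search string from the optional filters."""
    if search_type not in ("diff", "commit"):
        raise ValueError("these filters only apply to diff or commit searches")

    parts = [f"type:{search_type}"]
    filters = (("author", author), ("committer", committer),
               ("before", before), ("after", after), ("message", message))
    for key, value in filters:
        if value is None:
            continue
        # Quote values that contain spaces, such as "6 weeks ago".
        rendered = f'"{value}"' if " " in value else value
        parts.append(f"{key}:{rendered}")

    parts.append(pattern)
    return " ".join(parts)


query = commit_search_query("timeout", search_type="diff",
                            author="alice", after="6 weeks ago")
print(query)  # type:diff author:alice after:"6 weeks ago" timeout
```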

IDE

Four IDE integrations are supported: VS Code, Atom, IntelliJ and Sublime Text. This helps improve developer productivity by reducing context-switching. Sourcegraph’s IDE integrations are limited in their ability to perform structural searches, use filters and search past commits. Therefore, most of the dimensions described above don’t apply to IDEs.

What does this add up to? 2,702,343,531,371,460,001,448 potential code searches across all repos, all programming languages, all code changes, all file formats and all code hosts. Universal Code Search makes it easier and faster to explore, navigate and better understand all code, everywhere. Don’t believe it? Go ahead and see for yourself. Change the assumptions to match your organization.

Quinn Slack

Quinn Slack is CEO and co-founder of Sourcegraph. Prior to Sourcegraph, Quinn co-founded Blend Labs, an enterprise technology company with over 500 employees dedicated to improving home lending. At Palantir Technologies, he created a technology platform to help two of the top five U.S. banks recover from the housing crisis. He was the first employee and developer at Bleacher Report after graduating from high school. Quinn graduated with a B.S. in computer science from Stanford.
