OpenAI’s ChatGPT took the world by storm, amassing 100 million users in the first two months after its public launch. The continued interest in the tool has created a buzz among developers, especially those in the open source community. But this raises the question: How will ChatGPT impact open source software?
Amid the excitement and promise that ChatGPT holds, many wonder whether the open source community, like any other organization, should fear or embrace the technology. Concerns among open source contributors and developers are sparking debate as questions surface about the origin of generated source code and its possible ethical and legal implications. While these concerns are valid, developers should not fear or shy away from using ChatGPT but rather shift their focus to understanding how to embrace it for positive outcomes.
The Top Three Open Source ChatGPT Concerns
Collaboration between contributors is the cornerstone of open source projects, which some fear will be disrupted with the increased use of AI-based tools. However, the level of synergy within the community cannot be easily taken away or replaced by tools. Instead, ChatGPT or other AI-based tools, including GitHub Copilot, allow developers to produce code more quickly and more efficiently.
As developers increasingly use AI tooling to assist with new or enhanced code, project collaboration and oversight will help improve the AI-generated code.
Debunking and navigating the top concerns will be imperative to take full advantage of the technology’s promise and potential.
Validity. While the creation of code by ChatGPT has generated excitement among developers, critics argue that—without context—the validity of code can be questioned. Some open source developers worry that teams will begin to rely entirely on ChatGPT to generate code, but this concern fails to consider the humans involved in the process—today and far into the future.
Developers don’t take ChatGPT’s output as the final word; rather, they use it as a baseline and jumping-off point to streamline their code. In fact, code is rarely created from scratch anymore. Developers already rely on others’ source code, such as code from Stack Overflow, GitHub and the thousands of open source libraries available on public registries like npm, Maven, NuGet and PyPI. Bringing ChatGPT into the fold will not significantly change how developers source their code, but it will improve development velocity, saving valuable time and the associated cost.
Training Data. Machine learning (ML) and deep learning (DL) model training must be fair, robust and explainable to avoid biases. If the data is wrong, the results will be as well. Garbage in, garbage out. Because it comes from ML/DL models, the code ChatGPT returns in response to prompts can raise concerns about its accuracy. Like code from any other source (Stack Overflow, GitHub, etc.), ChatGPT’s code outputs are not guaranteed to be perfect, and developers must be mindful of this.
However, ChatGPT’s benefits extend beyond generating code. It can also explain new or existing code and draft unit tests for it, which helps developers write better software faster.
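As a rough illustration of the unit-test use case, the sketch below pairs a small helper with the kind of tests a developer might ask an AI assistant to draft and then review by hand. The function `slugify` and its test cases are hypothetical examples written for this article, not actual ChatGPT output, and the tests use the plain-assert style that a runner such as pytest can collect.

```python
import re


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug (hypothetical helper)."""
    # Replace every run of characters that are not a-z or 0-9 with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Trim any hyphens left over at either end.
    return slug.strip("-")


# Tests of the sort an assistant might draft; a human still decides
# whether these cases match the project's real requirements.
def test_spaces_become_hyphens():
    assert slugify("Open Source Software") == "open-source-software"


def test_punctuation_is_collapsed():
    assert slugify("ChatGPT & Copilot!") == "chatgpt-copilot"


def test_empty_input_gives_empty_slug():
    assert slugify("") == ""
```

The value of a draft like this lies less in the tests themselves than in the review step: a contributor still has to confirm that the cases cover the behavior the project actually needs.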
Ownership. Questions of ownership have surfaced regarding the use and distribution of the code that the AI tool generates. While the code that ChatGPT generates is the result of ML/DL inference from many sources, it’s the developer’s responsibility to use that code ethically and safely. It should be treated like any other public data or open source software: not as a final product, but as something to be adapted to the context of your requirements. It’s also important to carefully review any code generated by ChatGPT and ensure that it doesn’t introduce vulnerabilities.
Similar to GitHub Copilot, ChatGPT is trained on millions of lines of open source software. Code posted in prompts is likely to make its way into the models, too. Unlike copyrighted pieces of art and writing, the code that ChatGPT outputs should not be considered final, and it should not be used in a way that triggers the licensing restrictions or legal implications that have dominated the conversations around its use.
Ownership closely relates to the ethical dilemma of using AI-generated text, code and more. To address that, there are new tools that help detect whether content was generated by AI and how much of it was, which will be useful for educators concerned with students overusing the tool. This emerging area of checks and balances exemplifies the constantly evolving and improving technologies that allow humans, and in the case of open source, contributing developers, to improve their craft and produce better open source software.
ChatGPT’s Impact on Open Source Talent
Discussions about whether ChatGPT will require new skills for existing positions or new jobs with specialized experts are ongoing. While new and exciting, ChatGPT won’t immediately create new or different jobs. Like any new tool introduced to developers, it takes time for developers to become familiar with the technology and understand how to use it best. ChatGPT is no different. For example, consider the previous push to use low-code/no-code technology. While this great technology is used to speed up the creation of apps and improve the usability for non-developers, it has taken time for organizations to properly use low-code/no-code technology. A similar perspective and trajectory will extend to the use of ChatGPT for all software development, including open source projects.
In the coming weeks and months, it will be essential to encourage the open source community to embrace ChatGPT and explore its possibilities. The technology has already proven to be an effective educational tool. Consider asking ChatGPT for book recommendations about programming languages and coding; it delivers a short description of each book. Or, prompt it for the top takeaways from one specific book. Used this way, it can make learning much easier. As developers learn from one another and share resources, including results from ChatGPT, the community roots of open source will keep thriving.
Human Involvement Is Always Needed
The public’s reaction to ChatGPT may be novel, but the idea behind the tool’s relationship to open source is not. Analyzing how ChatGPT works within the open source community reveals that it will be useful because it enhances the capabilities of developers rather than replacing them.
Whether it’s reviewing code, pair programming or learning from fellow developers, humans will not be replaced by generative AI, only enhanced by it. Leveraging the tool will decrease the time and effort needed to complete tasks, improving both the quality of developers’ work and their efficiency. The use of ChatGPT does not create new positions or overhaul current skill sets; it strengthens existing skills and empowers developers to expand their opportunities and enhance their abilities.
Soon, ChatGPT and other AI tools will be standard practice within the open source community, at least until a new technology comes along and shakes up software development once again.