GitHub Copilot and the unfulfilled promises of an artificially intelligent future

In late June 2021, GitHub rolled out what it calls a “technical preview” of GitHub Copilot, described as “an AI pair programmer that helps you write better code.” Predictably, reactions to the announcement have varied, from glee at the glorious arrival of our code-generating AI overlords to dismay and predictions of doom, with companies supposedly about to fire software developers en masse.

As is usually the case with such controversial topics, neither extreme is anywhere near the truth. In fact, the OpenAI Codex machine learning model that underlies GitHub Copilot is derived from OpenAI’s GPT-3 natural language model and shares many of GPT-3’s stumbles and gaffes. So if Codex, and with it Copilot, is not everything it is made out to be, what is the big deal, and why show it at all?

The many definitions of artificial intelligence

Baker Library at Dartmouth College. (Source: Gavin Huang, CC BY 3.0)

The first major attempt to establish a true field of artificial intelligence was the Dartmouth workshop in 1956. It saw some of the foremost minds in mathematics, neuroscience, and computer science come together to brainstorm ways of creating what they termed “artificial intelligence”, a term that followed the names more common at the time, such as “thinking machines” and automata theory.

Despite the hopeful attitude of the 1950s and 1960s, it was soon recognized that artificial intelligence was a much harder problem than initially assumed. Today, artificial intelligence that can think like a human is called artificial general intelligence (AGI) and is still firmly the realm of science fiction. Most of what we call “artificial intelligence” today is actually artificial narrow intelligence (ANI, or narrow AI): technologies that approach individual aspects of AGI, but whose scope and applications are usually very limited.

Most ANI is based on artificial neural networks (ANNs), which roughly replicate the concepts behind biological neural networks, such as those found in the mammalian neocortex, albeit with major differences and simplifications. ANNs, from classic feed-forward networks and recurrent neural networks (RNNs) to the Transformer architecture used for GPT-3 and Codex, are configured during training using backpropagation, a process that has no biological analogue.

Essentially, models like GPT-3 are curve-fitting models: they use regression analysis to match a given input against internal data points, which are encoded in the weights assigned to the connections within the network. This makes neural networks efficient at finding probable matches within their parameter space. For GPT-3 and similar natural language synthesis systems, the output is therefore based on probability rather than understanding. As with any ANN, the quality of that output depends heavily on the training data set.
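To make “probability rather than understanding” concrete, here is a deliberately tiny sketch in Python. It is a toy word-pair (bigram) counter, not how GPT-3 or Codex actually work, and the function names and the three-statement “corpus” are made up for illustration. It only shows that a model of this kind proposes whatever most often followed the current token in its training data, with no notion of what the tokens mean.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every token, which tokens followed it in the corpus."""
    tokens = corpus.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_probs(model: dict, token: str) -> dict:
    """Turn the raw follow-up counts for `token` into probabilities."""
    following = model.get(token)
    if not following:
        return {}  # token never seen in training: the model has nothing to offer
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()}


# Hypothetical training data: two additions and one subtraction.
corpus = "return a + b return a - b return a + b"
model = train_bigram(corpus)
probs = next_token_probs(model, "a")

# The "suggestion" is simply the most frequent continuation seen in training.
suggestion = max(probs, key=probs.get)
```

With this corpus the sketch suggests `+` after `a` purely because addition appeared twice and subtraction once. Change the training data and the suggestion changes with it, which is the dependency on the training set in miniature.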

Garbage in, garbage out

The historic Pioneer Building in San Francisco is home to OpenAI and Neuralink. (Source: HaeB, CC BY-SA 4.0)

All of this means that an ANN cannot think or reason, and is therefore not aware of the meaning of the text it generates. In the case of OpenAI’s Codex, it has no awareness of what code it writes. This leads to the inevitable need for a human to check the ANN’s work, as also concluded in a recent paper published by OpenAI (Mark Chen et al., 2021). Although Codex was trained on code rather than natural language, it has as little concept of working code as it has of correct English grammar or essay writing.

This is borne out by the FAQ on GitHub’s Copilot page, which states that when asked to fill in a blank function body, Copilot got it right on the first attempt only 43% of the time, and 57% of the time when allowed 10 attempts. Chen et al. tested the Python code generated by Codex against prepared unit tests. They showed that, for a variety of inputs ranging from interview questions to docstring descriptions, different versions of Codex generated correct code significantly less than half of the time.

In addition, Chen et al. note that because Codex has no awareness of what code means, there is no guarantee that generated code will run, be functionally correct, or be free of security and other defects. Considering that Codex’s training set consists of gigabytes of code from GitHub that was never comprehensively validated for correctness, functionality, or security, whatever rolls out of the regression analysis is guaranteed to be at most as correct as code copied from a vaguely relevant StackOverflow post.
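Success rates of the “right within k attempts” kind are usually reported as pass@k. The Python sketch below implements the unbiased pass@k estimator as it is given in Chen et al.: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that at least one of k randomly drawn samples is correct. The sample counts in the usage lines are hypothetical, and whether the FAQ figures were computed in exactly this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for a problem
    c: number of those samples that passed the unit tests
    k: number of attempts the user is allowed
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 10 samples generated, 3 of them pass the tests.
single_attempt = pass_at_k(n=10, c=3, k=1)
five_attempts = pass_at_k(n=10, c=3, k=5)
```

Allowing more attempts can only raise the figure, which is why a 10-attempt success rate always looks better than a first-attempt one, even though a human still has to find the correct suggestion among the candidates.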

Let’s look at the code

Worth noting when using GitHub Copilot is that OpenAI’s Codex, being based on GPT-3, is likewise exclusively licensed to Microsoft. This explains its association with GitHub, and why it requires the Visual Studio Code IDE, at least during the current technical preview. After you install the GitHub Copilot extension in VS Code and log in, your code is sent to the Microsoft data center where Codex runs, for analysis and suggestions.

Copilot offers code suggestions automatically, without explicit input from the user. All it needs is some comments describing the functionality of the code that should follow, and possibly a function signature. When the system decides that it has something to contribute, it displays its suggestions and lets the user select one.
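As a rough illustration of that interaction, the Python sketch below shows the kind of prompt described here: a comment stating the intent, followed by a function signature. The function body was written by hand for this example and merely stands in for a suggestion. It is not actual Copilot output, and the `fibonacci` task is an arbitrary choice.

```python
# Prompt as a developer might type it: a descriptive comment plus a signature.

# Return the n-th Fibonacci number (0-indexed), computed iteratively.
def fibonacci(n: int) -> int:
    # Everything below stands in for a suggested completion. It was written
    # by hand for this example and is NOT real Copilot output.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Whatever gets accepted still has to be verified by a human, for instance
# against values that are known to be correct.
assert [fibonacci(i) for i in range(7)] == [0, 1, 1, 2, 3, 5, 8]
```

The final assertion is the part that matters: accepting a suggestion is cheap, while establishing that it is correct remains the developer’s job.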

Unfortunately, Copilot’s technical preview is only available to a very limited number of people, so after the initial Zerg rush that followed the announcement I have not yet been able to gain access. Fortunately, some of those who did get access have written up their thoughts.

One TypeScript developer (Simona Winnekes) wrote up their thoughts after using Copilot to create a minimal quiz app in TypeScript and Chakra. After the intent of each section of code was described in comments, Copilot would suggest code, although it first had to be coaxed into actually using Chakra UI as a dependency. Checking Copilot’s suggestions often revealed faulty or incorrect code, which was fixed by writing more explicit instructions in the comments and picking the intended option from Copilot’s suggestions.

Simona’s findings were that although Copilot works with JavaScript, Python, and TypeScript, and can help when writing repetitive code or unit tests, the generated code requires constant validation, and Copilot often refuses to use the desired modules and dependencies. The generated code also has a distinct “stitched together” feel, lacking the consistency expected from a human developer. In the end, writing the quiz by hand would have taken Simona about 15 minutes, whereas it took around two hours while humoring this Copilot AI buddy. After this experience, enthusiasm for continuing to use Copilot was understandably low.

At Scott Logic, Colin Eberhardt had a very mixed experience with Copilot. Although he acknowledged a few “wow” moments in which Copilot was genuinely somewhat useful or even impressive, the negatives won out in the end. His complaints focused on the delay between typing and Copilot’s suggestions popping up. This, together with the “auto-completion” model used by Copilot, leads to a “workflow” akin to having a pair-programming buddy who seemingly at random rips the keyboard away from you to type something.

Colin’s experience was that when Copilot stuck to suggesting 2-3 lines of code, the cognitive load of validating its suggestions was acceptable. When it suggested larger blocks of code, however, he felt that the overhead of validating them was not worth it compared with simply typing the code himself. Nevertheless, he sees potential in Copilot, especially once it becomes a true AI pair-programming partner.

The most comprehensive analysis probably comes from Jeremy Howard of Fast.ai. In a blog post titled “Is GitHub Copilot a blessing or a curse?”, Jeremy astutely observes that most time is spent not on writing code, but on designing, debugging, and maintaining it. This leads into the “curse” part, because Copilot’s (Python) code turns out to be quite verbose. What happens to code design and architecture (not to mention ease of maintenance) when the code is mostly whatever Copilot and kin generated?

When Jeremy asked Copilot to generate code to fine-tune a PyTorch model, the generated code did work, but it was very slow and led to poor tuning results. This points to another problem with Copilot: how do you know that the solution it offers is the best one for a given problem? When browsing StackOverflow, programming forums, and blogs, you are likely to stumble upon a variety of possible approaches, along with their advantages and disadvantages.

Since code generated by Copilot comes without any such context, what is the true value of the generated code beyond passing (auto-generated) unit tests?

Evolution, not revolution

Jeremy also points out that Copilot is not as revolutionary as it is made out to be. Over the years there have been many options, such as GitHub’s semantic code search and Tabnine with its “AI assistant” that handles multiple languages (including non-scripting languages). Earlier this year, Microsoft released IntelliCode for Visual Studio. The common pattern here? AI-based code completion.

Microsoft’s Visual Studio IntelliCode “AI assisted development” example.

With so much competition for GitHub’s Copilot, it is more important than ever to be aware of its place in the development process and of how it can be adjusted to suit different development styles. Most importantly, we need to get rid of the effervescent, starry-eyed notion of an “AI pair programmer”. Clearly, these tools are more akin to ambitious auto-completion algorithms, with all the advantages and disadvantages that brings.

Some developers like to turn on every auto-completion feature in their IDE, from parentheses to function and class names, so that they can practically just press Enter to generate half of their code, while others prefer to painstakingly chisel each character into the file, next to a screen filled with documentation and API references. Clearly, Copilot will not win over both types of developer.

Perhaps the most important argument against Copilot and kin is that these are just dumb algorithms, with zero consideration for the code they generate. Since a human developer always has to validate the generated code, it seems that the days of StackOverflow et al. are not quite numbered yet, and software developers’ jobs are still very safe.