Copilot Leaks: Code I Should Not Have Seen

Jan Kammerath
11 min read · May 21, 2023


I have been using GitHub Copilot for some months now and am absolutely impressed by the code it can produce, although that code always needs to be cross-examined and reviewed. You can’t trust it blindly. In my experience, Copilot quite often produces things like division by zero and other obvious code issues. What has struck me most, however, is the amount of possibly private information it gives me.

Copilot leaks code — because that’s what an LLM does

I’ll show you my most surprising results and look at how they got into Copilot, ChatGPT and the other LLMs. I’ll also give examples of how to keep your code from leaking through Copilot, ChatGPT, CodeWhisperer and others. Some of the leaks may be frightening, while others may seem less of a problem. This article gives you an insight into the most shocking ones I’ve experienced over the last few months while writing more than 10,000 lines of code together with Copilot.

Leaked API endpoints and keys

Very often, both Copilot and ChatGPT come up with API endpoints for the code they are prompted to write. In most of my cases, I’d say roughly 80%, they come up with well-known, publicly available APIs. In the remaining 20%, they “invent” fictional APIs or provide real private API endpoints that neither I nor the model should know about.

/* fetch the list of currently active police vehicles and their location */
fetch("https://data.police.uk/api/leicestershire/neighbourhoods")

The above is a public statistics API, and Copilot is pretty helpful in pointing out potential APIs to use for certain tasks. There’s absolutely nothing wrong with promoting public APIs. You should not take a suggestion at face value, though: evaluate whether the API suggested by Copilot is the right choice for you, and evaluate the alternatives. Still, the suggestions from Copilot are a great start.
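One way to make that evaluation step explicit is to refuse to call any suggested endpoint whose host you have not reviewed yourself. The following is only a minimal sketch under my own assumptions: the allowlist REVIEWED_HOSTS and the function isReviewedEndpoint are hypothetical names for illustration, not part of Copilot or any library.

```javascript
// Hypothetical allowlist: hosts you have looked up and decided to trust.
const REVIEWED_HOSTS = new Set(["data.police.uk"]);

// Returns true only for https URLs whose host is on the reviewed list.
// Anything else (unknown host, plain http, unparseable string) is rejected.
function isReviewedEndpoint(suggestedUrl) {
  let parsed;
  try {
    parsed = new URL(suggestedUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return parsed.protocol === "https:" && REVIEWED_HOSTS.has(parsed.hostname);
}
```

With a check like this, a call such as isReviewedEndpoint("https://data.police.uk/api/leicestershire/neighbourhoods") passes, while a suggestion pointing to a host you have never heard of is rejected until you have looked into it. It does not tell you whether an endpoint is private; it only forces a manual review before the code goes anywhere near it.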

/* search the internal ZDF archive for the given date */
const dateString = "1999-12-31"
const url = "https://zdf-cdn.live.cellular.de/mediathekV2/epg-broadcasts/epg-broadcasts-" + dateString + ".json"

The above is one of many examples in which Copilot gave me a (fictional?) path to a real endpoint. ZDF is one of the many German public television channels that also stream their programmes online. I wrote the comment…


Written by Jan Kammerath

I love technology, programming, computers, mobile devices and the world of tomorrow. Check out kammerath.com and follow me on github.com/jankammerath
