UCAS Similarity Detection Service - guidance for applicants

This guide is designed to help applicants using the UCAS application system to understand the new Similarity Detection process that is applied to all personal statements received in support of an application. It has been widely reported that some example personal statements found on the internet have been used by applicants, in some cases word for word. The system, called Copycatch, is used to identify statements that show similarity, to quantify the amount of potentially copied material and to present the findings in an easy-to-understand report. It is a tool designed to help the admissions staff at Higher Education Institutions (HEIs), who will decide what action, if any, to take regarding notified cases.

Research has shown that 95% of applicants using the UCAS application system did indeed write their own personal statements, but the number who made use of other people's material was sufficient to justify the creation of the Similarity Detection Service.

What the Similarity Detection Service does

When a personal statement is submitted it is uniquely identified so the results of the process are accurately reported for each applicant.

Each incoming personal statement is checked against a library of personal statements already in the Copycatch system, and a library of sample statements collected from a variety of web sites and other sources including paper publications. Each new personal statement is added to the library of statements already received after it has been processed.
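The check-then-add order described above can be illustrated with a minimal sketch. It is not the Copycatch implementation: the class name StatementLibrary, the sentence splitting and the exact-match rule are all assumptions made for illustration only.

```python
import re


def split_sentences(statement):
    # Naive split on ., ! or ? followed by whitespace (an assumption).
    parts = re.split(r"(?<=[.!?])\s+", statement.strip())
    return [p for p in parts if p]


def normalise(sentence):
    # Lower-case the sentence and keep only its words, so that
    # punctuation and capitalisation do not hide a match.
    return " ".join(re.findall(r"[a-z']+", sentence.lower()))


class StatementLibrary:
    """Hypothetical library of statements already received."""

    def __init__(self):
        self._index = {}  # normalised sentence -> ids of statements using it

    def check_then_add(self, statement_id, statement):
        sentences = split_sentences(statement)
        matches = {}
        # Step 1: compare the new statement with what is already held.
        for sentence in sentences:
            key = normalise(sentence)
            if not key:
                continue
            for other_id in self._index.get(key, ()):
                matches.setdefault(other_id, []).append(sentence)
        # Step 2: only after checking, add the new statement to the library.
        for sentence in sentences:
            key = normalise(sentence)
            if key:
                self._index.setdefault(key, set()).add(statement_id)
        return matches
```

Because a statement is added only after it has been checked, it is never matched against itself, but it is available for comparison with every statement received later.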

Any statements showing a potential level of similarity of 10% or greater will be reviewed by members of the UCAS Similarity Detection Service. HEIs will be notified on a daily basis of any cases where there are reasonable grounds to suspect collusion. Applicants will also be notified that the UCAS Similarity Detection Service has identified their personal statement as potentially plagiarised. The decision about what action, if any, to take regarding notified cases rests with the admissions tutors at individual HEIs.
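As a rough illustration of the 10% threshold, the sketch below assumes that the level of similarity is the share of sentences in a statement that have been matched. This guide does not say how the percentage is actually calculated, so the formula and the function name needs_review are assumptions.

```python
REVIEW_THRESHOLD_PERCENT = 10  # from this guide: 10% or greater is reviewed


def needs_review(matched_sentences, total_sentences):
    # Assumption: similarity = matched sentences / total sentences.
    # Integer arithmetic avoids rounding problems at exactly 10%.
    if total_sentences <= 0:
        return False
    return matched_sentences * 100 >= REVIEW_THRESHOLD_PERCENT * total_sentences
```

Under this assumption, a statement of 30 sentences with 3 matched sentences sits exactly on the threshold and would be passed to staff for review, while one with 2 matched sentences would not.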

The Copycatch process ignores 450 commonly used words that many applicants would reasonably use in their statements, such as 'and', 'so', 'with' and 'football'. It also ignores a selection of commonly used phrases, including 'Duke of Edinburgh'.
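The effect of ignoring common words and phrases can be sketched as follows. The lists here are hypothetical and heavily shortened (the real word list has about 450 entries and is not reproduced in this guide), and the function name content_words is an assumption.

```python
import re

# Hypothetical, shortened lists for illustration only.
IGNORED_WORDS = {"and", "so", "with", "football", "the", "of", "to"}
IGNORED_PHRASES = ["duke of edinburgh"]


def content_words(sentence):
    text = sentence.lower()
    # Remove whole phrases first, so their individual words are not left behind.
    for phrase in IGNORED_PHRASES:
        text = text.replace(phrase, " ")
    words = re.findall(r"[a-z']+", text)
    return [word for word in words if word not in IGNORED_WORDS]
```

For example, content_words("I play football with the Duke of Edinburgh scheme") keeps only 'i', 'play' and 'scheme', so two applicants who both mention football or the Duke of Edinburgh award are not matched on those words alone.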

Verification

Copycatch is a process that identifies sentences in a personal statement that are matched to other personal statements already held in the Copycatch system. Copycatch does not make decisions about the validity of the results. Indications of possible collusion are reviewed by trained staff who decide whether you and the institutions you are applying to need to be notified that similarity has been found. Ultimately it is the institutions you are applying to that decide on the validity of the results and what action, if any, to take.

Notification that a report has been sent to HEIs

If Copycatch identifies a significant amount of potentially plagiarised material in your personal statement and the Verification staff decide to inform the HEIs you have applied to, you will be notified by email. This email will include instructions explaining how you can view the output of the detection program by using UCAS Track. There will also be access to a frequently asked questions (FAQ) section that contains advice and guidance.

The report you will see is the same as the report sent to the institution. This report will display your personal statement and will be marked up to identify sentences that contain potential plagiarism. We use four colours to indicate significant matches with other statements and grey to show sentences which have not been found to match.

Inside matched sentences we use black to show which words in your sentence are different from the one matched with it by the program. We use underlined black to show that the word is related but not identical.

What the sentence colours mean
Red is used for the sentences from the most matched statement.
Blue is used for the next best match if there are at least 3 sentences.
Pink is used for the third best match if there are at least another 3 sentences.
Brown is used where only one or two sentences have been identified from a library source.

Grey is used for sentences for which no match has been found in the indexes and for very short sentences which don't get checked.
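The colour rules above can be summarised in a short sketch. The function name assign_colours and the input format are assumptions. The guide does not say what happens to a fourth or later source with 3 or more matched sentences, so the sketch falls back to brown in that case, which is also an assumption.

```python
def assign_colours(match_counts):
    """match_counts maps a source id to its number of matched sentences."""
    # Rank sources from most matched to least matched.
    ranked = sorted(match_counts.items(), key=lambda item: item[1], reverse=True)
    ranked_colours = ["red", "blue", "pink"]
    colours = {}
    for rank, (source, count) in enumerate(ranked):
        if rank == 0:
            colours[source] = "red"  # the most matched statement
        elif rank < 3 and count >= 3:
            colours[source] = ranked_colours[rank]  # blue, then pink
        else:
            colours[source] = "brown"  # one or two sentences (or assumed fallback)
    return colours
```

With matches of 7, 4, 3 and 1 sentences from four sources, the sources would be shown in red, blue, pink and brown respectively. Sentences with no match are not passed to this function at all and stay grey.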

Some examples

I grew up in a city near the sea and have always been fascinated by marine life.

If you had written this sentence and found it shown like this when you checked the notification report it would mean that it had also been exactly matched to a personal statement stored in the Copycatch library.

I grew up in a town near the sea and have always found marine life fascinating.

If you had written this sentence and found it reported like this, it would mean that the word 'town' was not in the matched sentence, nor was the word 'found'. The word 'fascinating' was not found as an exact match but is sufficiently similar to the equivalent word in the matched sentence to be identified by underlining. The blue colour also shows you that the match was found in the second most matched statement.
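The word-level marking in this example can be imitated with a small sketch. How Copycatch decides that two words are related is not described in this guide; the shared-prefix rule below, the min_prefix value and the function names are assumptions chosen only to reproduce the example.

```python
import re


def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def common_prefix_len(a, b):
    # Count how many leading letters two words share.
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def classify_words(yours, matched, min_prefix=6):
    """Label each word in your sentence as same, related or different."""
    matched_words = set(words(matched))
    result = []
    for word in words(yours):
        if word in matched_words:
            label = "same"
        elif any(common_prefix_len(word, m) >= min_prefix for m in matched_words):
            label = "related"  # shown as underlined black in the report
        else:
            label = "different"  # shown as black in the report
        result.append((word, label))
    return result
```

Applied to the two example sentences, this sketch labels 'town' and 'found' as different and 'fascinating' as related (it shares the letters 'fascinat' with 'fascinated'), while every other word is labelled the same.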

The dates on the matched personal statements

At the bottom of the marked-up personal statement, the number of sentences matched to library or internet-based sources is shown in the same colour as that used to mark up the sentences. The date is merely indicative of how long this personal statement has been in the UCAS collection. It does not mean that this particular statement was the one used as the source for the current personal statement. Both may be taken from a source outside the library, or there may be other related files inside the library which have not been shown because there was no additional matched information.

The dates on the matching web sources

The number of web source sentences is shown in the same way, but here the date means either the date it was posted to the web site, if known, or the date when the web source was identified by UCAS. Again, it does not necessarily mean that the file was the actual source. As a feasibility study discovered, some web sources are very popular, and may appear on more than one web site, or have been used in a modified form in a personal statement within the UCAS collection.

Why the program works

A personal statement of 4000 characters will contain approximately 600 words, about half of which will be words like 'the', 'of', 'to', 'and', 'so' etc. These are the words that all applicants would be expected to use in the preparation of a personal statement. Copycatch excludes 450 words of this type from the matching process. The rest of the statement is made up of 'content' words that carry the messages you wish to communicate. Copycatch also ignores a number of these content words that applicants would regularly use, for example 'Duke', 'Edinburgh', 'Football' etc.

Under normal circumstances, if two personal statements are selected at random and compared you would expect very little or no similarity. Most sentences written, about any subject, are significantly different.

This means that if Copycatch finds two sentences in different statements which have exactly the same words, it is very probable that one is a copy of the other or that both have been copied from a common source. This can happen if a quote, taken from a text, is included in an essay, but is very unlikely to occur in a personal statement. It is when Copycatch finds a number of identical or similar sentences in a personal statement and a file held in the library that a similarity report is generated.