Vipul's Razor v2 README

Vipul's Razor is a distributed, collaborative spam detection and
filtering network. Through user contribution, Razor establishes a
distributed and constantly updating catalogue of spam in propagation that
is consulted by email clients to filter out known spam. Detection is done
with statistical and randomized signatures that efficiently spot mutating
spam content. User input is validated through reputation assignments based
on consensus on report and revoke assertions, which in turn are used for
computing confidence values associated with individual signatures.

Vipul's Razor v2 agent software is available from the project's homepage
at http://razor.sf.net. Razor Agents are written in Perl and will work on
most Unix operating systems and other OSes for which Perl is available.
Installation and usage instructions can be found in the INSTALL document
in the distribution.

Vipul's Razor v2 is an almost complete rewrite of Razor v1. The following
is a list of the most significant new features:

 1 New Protocol

    The Razor v2 protocol has been completely redesigned. The new
    protocol is based on the exchange of _Structured Information
    Strings_, which are similar to URIs and can be parsed with URI
    decoding libraries. The v2 protocol supports _Pipelining_, which
    means Razor Agents can keep a connection open with the server to
    eliminate the latency introduced by the TCP 3-way handshake and
    4-way teardown for every connection. The new protocol semantics
    allow seamless introduction of new signature schemes.
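
    As a rough illustration of why a URI-like format is convenient, the
    sketch below decodes and re-encodes a made-up string with Python's
    standard URI query functions. The example string and its keys (a, e,
    s) are hypothetical and are not taken from the Razor v2 protocol
    specification; parse_sis and build_sis are illustrative names only.

```python
from urllib.parse import parse_qsl, urlencode

def parse_sis(line):
    """Decode a URI-style 'key=value&key=value' string into a dict."""
    # keep_blank_values keeps keys that carry an empty value.
    return dict(parse_qsl(line.strip(), keep_blank_values=True))

def build_sis(fields):
    """Encode a dict back into a URI-style string."""
    return urlencode(fields)

# Hypothetical example string; the key names are invented for illustration.
example = "a=c&e=2&s=abc%2Fdef"
fields = parse_sis(example)
```

    Because percent-encoding is handled by the library, values that
    contain reserved characters survive a round trip unchanged.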

 2 Ephemeral Signatures

    Ephemeral Signatures are short-lived signatures based on
    collaboratively computed random numbers. Ephemeral Signatures select a
    section of text from the spam message based on a random number that
    changes every so often. This makes the hashing scheme a moving target,
    and spammers can't exploit it because they don't know which part of
    the message will be hashed after the random number rollover.
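
    The idea can be sketched as follows. This is not Razor's actual
    algorithm: the window size, the way the random number is mapped to
    an offset, and the function name ephemeral_signature are all
    assumptions made for the example. It only shows how a shared random
    number can decide which section of the text gets hashed.

```python
import hashlib

def ephemeral_signature(body: bytes, seed: int, window: int = 64) -> str:
    """Hash one window of the body, chosen by the shared random seed."""
    if len(body) <= window:
        section = body
    else:
        # The seed decides where the hashed window starts.
        start = seed % (len(body) - window + 1)
        section = body[start:start + window]
    return hashlib.sha1(section).hexdigest()

body = bytes(range(256)) * 2                        # stand-in for message text
sig_before = ephemeral_signature(body, seed=12345)
sig_after = ephemeral_signature(body, seed=67890)   # after a rollover
```

    With a different random number a different section is selected, so
    the same message yields a different signature after the rollover.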

 3 Preprocessors

    Razor v2 supports several preprocessors. Preprocessors alter the
    text of a spam message before a hash is computed. This version
    includes preprocessors to decode Base64 encoded messages, decode QP
    (quoted-printable) encoded messages and convert HTML to plaintext.
    Spammers employ several techniques that hide mutations in various
    encodings. Preprocessors defeat such techniques by hashing the
    content that a recipient actually sees in their mail user agent.
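
    A minimal sketch of such a preprocessing chain, using only the
    Python standard library, might look like this. The function names,
    the encoding labels and the whitespace normalization are assumptions
    for the example and do not describe the Razor agents' own code.

```python
import base64
import quopri
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so layout changes do not alter the hash input.
    return " ".join("".join(parser.chunks).split())

def preprocess(payload: bytes, encoding: str = "", is_html: bool = False) -> str:
    """Undo the transfer encoding, then strip HTML if requested."""
    if encoding == "base64":
        payload = base64.b64decode(payload)
    elif encoding == "quoted-printable":
        payload = quopri.decodestring(payload)
    text = payload.decode("utf-8", errors="replace")
    return html_to_text(text) if is_html else text
```

    Three differently encoded copies of the same visible text then
    reduce to one string, and therefore to one hash input.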

 4 Multiple Filtering Engines

    Razor v2 supports multiple engines. An engine is a logical unit that
    encapsulates a particular type of filtering service. Razor v2
    currently supports four engines: VR1, which is equivalent to Razor v1;
    VR2, which is based on SHA1 signatures of the body text; VR3, which is
    based on Nilsimsa signatures; and VR4, which is based on Ephemeral
    hashes. New engines can be seamlessly plugged into the service as and
    when required.
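
    One common way to get this kind of pluggability is a registry that
    maps an engine name to a signature function. The sketch below is a
    generic illustration under that assumption; the registry, the
    decorator and the engine name "sha1-body" are hypothetical and are
    not how the Razor agents are necessarily organized.

```python
import hashlib

ENGINES = {}

def engine(name):
    """Decorator that registers a signature function under an engine name."""
    def register(func):
        ENGINES[name] = func
        return func
    return register

@engine("sha1-body")
def sha1_body(body: bytes) -> str:
    # Whole-body SHA1, in the spirit of the VR2 description above.
    return hashlib.sha1(body).hexdigest()

def compute_signatures(body: bytes, wanted):
    """Run every requested engine that is registered and collect results."""
    return {name: ENGINES[name](body) for name in wanted if name in ENGINES}
```

    Adding an engine then means registering one more function; callers
    that ask for an unknown engine simply get no entry for it.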

 5 Complete Backward Compatibility with Razor v1

    The VR1 engine is functionally equivalent to the Razor v1 service and
    uses the same database. This means users who transition from v1 to v2
    will still get the benefit of several million signatures known to the
    v1 service.

 6 Base64 signature encoding

    Signatures are now encoded as base 64 numbers instead of base 16
    (hex), reducing the size of each signature on the wire by about 33%.
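
    The saving follows from the character widths: a hex character
    carries 4 bits and a base 64 character carries 6, so the ratio is
    4/6. The short check below works this out for a 160-bit SHA1
    digest. Stripping the trailing "=" padding is an assumption of this
    example, not something this README states about the wire format.

```python
import base64
import hashlib

digest = hashlib.sha1(b"example body text").digest()       # 20 raw bytes

hex_sig = digest.hex()                                      # 4 bits per character
b64_sig = base64.b64encode(digest).rstrip(b"=").decode()    # 6 bits per character

saving = 1 - len(b64_sig) / len(hex_sig)
```

    For SHA1 this gives 40 hex characters against 27 base 64 characters
    (28 with padding), a saving of roughly a third.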

 7 Truth Evaluation System (TeS)

    Razor v2 has a transparent, back-end component known as TeS. TeS is a
    combination of a reputation system and pattern recognition heuristics
    that assigns trust to reporters and confidence values (between 0 and
    100) to every signature. Users can set an acceptable confidence level
    in their Razor configuration. The server also publishes a recommended
    confidence level. TeS has been designed to eliminate the false
    positives on legitimate bulk email that were occasionally generated by
    bad reports in Razor v1.
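
    On the client side, the confidence value reduces to a threshold
    comparison. The sketch below assumes that a signature counts as spam
    when its confidence is at least the configured level, and that a
    user setting takes precedence over the server's recommendation;
    both rules and both function names are assumptions for the example.

```python
def is_spam(signature_cf: int, min_cf: int) -> bool:
    """Treat a signature as spam only if its confidence meets the threshold."""
    if not (0 <= signature_cf <= 100 and 0 <= min_cf <= 100):
        raise ValueError("confidence values are expected in the range 0-100")
    return signature_cf >= min_cf

def effective_threshold(server_recommended: int, user_override=None) -> int:
    """Prefer the user's configured level, else the server's recommendation."""
    return server_recommended if user_override is None else user_override
```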

 8 Submission of entire spam messages

    Razor v2 accepts the entire body text of spam messages not previously
    known to the system. This lets Razor v2 compute new Ephemeral
    Signatures every n hours as well as seed the database whenever a new
    signature scheme and/or preprocessor is introduced. Note that Razor
    v2 _does not_ accept the contents of legitimate email during a check
    dialogue. Only signatures are sent when checking email.

 9 Revocation

    Razor v2 allows users to revoke messages that they don't consider to
    be spam. Revocation input is fed into TeS, which adjusts the
    confidence value of a signature or removes it from the database as
    necessary. Revocation is done through a tool called razor-revoke,
    which is part of the new Razor distribution.

10 Reporter Registration

    Razor v2 requires reporters to be registered. This lets reporters
    build a reputation over time, so their reports and revocations are
    weighed according to their reputation value. Reporting requires users
    to authenticate, which is done using a CRAM-SHA1 authentication
    scheme.
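
    The general shape of a challenge-response exchange of this kind can
    be sketched with HMAC-SHA1, in the style of CRAM-MD5. This is a
    generic illustration only: the challenge format, the hex encoding of
    the response and the function names are assumptions, and the exact
    messages exchanged by Razor are not described in this README.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Server side: issue a fresh random challenge for this session."""
    return secrets.token_hex(16).encode()

def make_response(password: bytes, challenge: bytes) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.sha1).hexdigest()

def verify_response(password: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = make_response(password, challenge)
    return hmac.compare_digest(expected, response)
```

    The password never crosses the wire, and a captured response is of
    no use against a later, different challenge.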

11 Content classes

    Razor v2 introduces the concept of content classes. A content class is
    a set of messages that represent variations on the same content. As
    new reports come in, Nomination servers associate them with an
    existing content class if a (close) match is found. Additionally,
    Razor v2 treats each MIME attachment as a separate content class, so
    spammers' MIME attachments can be individually tracked (which is very
    useful in the case of viruses).
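
    Tracking attachments individually implies computing a signature per
    MIME part rather than per message. A minimal sketch with Python's
    standard email package is shown below; the use of plain SHA1 and
    the function name part_signatures are assumptions for the example
    and do not reflect how Nomination servers match content classes.

```python
import hashlib
from email import message_from_bytes

def part_signatures(raw_message: bytes):
    """Return one (content type, SHA1 digest) pair per non-container part."""
    msg = message_from_bytes(raw_message)
    sigs = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        sigs.append((part.get_content_type(), hashlib.sha1(payload).hexdigest()))
    return sigs
```

    An attachment reused across otherwise different messages then
    produces the same digest each time it appears.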


              $Id: README,v 1.4 2005/06/28 22:19:07 jpr5 Exp $