Real Time Foto Moderator

RTFM is a human-powered image moderation API.

Overview

RTFM provides a RESTful API that is designed to have resource-oriented URLs, use HTTP response codes to indicate errors and use built-in HTTP features like HTTP authentication. All responses will be returned in JSON.

The API endpoint is https://rtfm.crowdflower.com/v1/. All API requests must be made over HTTPS. Requests made over plain HTTP will fail. Authentication is required for all requests.

Current Version

1.2.0

Authentication

The API authenticates requests via a Bearer token in the Authorization header or via the simpler HTTP Basic Auth. To use Basic Auth, pass your API key as the username. You do not need to provide a password.

API keys are managed in your account settings.

# For simplicity, curl uses the -u flag to pass basic auth credentials. Adding a colon after your
# API key will prevent curl from asking you for a password.

curl https://rtfm.crowdflower.com/v1/images \
  -u 7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0: \
  -d "url=http://example.com/test.jpg"

# Using HTTP headers is a more secure way to send your API key:

curl https://rtfm.crowdflower.com/v1/images \
  -d "url=http://example.com/test.jpg" \
  -H "Authorization: Bearer 7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0"
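
For clients that do not use curl, the two Authorization header values can be built by hand. The following is a minimal Python sketch, not an official client; the function names are hypothetical, and the key is the sample key used in the curl examples.

```python
import base64


def basic_auth_header(api_key: str) -> str:
    """HTTP Basic value: the API key is the username, the password is empty."""
    credentials = f"{api_key}:".encode("ascii")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def bearer_auth_header(api_key: str) -> str:
    """Bearer token value for the Authorization header."""
    return f"Bearer {api_key}"


API_KEY = "7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0"  # sample key from the examples
print({"Authorization": basic_auth_header(API_KEY)})
print({"Authorization": bearer_auth_header(API_KEY)})
```

The trailing colon inside the Basic credentials plays the same role as the colon after the key in the curl -u example.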

Testing

RTFM provides a Test subscription type. Select your Test subscription and then get the test key from the API Settings page. All documented API calls work with your test key.

Images posted using this key are not sent to moderators in the crowd—they’re moderated based on substrings in the posted url parameter:

Substring   Example URL                       Moderation Result
approved    http://example.com/approved.jpg   { "score": 1.0, "rating": "approved" }
rejected    http://example.com/rejected.jpg   { "score": 0.1, "rating": "rejected" }
neither     http://example.com/neither.jpg    random result, either of the above

The substring can be anywhere in your domain, filename or query string. A webhook will be posted to your webhook URL within several minutes.
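
To predict what a test key will return for a given URL, the substring rule can be mirrored locally. This Python sketch is only an approximation of the documented behavior: the function name is hypothetical, and it assumes that matching is case-sensitive and that "approved" takes precedence when both substrings appear, neither of which is stated above.

```python
import random

APPROVED = {"score": 1.0, "rating": "approved"}
REJECTED = {"score": 0.1, "rating": "rejected"}


def simulate_test_result(url: str, rng=random) -> dict:
    """Mimic test-key moderation based on substrings of the posted URL."""
    if "approved" in url:   # assumption: checked first if both substrings occur
        return dict(APPROVED)
    if "rejected" in url:
        return dict(REJECTED)
    # Neither substring present: one of the two results, chosen at random.
    return dict(rng.choice([APPROVED, REJECTED]))


print(simulate_test_result("http://example.com/photo.jpg?tag=rejected"))
```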

Errors

RTFM uses HTTP response codes to indicate success or failure of a request. Codes in the 2xx range indicate success, codes in the 4xx range indicate a request error, and codes in the 5xx range indicate an error with RTFM’s servers.

  • 200 OK - everything worked
  • 401 Unauthorized - invalid API key provided
  • 402 Payment required - insufficient funds on your account
  • 403 Forbidden - insufficient privileges to access
  • 404 Not found - the requested item doesn’t exist
  • 406 Not acceptable - cannot respond with the requested format (API is JSON only)
  • 422 Unprocessable entity - invalid submitted data
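
A client can turn these codes into readable messages before deciding how to react. The Python sketch below is illustrative only; the names are hypothetical and the descriptions are copied from the list above.

```python
# Descriptions for the documented 4xx codes.
ERROR_MEANINGS = {
    401: "invalid API key provided",
    402: "insufficient funds on your account",
    403: "insufficient privileges to access",
    404: "the requested item doesn't exist",
    406: "cannot respond with the requested format (API is JSON only)",
    422: "invalid submitted data",
}


def describe_status(status: int) -> str:
    """Classify an HTTP status code by range, as described above."""
    if 200 <= status < 300:
        return "success"
    if 400 <= status < 500:
        return "request error: " + ERROR_MEANINGS.get(status, "unspecified")
    if 500 <= status < 600:
        return "server error"
    return "unexpected status"


print(describe_status(422))
```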

Rate limit

RTFM allows up to 10 requests per second from the same IP address. If you exceed this limit, you’ll get a 503 Service Unavailable response for subsequent requests.
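
One way to stay under the limit is to space requests out on the client side. The following Python sketch enforces a minimum interval between calls; the class name is hypothetical, and it only helps when all traffic from your IP address goes through the same throttle instance.

```python
import time


class Throttle:
    """Allow at most max_per_second calls by spacing them evenly."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


throttle = Throttle(max_per_second=10)
for _ in range(3):
    throttle.wait()  # call this before each API request
```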

Images

Moderators screen images based on an extensive ruleset. See Moderation for the list of criteria.

Score

A confidence score, between 0 and 1, calculated from moderator responses. 0 is rejected, 1 is approved.

Rating

  • rejected: score less than 0.5
  • approved: score of 0.5 or greater

Note: If the subscription uses the more lenient cf_standard ruleset instead of cf_strict, “borderline” images will be approved.
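
RTFM computes the rating and returns it with the image, so clients do not need to derive it. For reference, the threshold rule above amounts to the following Python sketch (the function name is hypothetical):

```python
def rating_for(score: float) -> str:
    """Map a confidence score in [0, 1] to the documented rating."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return "approved" if score >= 0.5 else "rejected"


print(rating_for(0.5))
```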

States

  • processing: image is being moderated
  • on hold: account has insufficient funds
  • completed: image has completed moderation

Rejection reasons

The reasons are returned only for an API subscription that enables reason collection.

  • sexual_content: Sexual content
  • crotch_shot: Crotch or pelvic shot/content
  • nudity: Nudity
  • inappropriate_dress: Inappropriate dress
  • violent_or_illegal: Violent or illegal material
  • ads: Ads or copyrighted material
  • borderline: Borderline
  • none: None of the above

Rejection rulesets

RTFM lets you choose between two rulesets: a default strict ruleset (cf_strict) and a more lenient ruleset (cf_standard). You select the ruleset in the API settings. When the cf_standard ruleset is selected, borderline images will be approved.

Posting

Image URLs must be publicly accessible. A webhook is sent when moderation for an image is completed.

An optional metadata hash parameter can be posted with your image. You can use this parameter for arbitrary data, e.g. an internal primary key. We return this metadata with your image data when we send a webhook or when you request that image through the API.

POST /images uploads an image:

# parameters:
#
# "url": string
# "metadata" (optional): hash

curl https://rtfm.crowdflower.com/v1/images \
  -u 7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0: \
  -d "url=http://example.com/test.jpg" \
  -d "metadata[internal_id]=Aj39x" \
  -d "metadata[internal_status]=spam"

You can choose between the Rails-style form-encoded format (above) and posting a top-level image JSON object with url and metadata fields:

curl "https://rtfm.crowdflower.com/v1/images" \
  -d '{"image": {"url": "http://example.com/test.jpg", "metadata": <OPTIONAL VALID JSON>}}' \
  -u '7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0:'

Response:

# attributes:
#
# image: hash
#   "id": string
#   "url": string
#   "metadata" (optional): hash

{
  "image": {
    "id": {IMAGE_ID},
    "url": {IMAGE_URL},
    "metadata": {IMAGE_METADATA}
  }
}
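
The two request formats for POST /images can also be produced programmatically. This Python sketch only builds the request bodies and does not send them; the function names are hypothetical, and it assumes metadata values are flat strings when using the form-encoded variant.

```python
import json
from urllib.parse import urlencode


def form_body(url: str, metadata: dict = None) -> str:
    """Rails-style form encoding: url=...&metadata[key]=value."""
    fields = [("url", url)]
    for key, value in (metadata or {}).items():
        fields.append((f"metadata[{key}]", value))
    return urlencode(fields)


def json_body(url: str, metadata: dict = None) -> str:
    """Top-level image object with url and optional metadata fields."""
    image = {"url": url}
    if metadata is not None:
        image["metadata"] = metadata
    return json.dumps({"image": image})


meta = {"internal_id": "Aj39x", "internal_status": "spam"}
print(form_body("http://example.com/test.jpg", meta))
print(json_body("http://example.com/test.jpg", meta))
```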

Duplicate Detection

By default, RTFM checks each image to see if its URL has previously been moderated by the system. If so, we send a webhook result immediately, bypassing the crowd. The rating and score are set to the results of the duplicate image. In addition, the id of the duplicate image is returned in the duplicate_id attribute. This feature can be disabled in the API settings.

Retrieving

GET /images/{IMAGE_ID} returns data about a posted image:

curl https://rtfm.crowdflower.com/v1/images/{IMAGE_ID} \
  -u 7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0:

GET /images_by_url?url={IMAGE_URL} returns data about the image by its URL:

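# -G makes curl send the data as a query string on a GET request instead of a POST body.
curl -G https://rtfm.crowdflower.com/v1/images_by_url \
  -u '7xgUnFVvHSsotiS6zaX2Nw1gcqwnBH0:' \
  --data-urlencode 'url=http://awesomesite.com/profiles/1234/picture.jpg'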

Response:

# attributes:
#
# image: hash
#   "id": string
#   "url": string
#   "score": float
#   "rating": string
#   "state": string
#   "metadata" (optional): hash
#   "reasons" (optional): array of strings
#   "duplicate_id" (optional): string

{
  "image": {
    "id": {IMAGE_ID},
    "url": {IMAGE_URL},
    "score": {IMAGE_SCORE},
    "rating": {IMAGE_RATING},
    "state": {IMAGE_STATE},
    "metadata": {IMAGE_METADATA},
    "reasons": {REASONS},
    "duplicate_id": {DUPLICATE_ID}
  }
}

Both endpoints return only images submitted to the subscription identified by the API key.
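
On the client side, the response body can be parsed into a small typed record so that optional attributes are handled in one place. This is a minimal Python sketch; the Image class and parse_image function are hypothetical names, and the sample values are made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    id: str
    url: str
    score: Optional[float] = None
    rating: Optional[str] = None
    state: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    reasons: list = field(default_factory=list)
    duplicate_id: Optional[str] = None


def parse_image(body: str) -> Image:
    """Build an Image from a JSON response with a top-level "image" object."""
    data = json.loads(body)["image"]
    return Image(
        id=data["id"],
        url=data["url"],
        score=data.get("score"),
        rating=data.get("rating"),
        state=data.get("state"),
        metadata=data.get("metadata") or {},   # optional attribute
        reasons=data.get("reasons") or [],     # optional attribute
        duplicate_id=data.get("duplicate_id"), # optional attribute
    )


sample = '{"image": {"id": "IM1", "url": "http://example.com/test.jpg", "score": 0.1, "rating": "rejected", "state": "completed", "reasons": ["nudity"]}}'
print(parse_image(sample))
```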

Scanning rejected images

/images/rejected/?period={PERIOD}&max_results={MAX}&start_key={KEY} scans all completed and rejected images submitted within PERIOD after KEY. It returns up to MAX results.

PERIOD selects the images submitted for moderation in a given month, specified in YYYY-MM format. PERIOD is required.

We use a KEY value to define a unique ordering of returned images. When you start the iteration, you provide no KEY, and the iteration starts from the beginning of the period. After that, each call returns a last_key value. Pass that value as start_key in the next call to /images/rejected to fetch the next page. When there are no more values to be returned, the call returns last_key equal to none.

The KEY is used to fetch the next page of the results. The image corresponding to the KEY is the last image on a previously returned page and will not be returned. When the KEY is omitted, the iteration starts from the first image of the interval. The KEY must be within the specified period.

curl 'https://rtfm.crowdflower.com/v1/images/rejected' \
  -d 'period=2012-07' \
  -d 'max_results=100' \
  -u 'API_KEY:'

curl 'https://rtfm.crowdflower.com/v1/images/rejected' \
  -d 'period=2012-07' \
  -d 'max_results=100' \
  -d 'start_key=4030424_2012-07-03T20:32:07Z' \
  -u 'API_KEY:'

The max_results parameter limits the number of images returned. If specified, it must be between 1 and 100. Default is 100.

Response 1: intermediate page, multiple images returned.

{
  "images": [
    {
      "url":"https://rtfm.crowdflower.com/v1/image_bad.jpg",
      "id":"IM779cd030-859a-012f-59fe",
      "state":"completed",
      "rating":"rejected",
      "score":0.1,
      "created_at":"2012-05-21T17:43:58Z"
    },
    {
      "url":"https://rtfm.crowdflower.com/v1/images_really_bad.jpg",
      "id":"IM779cd030-859a-012f-ffff",
      "state":"completed",
      "rating":"rejected",
      "score":0.0,
      "created_at":"2012-05-23T15:00:36Z"
    }
  ],
  "last_key": "263378_2012-05-23T15:00:36Z"
}

Response 2: final page, no more images.

{
  "images": [],
  "last_key": "none"
}
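
The paging protocol described above can be wrapped in a generator that keeps requesting pages until last_key is none. In this Python sketch, fetch_page is a hypothetical callable you supply: it takes a start key (or None for the first page), performs the /images/rejected request, and returns the decoded JSON response. Treating a JSON null last_key as the end of iteration is an extra assumption.

```python
from typing import Callable, Iterator, Optional


def iter_rejected(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every rejected image in a period, following last_key between pages."""
    start_key = None  # no key on the first call: start at the beginning of the period
    while True:
        page = fetch_page(start_key)
        yield from page["images"]
        last_key = page["last_key"]
        if last_key in (None, "none"):  # "none" marks the final page
            break
        start_key = last_key


# Usage with canned pages standing in for real HTTP calls.
pages = {
    None: {"images": [{"id": "IM1"}, {"id": "IM2"}], "last_key": "k2"},
    "k2": {"images": [], "last_key": "none"},
}
print([image["id"] for image in iter_rejected(lambda key: pages[key])])
```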

Webhook

We will POST moderation results to your webhook URL. This URL is managed in your account settings.

# attributes:
#
# image: hash
#   "id": string
#   "url": string
#   "score": float
#   "rating": string
#   "state": string
#   "metadata" (optional): hash
#   "reasons" (optional): array of strings

{
  "image": {
    "id": {IMAGE_ID},
    "url": {IMAGE_URL},
    "score": {IMAGE_SCORE},
    "rating": {IMAGE_RATING},
    "state": {IMAGE_STATE},
    "metadata": {IMAGE_METADATA},
    "reasons": {REASONS}
  }
}

Failure

RTFM expects a 2xx HTTP response code when posting webhooks to your server. Your webhook will immediately be disabled if we receive a 3xx or 4xx response code. RTFM will retry up to 10 times if your server returns a 5xx error. If there are 10 consecutive 5xx errors for a single image, we will disable the webhook and notify you. All pending webhooks will be sent when you re-enable your webhook.
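
Because a 3xx or 4xx response disables the webhook immediately while a 5xx response is retried, a receiver may prefer to answer 500 for any failure on its side. The Python sketch below shows that choice as a framework-independent function; handle_webhook and process are hypothetical names, and answering 500 for malformed bodies is a design decision of this sketch, not an RTFM requirement.

```python
import json
from typing import Callable


def handle_webhook(raw_body: bytes, process: Callable[[dict], None]) -> int:
    """Return the HTTP status code to send back for a webhook POST."""
    try:
        image = json.loads(raw_body)["image"]
        process(image)  # e.g. store the rating in your database
    except Exception:
        # 5xx asks RTFM to retry; 3xx or 4xx would disable the webhook.
        return 500
    return 200


received = []
print(handle_webhook(b'{"image": {"id": "IM1", "rating": "approved"}}', received.append))
```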

Signature

All webhook POSTs include an “X-Crowdflower-Signature” HTTP header. You can use this signature to ensure that you only accept requests from our servers.

The signature is the Base64-encoded RSA signature of the SHA1 digest of the JSON body. It is computed by signing the unescaped JSON body of the request with an RSA private key.

Use this public key to verify the signature.

Shell:

echo -n "unescaped JSON body of webhook request" > body.json
echo "base64 encoded contents of X-Crowdflower-Signature" | openssl enc -base64 -d > sig.txt
openssl dgst -sha1 -verify webhook_public.pem -signature sig.txt body.json

Ruby:

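require "openssl"
require "base64"
# signature: X-Crowdflower-Signature header value; unencoded_json_body: raw request body
public_key = OpenSSL::PKey::RSA.new(File.read("webhook_public.pem"))
digest = OpenSSL::Digest::SHA1.new
public_key.verify(digest, Base64.decode64(signature), unencoded_json_body)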

PHP:

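// $data: raw (unescaped) JSON body; $signature: X-Crowdflower-Signature header value
$cert = file_get_contents("webhook_public.pem");
$pubkeyid = openssl_get_publickey($cert);
// openssl_verify expects raw signature bytes; it returns 1 if valid and 0 if invalid
$valid = openssl_verify($data, base64_decode($signature), $pubkeyid, OPENSSL_ALGO_SHA1);
openssl_free_key($pubkeyid);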
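
Python:

The standard library has no RSA support, so this sketch spells out the check by hand, on the assumption that the signature uses the RSASSA-PKCS1-v1_5 scheme with SHA1 (the default for the openssl, Ruby and PHP calls above). It also assumes you have already extracted the modulus n and public exponent e from webhook_public.pem as integers, for example from the output of openssl rsa -pubin -in webhook_public.pem -text -noout. The function name is hypothetical; a maintained cryptography library is the better choice in production.

```python
import base64
import hashlib
import hmac

# ASN.1 DigestInfo prefix for SHA-1 used by RSASSA-PKCS1-v1_5 (RFC 8017).
SHA1_DIGEST_INFO = bytes.fromhex("3021300906052b0e03021a05000414")


def verify_signature(body: bytes, signature_b64: str, n: int, e: int) -> bool:
    """Check the Base64 signature over the raw webhook body with public key (n, e)."""
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    try:
        signature = base64.b64decode(signature_b64)
    except ValueError:
        return False
    if len(signature) != k:
        return False
    s = int.from_bytes(signature, "big")
    if s >= n:
        return False
    # RSA public operation recovers the padded digest that was signed.
    recovered = pow(s, e, n).to_bytes(k, "big")
    # Rebuild the expected encoding: 00 01 FF..FF 00 || DigestInfo || SHA1(body).
    digest_info = SHA1_DIGEST_INFO + hashlib.sha1(body).digest()
    padding = b"\xff" * (k - len(digest_info) - 3)
    expected = b"\x00\x01" + padding + b"\x00" + digest_info
    return hmac.compare_digest(recovered, expected)
```

Pass the request body exactly as received, before any JSON parsing or re-serialization, since any change to the bytes changes the digest.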

API Libraries

Ruby

CrowdFlower currently maintains a Ruby gem for interacting with the API. The gem can be found here: https://github.com/dolores/rtfm-ruby

Other

If you’ve created a simple wrapper for the RTFM API in another language, we would be happy to host it on our GitHub account. Let us know by emailing rtfm@crowdflower.com.

Moderation

Moderators follow extensive rules while screening images. These rules have been effective in avoiding user complaints and problems with Apple’s App Store review process. We provide two rulesets: a strict one (cf_strict, default) and a more lenient one (cf_standard). The more lenient ruleset accepts borderline images.

Here’s the full list of reasons to reject an image:

Sexual content

  • No images of sexual acts, either real, illustrated or simulated.
  • No sexually explicit or overly suggestive photos
  • No photos that contain sex props and toys, including the use of fruits/vegetables.
  • No photos of any obscene gestures and/or lewd behavior. (e.g., the middle finger or a banana used to simulate male genitalia)
  • No photos with transparent/sheer or wet material below the waist, or covering women’s nipples/breasts.
  • No images that display semen (or any fluid made to look like semen or ejaculation) on anything in photo.
  • No photos that have been edited to disguise sexual acts, such as a black box or blurred filters to hide touching of genitals with a hand.

Crotch or pelvic shot/content

  • No pubic hair can be visible.
  • No bare skin one inch directly above the pubic area.
  • No erections (or outline of genitals through clothing) will be allowed.
  • No photos with grabbing, holding or touching genitals or genital area.
  • No images of hands or fingers placed in pants or pulling underwear outward

Nudity

  • No nudity (no frontal, back or side nudity)
  • Nudity (particularly the genitals) covered up by a towel, hat or other means is not allowed.
  • No cleavage shots without a face.

Inappropriate dress

  • Appropriate public swimwear is allowed if the following is observed: No pubic hair, no women’s nipples, no outline of genitals and no portion of the butt can be present. Swimwear photos are only allowed if they are in natural settings (e.g., a swimming pool or beach)
  • Pants and shorts must be worn normally, buttoned, and not pulled or hanging down.

Violent or illegal material

  • No images of illegal drugs, drug use and/or drug paraphernalia.
  • No images of guns, firearms or weapons.
  • No photos of violent acts to the user, to someone else, or to animals; including blood in photo.
  • No profanity or curse words can appear in the image

Ads or copyrighted material

  • No image can be used to advertise services, goods, events, websites, or apps.
  • No copyrighted pictures or illustrations are allowed

Borderline

  • No body/torso shots without a head/face, without clothes or in underwear
  • No crotch/butt only photos, nor abs-only photos
  • No shirtless body shots indoors, even if it contains a face – shirtless body shots are only allowed in natural settings (e.g., a swimming pool or beach)
  • No underwear can be visible (including the underwear waistband showing above pants)

Version history

  • 1.2.0 - Allow multiple rulesets and reason collection
  • 1.0.0 - Initial API: image moderation, webhooks

Contact

Report issues or suggest improvements at rtfm@crowdflower.com.