lucataco / nsfw_video_detection

Falconsai's NSFW detection model, extended for videos

  • Public
  • 3.9K runs
  • GitHub
  • Paper
  • License

Run time and cost

This model costs approximately $0.020 to run on Replicate, or 50 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 88 seconds. The predict time for this model varies significantly based on the inputs.

Readme

About

Cog implementation of Falconsai/nsfw_image_detection, extended to support video inputs

Model Card

Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

Model Description

The Fine-Tuned Vision Transformer (ViT) is a variant of the transformer encoder architecture, similar to BERT, that has been adapted for image classification tasks. This specific model, named “google/vit-base-patch16-224-in21k,” is pre-trained on a substantial collection of images in a supervised manner, leveraging the ImageNet-21k dataset. The images in the pre-training dataset are resized to a resolution of 224x224 pixels, making it suitable for a wide range of image recognition tasks.

Intended Uses & Limitations

NSFW Image Classification: The primary intended use of this model is for the classification of NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in various applications.

The model returns one of two labels: “normal” or “nsfw”.
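A minimal sketch of how a per-image classifier might be extended to video, as this model does: sample frames, classify each one, and flag the whole video if any sampled frame is flagged. The function names (`sample_frames`, `classify_video`) and the `classify_frame` callback are hypothetical stand-ins for the actual Falconsai/nsfw_image_detection inference, not the repo's real API.

```python
# Hedged sketch: video-level NSFW decision built on a per-frame classifier.
# `classify_frame` is a hypothetical stand-in that returns "normal" or "nsfw"
# for a single frame, as the underlying image model does.

def sample_frames(frames, stride=10):
    """Keep every `stride`-th frame to bound inference cost on long videos."""
    return frames[::stride]

def classify_video(frames, classify_frame, threshold=1):
    """Return "nsfw" if at least `threshold` sampled frames are flagged."""
    flagged = sum(1 for frame in frames if classify_frame(frame) == "nsfw")
    return "nsfw" if flagged >= threshold else "normal"
```

Flagging on a single positive frame keeps the filter conservative; raising `threshold` trades recall for robustness against one-off false positives.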