lucataco / nsfw_video_detection

Falconsai's NSFW detection model, extended for videos


Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

About

Cog implementation of Falconsai/nsfw_image_detection, extended to accept video inputs.
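
The model can be called through the Replicate Python client. The snippet below is a minimal sketch; the input field name ("video") and the output shape are assumptions, so check the model's API schema on Replicate for the exact names.

```python
# Minimal sketch of calling this model via the Replicate Python client.
# The "video" input key is assumed; consult the model's API tab for the real schema.
import replicate

output = replicate.run(
    "lucataco/nsfw_video_detection",
    input={"video": open("clip.mp4", "rb")},  # assumed input field name
)
print(output)  # expected to be "normal" or "nsfw"
```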

Model Card

Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

Model Description

The Fine-Tuned Vision Transformer (ViT) is a transformer encoder architecture, similar to BERT, adapted for image classification. It is based on the "google/vit-base-patch16-224-in21k" checkpoint, which was pre-trained in a supervised manner on the ImageNet-21k dataset with images resized to a resolution of 224x224 pixels, making it suitable for a wide range of image recognition tasks.
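
For reference, the underlying image classifier can also be run locally with Hugging Face Transformers. This is a sketch assuming the Falconsai/nsfw_image_detection checkpoint referenced above; the processor handles the 224x224 resizing internally.

```python
# Sketch: run the fine-tuned ViT classifier on a single image with transformers.
from PIL import Image
from transformers import pipeline

classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")

img = Image.open("frame.jpg")  # any RGB image; resized to 224x224 by the processor
print(classifier(img))
# e.g. [{'label': 'normal', 'score': 0.99}, {'label': 'nsfw', 'score': 0.01}]
```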

Intended Uses & Limitations

NSFW Image Classification: The primary intended use of this model is for the classification of NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in various applications.

The model returns one of two labels: “normal” or “nsfw”.
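
The readme does not spell out how the image classifier is applied to video. One plausible approach, sketched below as an illustration only (the actual Cog predictor may sample and aggregate frames differently), is to sample frames with OpenCV, classify each frame, and flag the whole video as “nsfw” if any sampled frame is flagged.

```python
# Hypothetical per-frame aggregation sketch; not necessarily the repo's exact logic.
import cv2
from PIL import Image
from transformers import pipeline

classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")

def classify_video(path: str, every_n_frames: int = 30) -> str:
    """Sample every Nth frame and return 'nsfw' if any sampled frame is flagged."""
    cap = cv2.VideoCapture(path)
    label = "normal"
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            top = max(classifier(rgb), key=lambda p: p["score"])
            if top["label"] == "nsfw":
                label = "nsfw"
                break
        idx += 1
    cap.release()
    return label

print(classify_video("clip.mp4"))
```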