The DIVA Framework is a software framework designed to
provide an architecture and a set of software modules which
will facilitate the development of activity recognition
analytics. The Framework is developed as a fully open
source project on GitHub. The following links will help you
get started with the framework:
Framework Main Documentation Page: The source for the
framework documentation is maintained in the GitHub
repository using Sphinx.
A built version is maintained on ReadTheDocs at this
link. A good place to get started in the documentation,
after reading the Introduction,
is the UseCase
section, which will walk you through a number of typical
use cases with the framework.
The DIVA Framework is based on KWIVER, an open source
framework designed for building complex computer vision
systems. The following links will help you learn more about KWIVER:
Issue Tracker: Submit any bug reports or feature
requests for KWIVER here. If there's any question
about whether your issue belongs in the KWIVER or DIVA
framework issue tracker, submit it to the DIVA tracker and
we'll sort it out.
An ActEV activity is defined to be “one or more people performing a specified movement or
interacting with an object or group of objects”. Activities are annotated by humans using a set of
annotation guidelines that specify how to perform the annotation and the criteria to determine if
the activity occurred. Each activity is formally defined by five elements:
Activity Name - A mnemonic handle for the activity
Activity Description - Textual description of the activity
Begin time rule definition - The specification of what determines the beginning time of the activity
End time rule definition - The specification of what determines the ending time of the activity
Required object type list - The list of objects systems are expected to identify for the activity. Note: this aspect of an activity is not addressed by ActEV-PC.
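As an illustration only, the five elements above could be held in a small record type. The class name `ActivityDefinition`, its field names, and the handle `closing_example` are hypothetical and chosen for this sketch; they are not part of any ActEV tool or file format. The example values are taken from the closing activity described on this page.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class ActivityDefinition:
    """Hypothetical record mirroring the five elements of an activity definition."""
    name: str                          # mnemonic handle for the activity
    description: str                   # textual description
    begin_rule: str                    # what determines the beginning time
    end_rule: str                      # what determines the ending time
    required_objects: Tuple[str, ...]  # object types systems are expected to identify


# "closing_example" is a placeholder handle, not an official activity name.
closing = ActivityDefinition(
    name="closing_example",
    description="A person closing the door to a vehicle.",
    begin_rule="The event begins 1 s before the door starts to move.",
    end_rule="The event ends after the door stops moving.",
    required_objects=("Person", "Door or Vehicle"),
)

print([f.name for f in fields(closing)])
```

Using a frozen dataclass keeps a definition immutable once written, which suits guideline text that annotators should not change while annotating.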
Description and Example Chip Videos
Description: A person closing the door to a vehicle.
Start: The event begins 1 s before the door starts to move.
End: The event ends after the door stops
moving. A person closing the car door from
within counts as a closing event if you can still see the
person within the car. If the person is not visible
once they are in the car, then the closing should not
be annotated as an event.
Objects associated with the activity: Person; and Door or Vehicle
Description: A vehicle turning left or right is determined from the POV of the driver of the vehicle. The vehicle may not stop for more than 10 s during the turn.
Start: Annotation begins 1 s before vehicle has noticeably changed direction.
End: Annotation ends 1 s after the vehicle is no longer changing direction and linear motion has resumed. Note: This event is determined after a reasonable interpretation of the video.
Objects associated with the activity: Vehicle
Description: An object moving from person to vehicle.
Start: The event begins 2 s before the cargo to be loaded is extended toward the vehicle (i.e., before a person’s posture changes from one of “carrying” to one of “loading”).
End: The event ends after the cargo is placed into the vehicle and the person-cargo contact is lost. In the event of occlusion, it ends when the loss of contact is visible.
Objects associated with the activity: Person; and Vehicle
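Several of the begin and end rules above pad an observed motion with a fixed margin (for example, 1 s before and 1 s after a turn, or 2 s before loading starts). The sketch below shows that arithmetic. The function name `annotated_interval` and its parameters are hypothetical, and clamping the begin time at 0 s is an assumption of this sketch rather than something the annotation guidelines state.

```python
from typing import Tuple


def annotated_interval(motion_start_s: float,
                       motion_end_s: float,
                       pre_pad_s: float,
                       post_pad_s: float = 0.0) -> Tuple[float, float]:
    """Return (begin, end) in seconds after applying the padding rules.

    pre_pad_s  -- seconds added before the observed motion starts
    post_pad_s -- seconds added after the observed motion ends
    """
    if motion_end_s < motion_start_s:
        raise ValueError("motion_end_s must not precede motion_start_s")
    # Assumption: an annotation cannot begin before the start of the video.
    begin = max(0.0, motion_start_s - pre_pad_s)
    end = motion_end_s + post_pad_s
    return begin, end


# Vehicle turn observed from 12.0 s to 15.5 s: 1 s before, 1 s after.
print(annotated_interval(12.0, 15.5, pre_pad_s=1.0, post_pad_s=1.0))
# Loading observed from 0.5 s to 3.0 s: 2 s before, no padding after.
print(annotated_interval(0.5, 3.0, pre_pad_s=2.0))
```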
The names of the 37 Known Activities for ActEV’21 SDL
Sep 21, 2020: The ActEV'21 SDL UF with Known Activities opens
ActEV is a series of evaluations to accelerate the development of robust, multi-camera,
automatic activity detection algorithms for forensic and real-time alerting applications.
ActEV is an extension of the annual TRECVID
Surveillance Event Detection (SED) evaluation
where systems will also detect and track objects involved in the activities.
Each evaluation will challenge systems with new data, system requirements, and/or new activities. Currently we are running the ActEV 2021 Sequestered Data Leaderboard (SDL) evaluation that features Unknown Facility and Surprise Activity Testing and the ActEV TRECVID 2020 evaluation that features additional known activities for a known facility.
An ActEV activity is defined to be
“one or more people performing a specified movement or interacting with an object or group of objects”.
Activity detection technologies process extended video streams,
such as those from a CCTV camera, and
automatically detect all instances of the activity by:
(1) identifying the type of activity,
(2) producing a confidence score indicating the presence of the instance,
(3) temporally localizing the instance by indicating the begin and end times,
and (4) optionally, detecting and tracking the objects (people, vehicles, objects) involved in the activity.
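The four outputs listed above can be pictured as one record per detected instance. The following is a minimal sketch, assuming hypothetical class and field names (`ActivityInstance`, `TrackedObject`, and so on); it is not the submission format used by the ActEV Scoring Server, which is specified in the evaluation plans.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TrackedObject:
    object_type: str                      # e.g. "Person" or "Vehicle"
    object_id: int                        # identity kept across frames
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame -> box


@dataclass
class ActivityInstance:
    activity: str                         # (1) type of activity
    confidence: float                     # (2) presence confidence score
    begin_s: float                        # (3) begin time in seconds
    end_s: float                          # (3) end time in seconds
    objects: List[TrackedObject] = field(default_factory=list)  # (4) optional

    def duration_s(self) -> float:
        return self.end_s - self.begin_s


person = TrackedObject("Person", object_id=1, boxes={300: (10, 20, 50, 120)})
instance = ActivityInstance("closing_example", 0.87, 10.0, 13.5, [person])
print(instance.duration_s())
```

Leaving `objects` empty by default reflects that object detection and tracking is optional in the list above.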
The ActEV evaluations are being
conducted to assess the robustness of automatic
activity detection for a multi-camera streaming video
Everyone. Anyone who registers can submit to the evaluation server.
here and then, depending on the evaluation, participants
can either run their activity detection software on
their own compute hardware and submit their system output
to the ActEV Scoring Server, or submit their runnable
activity detection software to NIST using the
Evaluation Commandline Interface. See the individual
evaluation pages and evaluation plans for details.
Each ActEV evaluation uses a new
video data set, changes the evaluation tasks, or
adds/changes activities. The data will be provided
in MPEG-4 and AVI formatted files. See the
individual evaluation pages for details.
Evaluation Metrics and Tools
The main scoring metrics will be based on
detection, temporal localization, and spatio-temporal localization using evaluation measures that include
the probability of
missed detection and rate of false alarm. See details in the evaluation plans of each evaluation.
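To make the two measures concrete, here is a minimal sketch of how they are commonly computed. The page names the measures but gives no formulas, so two assumptions are made here: the probability of missed detection is the fraction of reference instances left undetected, and the rate of false alarm is normalised per minute of video. The function names are hypothetical, and the evaluation plans remain the authoritative definition.

```python
def probability_of_missed_detection(num_missed: int, num_reference: int) -> float:
    """Fraction of reference activity instances the system failed to detect."""
    if num_reference <= 0:
        raise ValueError("num_reference must be positive")
    return num_missed / num_reference


def rate_of_false_alarm(num_false_alarms: int, video_minutes: float) -> float:
    """False alarms per minute of processed video (assumed normalisation)."""
    if video_minutes <= 0:
        raise ValueError("video_minutes must be positive")
    return num_false_alarms / video_minutes


# Example: 12 reference instances, 3 missed; 6 false alarms over 40 minutes.
print(probability_of_missed_detection(3, 12))
print(rate_of_false_alarm(6, 40.0))
```

Both values depend on the confidence threshold applied to system output, so they are usually reported as a trade-off rather than as a single pair.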
Below you will find four example videos from our data sets. There are two example views each of indoor and outdoor.
ActEV Evaluation Tasks
Activity detection has been researched for many years and remains an unsolved computer vision challenge
that requires many capabilities beyond the current state of the art.
The ActEV series supports several evaluation tasks, each escalating the difficulty by requiring more specific information from the system.
Presently, there are three evaluation tasks defined: (1) Activity Detection (AD), (2) Activity
and Object Detection (AOD), and (3) Activity and Object Detection and Tracking
(AODT). Each evaluation task is summarized below. For a full description of the evaluation tasks,
read the Evaluation Plan for each specific evaluation.
Activity Detection (AD)
For the Activity Detection task, given a target activity, a system automatically detects and temporally
localizes all instances of the activity. For a
system-identified activity instance to be evaluated as correct, the type of activity must be correct and
the temporal overlap must fall within a minimal requirement.
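The correctness test described above can be sketched with a temporal intersection-over-union check. This is an illustration under assumptions: the page does not say how temporal overlap is measured or what the minimal requirement is, so the use of intersection-over-union, the `min_overlap` parameter, and the function names are all hypothetical.

```python
from typing import Tuple

Span = Tuple[float, float]  # (begin_s, end_s)


def temporal_iou(ref: Span, sys: Span) -> float:
    """Intersection-over-union of two time spans, in [0, 1]."""
    intersection = max(0.0, min(ref[1], sys[1]) - max(ref[0], sys[0]))
    union = (ref[1] - ref[0]) + (sys[1] - sys[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(ref_activity: str, sys_activity: str,
                         ref: Span, sys: Span, min_overlap: float) -> bool:
    """Correct only if the activity type matches and the overlap is sufficient."""
    return ref_activity == sys_activity and temporal_iou(ref, sys) >= min_overlap


# Reference spans 10-20 s, system output spans 15-25 s: 5 s shared of 15 s total.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
print(is_correct_detection("closing", "closing", (10.0, 20.0), (15.0, 25.0), 0.2))
```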
Activity and Object Detection (AOD)
For the Activity and Object Detection task, given a target activity, a system detects and temporally
localizes all instances of the activity and spatially detects/localizes the people and/or objects associated
with the target activity. For a system-identified instance to be scored as correct, it must meet the temporal
overlap criteria for the AD task and in addition meet the spatial overlap of the identified objects during
the activity instance.
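The spatial requirement can be illustrated with bounding-box intersection-over-union averaged over the frames of an instance. This is a sketch under assumptions, not the official scoring procedure: the `(x1, y1, x2, y2)` box layout, the function names, and the choice to count frames without a system box as zero overlap are all hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = width * height
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0


def mean_track_iou(ref_track: Dict[int, Box], sys_track: Dict[int, Box]) -> float:
    """Average IoU over reference frames; missing system frames score 0."""
    if not ref_track:
        return 0.0
    total = 0.0
    for frame, ref_box in ref_track.items():
        sys_box = sys_track.get(frame)
        if sys_box is not None:
            total += box_iou(ref_box, sys_box)
    return total / len(ref_track)


ref = {1: (0, 0, 10, 10), 2: (0, 0, 10, 10)}
sys = {1: (0, 0, 10, 10)}  # the system has no box on frame 2
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
print(mean_track_iou(ref, sys))
```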
Activity and Object Detection and Tracking (AODT)
For the Activity and Object Detection and Tracking task, given a target activity, a system detects and temporally
localizes all instances of the activity, spatio-temporally detects/localizes the people and/or objects
associated with the target activity, and properly assigns IDs to the objects involved in the activity. For a
system-identified instance to be scored as correct, it must meet the temporal overlap criteria and
spatio-temporal overlap of the objects for the AOD task and correctly assign the IDs to the objects as
described in the activity definition.