In8ness Music Classifier - Audio Analysis Guide
Overview
This music classifier analyzes audio files across five major analytical frameworks to understand the psychological, acoustic, and musical characteristics of songs. The system extracts low-level audio features and uses them to predict higher-level constructs like music preference dimensions and their correlations with personality traits.
Analysis Frameworks:
- MUSIC: 5 psychological dimensions of music preference
- Big Five Personality: trait correlations with music appreciation
- Genres: 50+ genre classifications
- Attributes: 14 perceptual descriptors
- Features: 20+ acoustic measurements
How It Works
Audio File → Feature Extraction → Analysis → Results
1. MUSIC Dimensions
What Are MUSIC Dimensions?
The MUSIC model (Mellow, Unpretentious, Sophisticated, Intense, Contemporary) is a psychological framework developed by Rentfrow et al. (2011) that categorizes music preferences into five dimensions. These dimensions correlate with personality traits and reflect how people perceive and respond to music.
1.1 Mellow
Definition: Smooth, relaxing, romantic, and slow music. Characterized by soft acoustic sounds, minimal percussion, and emotional depth.
Musical Characteristics:
- Low energy and tempo
- High dynamic range (soft to moderate dynamics)
- Low percussiveness
- Emphasis on vocals or acoustic instruments
- Minimal dissonance
Example Genres: Soft rock, easy listening, adult contemporary, singer-songwriter
Measured By:
- Energy level (inverse relationship)
- Tempo (slower = more mellow)
- Dynamic range (wider range = more emotional)
- Percussiveness (less = more mellow)
- Dissonance (lower = more mellow)
| Score Range | Interpretation | Description |
| 0.0 - 0.3 | Not Mellow | Energetic, harsh, intense music |
| 0.3 - 0.6 | Moderately Mellow | Balanced energy with some soft elements |
| 0.6 - 1.0 | Highly Mellow | Very smooth, relaxing, and calming |
1.2 Unpretentious
Definition: Simple, straightforward music that is easy to understand and accessible. Often country, folk, or traditional music without complex arrangements.
Musical Characteristics:
- Low spectral complexity
- Simple harmonic structures
- Traditional instrumentation
- Straightforward melodies
- Minimal production effects
Example Genres: Country, folk, bluegrass, traditional music
| Score Range | Interpretation | Description |
| 0.0 - 0.3 | Complex | Sophisticated, intricate arrangements |
| 0.3 - 0.6 | Moderately Simple | Accessible with some complexity |
| 0.6 - 1.0 | Very Unpretentious | Simple, accessible, traditional |
1.3 Sophisticated
Definition: Complex, intellectually engaging music requiring active listening. Often classical, jazz, or progressive styles with intricate arrangements.
Musical Characteristics:
- High spectral complexity
- Wide dynamic range
- Complex harmonies and structures
- Instrumental focus
- Artistic production
Example Genres: Classical, jazz, progressive rock, art rock, avant-garde
| Score Range | Interpretation | Description |
| 0.0 - 0.3 | Simple | Straightforward, easy to follow |
| 0.3 - 0.6 | Moderately Complex | Some intricate elements |
| 0.6 - 1.0 | Highly Sophisticated | Complex, artistic, intellectually engaging |
1.4 Intense
Definition: Loud, aggressive, energetic music with distortion and powerful rhythms. Often rock, metal, or electronic dance music.
Musical Characteristics:
- High energy and loudness
- Heavy dissonance
- Strong percussiveness
- Aggressive rhythms
- Powerful dynamics
Example Genres: Heavy metal, hard rock, punk, aggressive electronic
| Score Range | Interpretation | Description |
| 0.0 - 0.3 | Calm | Gentle, peaceful music |
| 0.3 - 0.6 | Moderately Energetic | Some energy and drive |
| 0.6 - 1.0 | Highly Intense | Aggressive, powerful, loud |
1.5 Contemporary
Definition: Modern, upbeat music using electronic production and contemporary styles. Dance, pop, rap, and electronic music.
Musical Characteristics:
- Electronic/synthesized sounds
- Strong rhythmic patterns
- Modern production techniques
- High brightness (electronic sounds)
- Repetitive structures
Example Genres: Electronic dance, hip-hop, modern pop, rap
| Score Range | Interpretation | Description |
| 0.0 - 0.3 | Traditional | Acoustic, conventional sounds |
| 0.3 - 0.6 | Mixed | Blend of modern and traditional |
| 0.6 - 1.0 | Highly Contemporary | Modern, electronic, cutting-edge |
How MUSIC Scores Are Calculated:
- Feature Extraction: 15+ audio features are extracted from the signal
- Weighted Classification: Features are weighted based on their importance to each dimension
- Genre Correlation: Genre predictions provide context (10% influence)
- Attribute Correlation: Psychological attributes refine scores (12% influence)
- Normalization: Scores are scaled to the 0.0-1.0 range and capped at 0.9 (90%)
Example Calculation for "Intense":
Base Score = (energy × 0.3) + (loudness × 0.25) + (dissonance × 0.25) + (percussiveness × 0.2)
+ Genre adjustment (if metal: +10%)
+ Attribute adjustment (if "aggressive": +5%)
= Final Intense Score (0.0-0.9)
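The example calculation above can be sketched in Python as follows. This is a minimal illustration, not the classifier's actual code: the function name `intense_score` and its flags are hypothetical, all inputs are assumed to be already normalized to 0.0-1.0, and the genre and attribute adjustments are assumed to be additive on that scale.

```python
def intense_score(energy, loudness, dissonance, percussiveness,
                  is_metal=False, is_aggressive=False):
    """Sketch of the 'Intense' example; every input is assumed to be in 0.0-1.0."""
    # Base score: weighted sum with the weights from the example calculation.
    score = (energy * 0.3 + loudness * 0.25
             + dissonance * 0.25 + percussiveness * 0.2)
    # Assumption: "+10%" and "+5%" are added directly on the 0-1 scale.
    if is_metal:
        score += 0.10
    if is_aggressive:
        score += 0.05
    # Clamp to the documented final range of 0.0-0.9.
    return max(0.0, min(0.9, score))


print(round(intense_score(0.8, 0.7, 0.6, 0.9), 3))
print(intense_score(1.0, 1.0, 1.0, 1.0, is_metal=True, is_aggressive=True))
```

The second call shows the cap in action: the raw sum exceeds 0.9, so the result is limited to 0.9.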
2. Personality Traits (Big Five Correlations)
What Are Personality Trait Correlations?
Research shows that music preferences correlate with the Big Five personality traits. For each trait, this section reports a score derived from the track's MUSIC dimension scores that indicates how closely the track matches the music typically appreciated by people high in that trait. These scores reflect research findings about how different personality types tend to engage with different types of music; they do not predict or assess the listener's actual personality.
2.1 Openness to Experience
Definition: Curiosity, creativity, appreciation for art and novel experiences.
Correlation with MUSIC Dimensions:
- High Openness: Correlates with preference for Sophisticated music (r = +0.40)
- High Openness: Correlates with preference for Intense music (r = +0.20)
- Low Openness: Correlates with preference for Unpretentious music (r = -0.10)
Correlation Score Calculation:
Score = 50 + (Sophisticated × 40) + (Intense × 20) - (Unpretentious × 10)
Range: 0-100
| Score Range | Interpretation |
| 0 - 30 | Music typically preferred by those with low openness (familiar, conventional) |
| 30 - 70 | Music with moderate appeal across openness levels |
| 70 - 100 | Music typically preferred by those with high openness (complex, novel) |
2.2 Conscientiousness
Definition: Organization, responsibility, goal-oriented behavior.
Correlation with MUSIC Dimensions:
- High Conscientiousness: Correlates with preference for Unpretentious music (r = +0.25)
- High Conscientiousness: Correlates with preference for Sophisticated music (r = +0.15)
- Low Conscientiousness: Correlates with preference for Intense music (r = -0.20)
Correlation Score Calculation:
Score = 50 + (Unpretentious × 25) + (Sophisticated × 15) - (Intense × 20)
Range: 0-100
2.3 Extraversion
Definition: Sociability, assertiveness, energetic engagement with the external world.
Correlation with MUSIC Dimensions:
- High Extraversion: Correlates with preference for Contemporary music (r = +0.35)
- High Extraversion: Correlates with preference for Intense music (r = +0.25)
- Low Extraversion: Correlates with preference for Mellow music (r = -0.15)
Correlation Score Calculation:
Score = 50 + (Contemporary × 35) + (Intense × 25) - (Mellow × 15)
Range: 0-100
2.4 Agreeableness
Definition: Compassion, cooperation, trust in others.
Correlation with MUSIC Dimensions:
- High Agreeableness: Correlates with preference for Mellow music (r = +0.30)
- High Agreeableness: Correlates with preference for Unpretentious music (r = +0.20)
- Low Agreeableness: Correlates with preference for Intense music (r = -0.30)
Correlation Score Calculation:
Score = 50 + (Mellow × 30) + (Unpretentious × 20) - (Intense × 30)
Range: 0-100
2.5 Neuroticism (Low Emotional Stability)
Definition: Tendency toward negative emotions, anxiety, emotional instability.
Correlation with MUSIC Dimensions:
- High Neuroticism: Correlates with preference for Intense music (r = +0.20)
- High Neuroticism: Correlates with preference for Mellow music (r = +0.15)
- Low Neuroticism: Correlates with preference for Contemporary music (r = -0.10)
Correlation Score Calculation:
Score = 50 + (Intense × 20) + (Mellow × 15) - (Contemporary × 10)
Range: 0-100
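The five correlation-score formulas in this section share one shape: a baseline of 50 plus weighted MUSIC dimension scores. The sketch below collects them in a single table. The names `TRAIT_WEIGHTS` and `trait_scores` are hypothetical, and the final clamp to 0-100 is an assumption: the formulas can exceed the stated range (for example, Openness reaches 110 when Sophisticated and Intense are both 1.0), and the guide does not say how that is handled.

```python
# Weights copied from the five "Correlation Score Calculation" formulas.
TRAIT_WEIGHTS = {
    "openness": {"sophisticated": 40, "intense": 20, "unpretentious": -10},
    "conscientiousness": {"unpretentious": 25, "sophisticated": 15, "intense": -20},
    "extraversion": {"contemporary": 35, "intense": 25, "mellow": -15},
    "agreeableness": {"mellow": 30, "unpretentious": 20, "intense": -30},
    "neuroticism": {"intense": 20, "mellow": 15, "contemporary": -10},
}


def trait_scores(music):
    """Map MUSIC dimension scores (0.0-1.0) to Big Five correlation scores."""
    scores = {}
    for trait, weights in TRAIT_WEIGHTS.items():
        raw = 50 + sum(weight * music.get(dimension, 0.0)
                       for dimension, weight in weights.items())
        # Assumption: out-of-range results are clamped to the stated 0-100 range.
        scores[trait] = max(0.0, min(100.0, raw))
    return scores


example = {"mellow": 0.2, "unpretentious": 0.1, "sophisticated": 0.3,
           "intense": 0.8, "contemporary": 0.6}
for trait, score in trait_scores(example).items():
    print(f"{trait}: {score:.1f}")
```

Keeping the weights in a table rather than in five separate functions makes it easy to check each row against the formulas above.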
3. Genre Classification
What Is Genre Classification?
Genre classification assigns music to stylistic categories based on its musical characteristics. The system predicts genre from audio features (not metadata) using rule-based logic.
3.1 Electronic / Dance (10 genres)
House: 120-130 BPM, high rhythm
Techno: 125-145 BPM, mechanical
Trance: 130-145 BPM, high energy
Drum & Bass: 160+ BPM
Ambient: Low energy, complex
EDM: Very high energy, loud
Downtempo: <115 BPM, electronic
Dubstep: Heavy bass, rhythmic
Chillout: Relaxed, electronic
Breakbeat: Syncopated, 120-140 BPM
Key Features: Tempo, rhythm strength, percussiveness, electronic brightness
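As an illustration of the rule-based approach, the sketch below checks only the tempo ranges listed above for the electronic genres that state one. It is a simplified assumption, not the classifier's rule set: the real rules also weigh rhythm strength, percussiveness, and brightness, and the names `ELECTRONIC_TEMPO_RULES` and `tempo_candidates` are hypothetical.

```python
# Tempo predicates built from the BPM ranges in the list above.
# Genres without a stated BPM range (Ambient, EDM, Dubstep, Chillout) are omitted.
ELECTRONIC_TEMPO_RULES = {
    "House": lambda bpm: 120 <= bpm <= 130,
    "Techno": lambda bpm: 125 <= bpm <= 145,
    "Trance": lambda bpm: 130 <= bpm <= 145,
    "Drum & Bass": lambda bpm: bpm >= 160,
    "Downtempo": lambda bpm: bpm < 115,
    "Breakbeat": lambda bpm: 120 <= bpm <= 140,
}


def tempo_candidates(bpm):
    """Return the electronic genres whose tempo rule accepts this BPM."""
    return [genre for genre, rule in ELECTRONIC_TEMPO_RULES.items() if rule(bpm)]


print(tempo_candidates(128))
print(tempo_candidates(170))
```

Because the ranges overlap, tempo alone usually leaves several candidates, which is why the other key features are needed to separate them.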
3.3 Hip Hop / Rap (3 genres)
Trap: 65-85 BPM, very percussive
Boom Bap: 85-100 BPM
Hip Hop: 70-110 BPM, rhythmic
Key Features: Tempo (slower), percussiveness, rhythm strength
3.4 Jazz (4 genres)
Bebop: Fast (>180 BPM), complex
Smooth Jazz: Slow, low energy
Jazz Fusion: Mixed instrumentation
Jazz: High complexity, instrumental
Key Features: Spectral complexity, tempo variation, instrumental
3.5 Classical (3 genres)
Symphony: Very instrumental, wide dynamics
Chamber Music: Intimate, moderate complexity
Opera: Vocal focus, dramatic
Key Features: Instrumental content, complexity, dynamic range (>15 dB)
3.6 Blues (3 genres)
Electric Blues: Guitar-centric
Delta Blues: Acoustic, traditional
Blues: 60-140 BPM, moderate dissonance
Key Features: Tempo, distinctive dissonance (0.3-0.6), vocal
3.7 Pop (3 genres)
Dance Pop: High rhythm, high energy
Indie Pop: Moderate energy
Pop: 100-140 BPM, vocal, accessible
Key Features: Tempo, vocal content, structural simplicity
3.8 R&B / Soul / Funk (4 genres)
R&B: 60-110 BPM, very vocal
Soul: Emotional dynamics
Funk: Syncopated rhythms
Neo-Soul: Modern production
Key Features: Tempo range, vocal content, rhythm patterns
3.9 Country / Folk (3 genres)
Country: Bright (acoustic), vocal
Bluegrass: Fast, very bright
Folk: Simple, traditional
Key Features: Brightness (acoustic), vocal, simplicity
3.10 Reggae / Dancehall (3 genres)
Reggae: 60-100 BPM, distinctive rhythm
Dancehall: More percussive
Dub: Bass-heavy, echo
Key Features: Specific tempo, rhythm patterns
Top 5 Genres Output
The system returns the five most probable genres with confidence scores, omitting any that score below 30%:
Example Output:
1. Electronic - House (85%)
2. Pop - Dance Pop (67%)
3. Electronic - EDM (61%)
4. Rock - Alternative Rock (48%)
5. Pop (42%)
Confidence Score Interpretation:
- 70-100%: Very confident match
- 50-69%: Good match
- 30-49%: Possible match
- Below 30%: Unlikely (filtered out)
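The ranking and filtering described above can be sketched as follows. The function names and the input format (a dictionary of genre name to confidence percentage) are assumptions made for illustration.

```python
def confidence_label(percent):
    """Translate a confidence percentage into the interpretation bands above."""
    if percent >= 70:
        return "Very confident match"
    if percent >= 50:
        return "Good match"
    if percent >= 30:
        return "Possible match"
    return "Unlikely"


def top_genres(confidences, limit=5, min_percent=30):
    """Rank genres by confidence, drop those below the cutoff, keep the top few."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    kept = [(genre, pct) for genre, pct in ranked if pct >= min_percent]
    return [(genre, pct, confidence_label(pct)) for genre, pct in kept[:limit]]


scores = {"House": 85, "Dance Pop": 67, "EDM": 61, "Alternative Rock": 48,
          "Pop": 42, "Techno": 35, "Ambient": 12}
for genre, pct, label in top_genres(scores):
    print(f"{genre} ({pct}%): {label}")
```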
4. Audio Attributes Analysis
What Are Audio Attributes?
Audio attributes are 14 psychological/perceptual descriptors derived from audio features. They describe how the music "feels" rather than technical measurements.
The 14 Attributes
| Attribute | Definition | Calculation | Range |
| Dense | How "thick" or layered the sound is | complexity × 0.6 + (1 - dynamic_range/20) × 0.4 | 0.0 = sparse, 1.0 = very dense |
| Distorted | Amount of harmonic distortion | dissonance × 0.7 + zero_crossing_rate × 0.3 | 0.0 = clean, 1.0 = heavily distorted |
| Electric | Electronic vs. acoustic sound | brightness × 0.5 + (1 - instrumental) × 0.3 + rhythm × 0.2 | 0.0 = acoustic, 1.0 = electronic |
| Fast | Perceived speed/urgency | tempo/200 × 0.6 + energy × 0.4 | 0.0 = very slow, 1.0 = very fast |
| Instrumental | Lack of vocals | instrumental_probability | 0.0 = all vocals, 1.0 = no vocals |
| Loud | Subjective loudness | min(1, max(0, (loudness + 30) / 30)) | 0.0 = quiet, 1.0 = very loud |
| Percussive | Prominence of drums/rhythm | rhythm_strength × 0.6 + percussiveness × 0.4 | 0.0 = no drums, 1.0 = drum-dominated |
| Aggressive | Hostile or confrontational feeling | energy × 0.35 + dissonance × 0.35 + loudness × 0.3 | 0.0 = gentle, 1.0 = very aggressive |
| Complex | Structural sophistication | spectral_complexity × 0.6 + dynamic_range/20 × 0.4 | 0.0 = simple, 1.0 = very complex |
| Inspiring | Uplifting, motivational quality | energy × 0.4 + (1 - dissonance) × 0.3 + brightness × 0.3 | 0.0 = depressing, 1.0 = very inspiring |
| Intelligent | Cerebral, thought-provoking | complexity × 0.5 + instrumental × 0.3 + dynamic_range/20 × 0.2 | 0.0 = simple, 1.0 = intellectually engaging |
| Relaxing | Calming, stress-reducing | (1 - energy) × 0.4 + (1 - dissonance) × 0.3 + (1 - percussive) × 0.3 | 0.0 = tense, 1.0 = very relaxing |
| Romantic | Intimate, loving quality | (1 - energy) × 0.3 + vocal × 0.4 + (1 - dissonance) × 0.3 | 0.0 = unromantic, 1.0 = very romantic |
| Sad | Melancholic emotional tone | (1 - energy) × 0.3 + (minor_key) × 0.4 + (1 - brightness) × 0.3 | 0.0 = happy, 1.0 = very sad |
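Four of the attribute formulas are sketched below to show how the calculations translate into code. The function names are hypothetical, loudness is taken in dB as in the Loud formula, and the `clamp` helper is an assumption added because inputs such as a dynamic range above 20 dB would otherwise push a result outside 0.0-1.0.

```python
def clamp(value, low=0.0, high=1.0):
    """Assumption: attribute values are limited to the stated 0.0-1.0 range."""
    return max(low, min(high, value))


def dense(complexity, dynamic_range_db):
    return clamp(complexity * 0.6 + (1 - dynamic_range_db / 20) * 0.4)


def fast(tempo_bpm, energy):
    return clamp(tempo_bpm / 200 * 0.6 + energy * 0.4)


def loud(loudness_db):
    # Maps -30 dB (or quieter) to 0.0 and 0 dB to 1.0.
    return clamp((loudness_db + 30) / 30)


def relaxing(energy, dissonance, percussive):
    return clamp((1 - energy) * 0.4 + (1 - dissonance) * 0.3
                 + (1 - percussive) * 0.3)


print(round(dense(0.5, 10), 3))
print(round(fast(128, 0.7), 3))
print(round(loud(-9), 3))
print(round(relaxing(0.7, 0.3, 0.6), 3))
```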
How Attributes Influence MUSIC Dimensions
Attributes provide 12% adjustment to MUSIC scores. For example:
- If song is "aggressive" (0.8) → Intense +9.6%
- If song is "relaxing" (0.7) → Mellow +8.4%
- If song is "complex" (0.9) → Sophisticated +10.8%
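The three adjustments above are each the attribute value multiplied by the 12% weight. A short check of that arithmetic, using a hypothetical helper name:

```python
ATTRIBUTE_WEIGHT = 0.12  # the 12% attribute influence stated above


def attribute_adjustment(attribute_score):
    """Return the adjustment an attribute contributes to a MUSIC dimension."""
    return attribute_score * ATTRIBUTE_WEIGHT


for name, score in [("aggressive", 0.8), ("relaxing", 0.7), ("complex", 0.9)]:
    print(f"{name}: +{attribute_adjustment(score) * 100:.1f}%")
```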
5. Extracted Audio Features (Low-Level)
What Are Audio Features?
These are objective, measurable properties of the audio signal. They form the foundation for all higher-level analyses.
5.1 Temporal Features
| Feature | Definition | Unit/Range | Example |
| Duration | Total length of audio | Seconds | 180.5 s = 3 min 0.5 s |
| Tempo (BPM) | Beats per minute, perceived speed | 60-180 BPM typically | House ≈ 128 BPM |
Tempo Calculation:
BPM = (beats / duration) × 60
Accuracy: ±5 BPM
Method: Beat detection via energy peak analysis
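The tempo formula is a direct ratio once beats have been counted. The sketch below assumes the beat count comes from an earlier detection step, which is not shown; the function name is hypothetical.

```python
def bpm_from_beats(beat_count, duration_seconds):
    """Apply BPM = (beats / duration) × 60."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return beat_count / duration_seconds * 60.0


# 64 detected beats in a 30-second clip.
print(round(bpm_from_beats(64, 30.0), 1))
```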
5.2 Amplitude Features
| Feature | Definition | Range | Typical Values |
| Loudness (dB) | RMS amplitude in decibels | -60 dB to 0 dB | Classical: -20 to -12 dB; Pop: -12 to -6 dB; Electronic: -8 to -3 dB |
| Energy | Total acoustic energy | 0.0 to 1.0 | Ambient: 0.2-0.4; Pop: 0.5-0.7; Metal: 0.7-0.9 |
| Dynamic Range | Loudest vs. softest parts | 0 to 30+ dB | Modern pop: 5-8 dB; Classical: 15-25 dB; Audiophile: 20-30 dB |
Formulas:
Loudness: 20 × log10(RMS_amplitude)
Energy: mean(signal²) / max_possible_energy
Dynamic Range: 95th_percentile - 5th_percentile (dB)
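A minimal sketch of these three formulas on raw samples follows. Several details are assumptions: samples are floats in -1.0 to 1.0, silence is floored at -60 dB to match the range in the table, `max_possible_energy` is taken as the squared full-scale amplitude, and the percentile uses linear interpolation over per-frame loudness levels.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudness_db(samples, floor_db=-60.0):
    """20 × log10(RMS), floored so that silence does not produce -infinity."""
    level = rms(samples)
    if level <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(level))


def energy(samples, max_amplitude=1.0):
    """mean(signal²) divided by the energy of a full-scale signal."""
    return (sum(s * s for s in samples) / len(samples)) / (max_amplitude ** 2)


def percentile(values, p):
    """Linearly interpolated percentile (an assumed method)."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100
    lower, upper = math.floor(k), math.ceil(k)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (k - lower)


def dynamic_range_db(frame_levels_db):
    """95th minus 5th percentile of per-frame loudness values in dB."""
    return percentile(frame_levels_db, 95) - percentile(frame_levels_db, 5)


half_scale = [0.5, -0.5] * 50
print(round(loudness_db(half_scale), 2))
print(energy(half_scale))
print(dynamic_range_db(list(range(-40, -19))))
```

Using percentiles rather than the absolute maximum and minimum keeps a single click or a brief silence from dominating the dynamic range figure.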
5.3 Spectral Features
| Feature | Definition | Range/Unit | Interpretation |
| Spectral Complexity | How many frequencies present | 0.0 to 1.0 | Simple pop: 0.3-0.5; Jazz/Classical: 0.6-0.8 |
| Spectral Centroid | "Center of mass" of spectrum | 200 to 8000 Hz | Bass-heavy: 500-1500 Hz; Bright: 3000-6000 Hz |
| Brightness | High-frequency content | 0.0 to 1.0 | Dubstep: 0.2-0.4; Acoustic: 0.6-0.8 |
| Dissonance | Harshness, lack of harmony | 0.0 to 1.0 | Classical: 0.2-0.4; Metal: 0.6-0.9 |
| Zero-Crossing Rate | Signal crosses zero amplitude | 0 to 10000+ | Bass: 1000-3000; Distorted: 6000-10000+ |
Key Formulas:
Spectral Centroid: Σ(frequency × magnitude) / Σ(magnitude)
Brightness: Energy(>2000Hz) / Total_Energy
Dissonance: High_freq_energy × 0.6 + Spectral_irregularity × 0.4
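The centroid and brightness formulas can be tried on a synthetic tone using only the standard library. The sketch uses a naive discrete Fourier transform, which is far too slow for real audio and stands in for whatever FFT the classifier uses; the function names are hypothetical. Brightness is computed from squared magnitudes, on the assumption that "Energy" in the formula means spectral power.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Naive DFT returning (frequencies in Hz, magnitudes) up to Nyquist."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_centroid(freqs, mags):
    """Σ(frequency × magnitude) / Σ(magnitude)."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def brightness(freqs, mags, cutoff_hz=2000.0):
    """Energy above the cutoff divided by total energy."""
    energies = [m * m for m in mags]
    total = sum(energies)
    high = sum(e for f, e in zip(freqs, energies) if f > cutoff_hz)
    return high / total if total else 0.0


sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
freqs, mags = magnitude_spectrum(tone, sample_rate)
print(round(spectral_centroid(freqs, mags)))
print(round(brightness(freqs, mags), 3))
```

A pure 1000 Hz tone puts the centroid at the tone's frequency and leaves almost no energy above the 2000 Hz cutoff, so its brightness is close to zero.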
5.4 Rhythm Features
| Feature | Definition | Range | Typical Values |
| Danceability | Suitability for dancing | 0.0 to 1.0 | Ambient: 0.1-0.3; Pop: 0.5-0.7; EDM: 0.7-0.9 |
| Rhythm Strength | Clarity of rhythmic patterns | 0.0 to 1.0 | Ambient: 0.1-0.3; Electronic: 0.7-0.9 |
| Percussiveness | Prominence of percussion | 0.0 to 1.0 | String quartet: 0.0-0.2; Hip-hop: 0.7-0.9 |
5.5 Tonal & Harmonic Features
| Feature | Definition | Values/Range | Method |
| Key | Tonal center | C, C#, D, D#, E, F, F#, G, G#, A, A#, B | Krumhansl-Schmuckler algorithm |
| Key Scale | Major or minor mode | "major" or "minor" | Pitch distribution comparison |
| Key Strength | Confidence in key detection | 0.0 to 1.0 | 0.0-0.3: Ambiguous; 0.6-0.8: Clear; 0.8-1.0: Very strong |
| Tuning Frequency | Reference pitch (A4) | 435-445 Hz (standard: 440 Hz) | Detects non-standard tunings |
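The Krumhansl-Schmuckler approach correlates a 12-bin pitch-class distribution with a major and a minor key profile rotated to each of the 12 tonics, then picks the best match. The sketch below uses the commonly published Krumhansl-Kessler profile values; whether the classifier uses these exact values, and whether it reports the correlation as Key Strength, are assumptions. The input `chroma` is assumed to be pitch-class energy indexed from C.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def detect_key(chroma):
    """Return (tonic, scale, correlation) for the best-matching key."""
    best = None
    for tonic in range(12):
        for scale, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so that its tonic weight lands on this pitch class.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(chroma, rotated)
            if best is None or r > best[2]:
                best = (NOTE_NAMES[tonic], scale, r)
    return best


a_minor_like = [MINOR_PROFILE[(pc - 9) % 12] for pc in range(12)]
print(detect_key(a_minor_like)[:2])
```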
5.6 Voice Detection Features
| Feature | Definition | Range | Examples |
| Voice Probability | Likelihood vocals are present | 0.0 to 1.0 | A cappella: 0.9; Pop vocal: 0.6-0.7; Instrumental: 0.0-0.2 |
| Instrumental Probability | Likelihood of no vocals | 0.0 to 1.0 | 1.0 - voice_probability |
Detection Method:
Analyzes 1-4 kHz energy (voice formant region)
Spectral shape analysis for formant detection
5.7 Emotional Features
| Feature | Definition | Formula | Interpretation |
| Arousal | Energy/activation level | (tempo/200 × 0.4) + (energy × 0.4) + (loudness × 0.2) | 0.0-0.3: Calm; 0.3-0.7: Moderate; 0.7-1.0: Highly arousing |
| Valence | Positive vs. negative emotion | (major_key × 0.4) + (brightness × 0.3) + ((1-dissonance) × 0.3) | 0.0-0.3: Sad, negative; 0.3-0.7: Neutral; 0.7-1.0: Happy, positive |
Emotion Quadrants:
- High Arousal, Positive Valence = Excited, Happy (dance pop)
- High Arousal, Negative Valence = Angry, Tense (metal)
- Low Arousal, Positive Valence = Calm, Peaceful (easy listening)
- Low Arousal, Negative Valence = Sad, Depressed (slow ballads)
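The arousal and valence formulas and the quadrant mapping can be sketched together. Loudness is assumed to be already normalized to 0.0-1.0, `major_key` is treated as 1 for major and 0 for minor, and the 0.5 split between "high" and "low" is an assumption, since the guide does not state where the quadrant boundaries lie.

```python
def arousal(tempo_bpm, energy, loudness):
    return tempo_bpm / 200 * 0.4 + energy * 0.4 + loudness * 0.2


def valence(is_major, brightness, dissonance):
    major_key = 1.0 if is_major else 0.0
    return major_key * 0.4 + brightness * 0.3 + (1 - dissonance) * 0.3


def emotion_quadrant(arousal_value, valence_value, threshold=0.5):
    """Assumption: a single 0.5 threshold separates high from low on both axes."""
    high_arousal = arousal_value >= threshold
    positive = valence_value >= threshold
    if high_arousal and positive:
        return "Excited, Happy"
    if high_arousal:
        return "Angry, Tense"
    if positive:
        return "Calm, Peaceful"
    return "Sad, Depressed"


a = arousal(128, 0.8, 0.8)
v = valence(True, 0.7, 0.2)
print(round(a, 3), round(v, 3), emotion_quadrant(a, v))
```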
6. Spectrogram Visualization
What Is a Spectrogram?
A spectrogram is a visual representation of how frequency content changes over time.
Spectrogram Axes:
- X-axis: Time (left to right)
- Y-axis: Frequency (bottom to top, 0-8000 Hz)
- Color: Intensity/energy at that frequency and time
Generation Process:
1. Signal Division: Audio divided into overlapping frames (2048 samples)
2. Power Spectrum: Energy calculated for each frequency band
3. Normalization: Scaled to 0-1 range
4. Color Mapping: Viridis color scheme applied
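Steps 1 to 3 of the generation process can be sketched with the standard library alone. The 50% overlap (a hop of half the frame size) is an assumption, no window function is applied, and the naive DFT is only practical for small frames, so the example uses 32-sample frames instead of 2048. Color mapping is left out.

```python
import cmath
import math


def split_frames(samples, frame_size=2048, hop=1024):
    """Step 1: overlapping frames (hop size is an assumed 50% overlap)."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]


def power_spectrum(frame):
    """Step 2: power per frequency bin via a naive DFT (slow, for illustration)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_size=2048, hop=1024):
    """Step 3: rows of power spectra scaled so the largest value is 1.0."""
    rows = [power_spectrum(f) for f in split_frames(samples, frame_size, hop)]
    peak = max((v for row in rows for v in row), default=0.0)
    if peak == 0.0:
        return rows
    return [[v / peak for v in row] for row in rows]


# A tone that falls exactly on bin 4 of a 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(96)]
grid = spectrogram(signal, frame_size=32, hop=16)
print(len(grid), len(grid[0]), grid[0].index(max(grid[0])))
```

Each row is one time step and each column one frequency bin, so a sustained tone appears as the same bright column in every row.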
Color Interpretation:
- Dark blue: Low energy
- Teal: Moderate energy
- Yellow: High energy
- White: Very high energy
What You Can See:
- Horizontal lines: Sustained tones (vocals, sustained instruments)
- Vertical lines: Transients (drums, attacks)
- Dense lower region: Bass frequencies
- Sparse upper region: High frequencies (cymbals, brightness)
- Patterns: Rhythmic structure, verse/chorus changes
7. Data Flow & Relationships
How Everything Connects
Audio File
→ Low-Level Feature Extraction (Temporal, Amplitude, Spectral, Rhythm, Tonal Features)
→ Audio Attributes (14 descriptors) | Genre Classification (Top 5 genres) | Key Detection (Key & scale)
→ MUSIC Dimensions (5 psychological dimensions)
→ Personality Correlations (Big Five appreciation patterns)
Influence Weights in MUSIC Dimension Calculation:
- Audio features: 60% (base weight)
- Audio attributes: 12% (refinement)
- Genre correlation: 10% (context)
- Normalization: 18% (spreading/capping)
- Total: 100% final score
| Component | Input | Output | Derivation |
| Genre Classification | Audio features | Top 5 genres | 100% feature-based |
| MUSIC Dimensions | Features + Attributes + Genres | 5 dimension scores | Weighted combination |
| Personality Correlations | MUSIC dimensions | Big Five correlation scores | 100% from MUSIC correlations |
8. Accuracy & Limitations
Expected Accuracy
| Measure | Accuracy | Notes |
| Tempo | 85-90% | ±5 BPM typically |
| Key | 70-75% | Best on tonal music |
| Energy/Loudness | 95%+ | Objective measurements |
| Spectral Features | 90%+ | Well-defined calculations |
| Genre (feature-based) | 55-65% | Limited without ML |
| MUSIC Dimensions | 70-80% | Based on validated model |
| Personality Correlations | Research-based | Shows music appreciation patterns by trait |
| Audio Attributes | 75-85% | Subjective but consistent |
Limitations
1. Genre Classification
- Feature-based only (no machine learning model)
- Works best on clear, distinct genres
- May struggle with fusion styles or hybrid genres
2. Key Detection
- Less accurate on atonal music
- Can be confused by key modulations
- Requires clear harmonic content
3. Personality Correlations
- Based on population-level research correlations
- Shows typical music appreciation patterns for each trait
- Individual preferences vary significantly
- Does not assess or predict listener's actual personality
- Should not be used for psychological assessment or diagnosis
4. Short Clips
- All measures work better on full songs
- Minimum ~30 seconds recommended
- Tempo detection needs multiple bars
5. Audio Quality
- Low bitrate MP3s may affect accuracy
- Extreme compression affects dynamic range measurements
- Background noise reduces feature clarity
9. References & Research Basis
MUSIC Model:
Rentfrow, P. J., Goldberg, L. R., & Levitin, D. J. (2011). The structure of musical preferences: A five-factor model. Journal of Personality and Social Psychology, 100(6), 1139-1157.
Personality Correlations:
Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi's of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84(6), 1236-1256.
Audio Feature Extraction:
Standard digital signal processing techniques, Web Audio API methods, and Essentia.js algorithms (when available).
Key Detection:
Krumhansl, C. L., & Schmuckler, M. A. (1986). The Petroushka chord: A perceptual investigation. Music Perception, 4(2), 153-184.
Genre Classification:
Feature-based heuristics derived from musicological analysis and tempo, timbre, and rhythm characteristics per genre.
Summary
This music classifier provides a multi-layered analysis:
- Foundation: 20+ low-level audio features
- Perception: 14 psychological audio attributes
- Context: Genre classification (top 5)
- Psychology: 5 MUSIC dimensions
- Correlations: Big Five music appreciation patterns
- Visualization: Spectrogram (frequency over time)
The system is designed for research, music recommendation, playlist organization, and understanding the psychological dimensions of musical preference. All measures are based on audio content analysis and require no metadata. The personality correlations reflect research findings about how different personality types tend to engage with music; they do not assess individual listeners.