The Great OOP Graveyard: Why R’s Class System Chaos Actually Makes Sense


Introduction

This summer I found myself in unfamiliar territory: building an R project that had absolutely nothing to do with statistical analysis. No regression models, no hypothesis tests, no data visualisations. For the first time in years, I had to venture beyond R’s comfortable functional world into its object-oriented wilderness. And what a wild, confusing, oddly fascinating wilderness it turned out to be.

R’s object-oriented programming landscape looks like the aftermath of some great taxonomic war. You’ve got S3 shambling around like a beloved but decrepit ancestor. S4 standing at attention with its formal methods and rigid protocols. R6 strutting around with its classical OOP swagger. S7 promising to be the chosen one who will finally bring balance to the Force. And scattered around them, the bones of the fallen: proto (used by the great ggplot2, though slated to be replaced by S7), R.oo, OOP, Reference Classes (the so-called R5), mutatr, and probably a dozen others I’ve mercifully forgotten.

After some years building this kind of thing in Python, I truly get the appeal. When you look at well-written Python code, it reads like prose. The syntax is clean, the object model is consistent, everything fits together with an almost zen-like harmony.

R, by contrast, can be so quirky that even R-core developers occasionally get surprised by its behaviour. R treats 1 as a vector of length 1, which means there is no difference between scalars and vectors except in your head. R has assignment operators that go both ways (<- and ->) because why choose? Lists return NULL, not an error, when you index them with a name that hasn’t been defined. Yet assigning NULL to an existing list element doesn’t store NULL in that position: it removes the entry altogether, so there is no obvious way to keep a NULL inside a list after creation (you need the unintuitive lst["a"] <- list(NULL) idiom). And don’t even get me started on environments: all your code is brought in by source()-ing it, which essentially runs everything in one single namespace. Compared to Python’s clean from y import x, it is hard in R to organise routines into logical modules; there is no hierarchy for organising individual R files when developing a package or project.
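To make a few of these quirks concrete, here is a quick sketch (the list lst is mine, purely for illustration):

# R quirks in action
length(1)               # 1: a "scalar" is just a vector of length one
x <- 1; 2 -> y          # assignment works in both directions

lst <- list(a = 1, b = 2)
lst$missing             # NULL, not an error
lst$a <- NULL           # removes 'a' instead of storing NULL
names(lst)              # "b"
lst["a"] <- list(NULL)  # the non-obvious idiom to actually store a NULL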

Python, meanwhile, was born from completely different philosophical soil. Python is the product of software engineering thinking applied with extraordinary discipline and care. When Guido van Rossum designed Python, he was thinking about how programmers think, how code should read, how systems should be structured. Python is gorgeous. Python makes sense. Python feels like programming should feel—clean, logical, consistent, and elegant. It’s the closest you get to programming in natural language English. But then you actually try to do data wrangling in Python, and something interesting happens:

# Python's methodical approach to data summarisation
quarterly_summary = (sales_data
    .query('transaction_date >= "2023-01-01"')
    .groupby(['sales_rep', 'product_line'])
    .agg({
        'gross_revenue': ['sum', 'mean', 'count'],
        'net_profit': ['sum', 'mean'],
        'customer_satisfaction': 'mean'
    }))

# Navigate the multi-index column hell
quarterly_summary.columns = [f"{col[0]}_{col[1]}" if col[1] != '' else col[0] 
                            for col in quarterly_summary.columns]

# Reshape into something actually usable
final_report = (quarterly_summary
    .reset_index()  # bring the group keys back as columns before pivoting
    .pivot_table(
        index='sales_rep',
        columns='product_line',
        values=['gross_revenue_sum', 'net_profit_sum']
    )
    .fillna(0)
    .round(2))

The beautiful systematic thinking that makes Python elegant for general programming becomes laborious and verbose when applied to the inherently mathematical world of data manipulation. Moreover, because data science is just one of the many things Python does, every framework invents its own API: jump from pandas to, say, Polars, and you have to learn a new syntax from zero.

What we’re witnessing is the collision between two different ways of thinking about computational problems: engineering thinking and mathematical thinking.

The Functional Sweet Spot: Why R’s Domain-Specific Design is Actually Its Superpower

The “worst” of R is that it comes from statisticians who needed to crunch numbers (without paying for a SAS licence) rather than from engineers setting out to build the perfect programming language. The early designers weren’t trying to build a “proper” programming language; they were trying to build a mathematical calculator that could handle real data. These aren’t design flaws, then, but design decisions made by people who prioritised mathematical intuition over computational orthodoxy, which is why R treats data as a first-class citizen. And they naturally brought in the functional approach they were familiar with, because that is how mathematical education actually works.

I don’t know about you, but in my high-school math class I didn’t learn about objects, methods, and inheritance. I learnt about functions: \(f(x) = 2x + 3\); that functions are mappings (given an input, produce an output); and function composition: if you have \(f(x) = 2x + 3\) and \(g(x) = x^2\), then \((g \circ f)(x) = g(f(x)) = g(2x + 3) = (2x + 3)^2\). This is the cognitive foundation that every quantitatively trained person carries. Mathematics is fundamentally about transformations, mappings, and compositions of functions. Not objects, not methods, not inheritance hierarchies: functions.
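Here is that idea as a minimal R sketch (the compose() helper and the names f, g, and h are mine, purely for illustration):

# Function composition, straight from the maths classroom
f <- function(x) 2 * x + 3
g <- function(x) x^2
compose <- function(g, f) function(x) g(f(x))  # (g o f)(x) = g(f(x))

h <- compose(g, f)
h(2)  # (2 * 2 + 3)^2 = 49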

The same operation we struggled with in Python becomes natural when expressed in R’s mathematical idiom:

# R expressing the natural flow of mathematical thinking
library(dplyr)
library(tidyr)

quarterly_summary <- sales_data %>%
  filter(transaction_date >= as.Date("2023-01-01")) %>%
  group_by(sales_rep, product_line) %>%
  summarise(
    total_gross_revenue = sum(gross_revenue),
    avg_gross_revenue = mean(gross_revenue),
    total_net_profit = sum(net_profit),
    avg_customer_satisfaction = mean(customer_satisfaction),
    transaction_count = n(),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = product_line,
    values_from = c(total_gross_revenue, total_net_profit),
    names_sep = "_"
  )

The functional programming paradigm maps perfectly onto data analysis because data analysis is functional mathematical thinking applied to empirical problems. You transform data through sequences of well-defined mathematical operations. You compose simple functions to build sophisticated analytical procedures. You create computational pipelines because that’s how mathematical reasoning naturally flows from raw data to statistical conclusions.

R’s “weird” design choices aren’t weird—they’re the natural computational expression of mathematical thinking patterns that have been refined through centuries of pedagogical evolution.

The Functional Breaking Point: When Mathematical Thinking Hits Its Limits

But here’s what my summer project taught me: sometimes you need to build systems that exist beyond the elegant world of mathematical transformations. Sometimes you need persistent state that survives across multiple operations. Sometimes you need complex interactions between subsystems that each maintain their own internal logic and coordinate through well-defined interfaces.

Base R excels at transforming data, but it struggles with managing persistent connections, maintaining session state across time, coordinating real-time interactions, or building systems that need to remember complex relationships between their components. These aren’t data analysis problems; they’re systems architecture problems. And modern data requirements only amplify them: interactive dashboards, connections to third-party data sources, and so on. Think about what today’s R developers actually build: not just one-off statistical analyses, but living systems that researchers and non-programmer users (clinicians, wet-lab scientists, …) interact with daily. Hence, R needs to grow beyond its statistical and data analysis roots.

Now here comes the crucial insight: R developers currently do not agree on what “properties” an “object” should have, and this disagreement reflects two fundamentally different cognitive approaches to the same technical challenges.

The R6 vs S7 Battle

R6: The Engineering Solution

R6 embodies classical object-oriented thinking. An R6 object is a self-contained entity with encapsulated state and behaviour. When you create an R6 object, you’re thinking like a software engineer: “What data does this object need to maintain? What operations should it support? How should it manage its internal consistency?” You tell the object what to do ($connect(), $process_data()), and it manages its own state internally. This is encapsulation in the classical sense—perfect for building complex internal components that need sophisticated state management.
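As a minimal sketch of this style, assuming a hypothetical DataSession class whose $connect() and $process_data() methods mirror the examples above:

# Classical encapsulated, mutable state with R6
library(R6)

DataSession <- R6Class("DataSession",
  public = list(
    connected = FALSE,
    cache = NULL,
    connect = function() {
      self$connected <- TRUE  # mutate internal state in place
      invisible(self)         # return self so calls can be chained
    },
    process_data = function(x) {
      stopifnot(self$connected)
      self$cache <- x * 2     # the object remembers results between calls
      self$cache
    }
  )
)

session <- DataSession$new()
session$connect()$process_data(1:3)  # 2 4 6; the state lives inside session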

The beauty of R6 lies in its unapologetic embrace of mutable state and familiar object-oriented patterns that will feel natural to programmers coming from other languages. When you need an object that remembers things between method calls, that maintains internal counters, that manages connection pools or caches expensive computations, R6 doesn’t fight you—it gives you the tools to build exactly what you need. This classical OOP approach is particularly valuable for attracting non-R programmers who bring solid architectural knowledge from computer science backgrounds, helping to elevate the engineering practices in R projects. Most R programmers transition from data science to programming rather than the other way around, often lacking the architectural know-how that comes from formal CS training. R6 bridges this gap by providing familiar patterns that encourage good software engineering practices whilst still being accessible within R’s ecosystem.

S7: The R Compromise

S7 represents a different philosophical approach: maintaining functional separation whilst adding object-oriented capabilities. In S7, objects are still primarily data containers, but now they can have formal type definitions and method dispatch. The data (the object) remains separate from the operations (the generic functions), preserving the functional mindset that R users expect. This creates wonderfully consistent user-facing APIs that feel mathematical rather than computational.
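A minimal sketch of the S7 flavour (the Report class and the summarise_report() generic are hypothetical, chosen only to show data staying separate from behaviour):

# Data in the object, behaviour in generic functions with S7
library(S7)

Report <- new_class("Report",
  properties = list(data = class_data.frame)
)

summarise_report <- new_generic("summarise_report", "x")
method(summarise_report, Report) <- function(x) {
  colMeans(x@data)  # the generic operates on the data container
}

r <- Report(data = data.frame(a = 1:3, b = 4:6))
summarise_report(r)  # feels like calling summary(), not r$summarise()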

The base idea in S7 is how it preserves R’s functional soul while adding the structure that complex systems need, making it an ideal stepping stone for traditional R programmers who want to move into more sophisticated project architectures without abandoning their functional thinking patterns. S7 objects are immutable by default—when you “modify” an S7 object, you get a new object back, just like traditional R operations. This immutability means S7 plays nicely with R’s copy-on-modify semantics and feels natural to R users who are accustomed to functional thinking. For statisticians and data scientists who’ve grown comfortable with generic functions like summary() and plot(), S7 extends this familiar paradigm into more complex domains, allowing them to write classes and build sophisticated systems without the cognitive leap required by classical OOP.
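Reusing the hypothetical Report object from the sketch above, those value semantics look like this:

# "Modifying" an S7 object leaves the original untouched
r2 <- r
r2@data$a <- r2@data$a * 10  # r2 is a modified copy ...
identical(r@data, r2@data)   # ... FALSE: r itself never changed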

Conclusion

In my opinion, the choice between R6 and S7 isn’t about technical superiority, but about cognitive alignment with different problem domains. When building this summer’s project, I first tried the S7 approach but wasn’t comfortable with the resulting organisation; switching to R6 helped me structure the components and processes. I’m therefore not sure S7 will truly be “the chosen one”, but I think it makes sense to use R6 for internal systems that require sophisticated state management and encapsulation, the parts the user will never touch, and S7 for user-facing APIs that need to feel like mathematical operations. Working this way maintains mathematical elegance for data analysis whilst providing engineering power for systems development. Let’s see what the future of R holds for us. I really look forward to the end of the OOP wars.