RPubs

by RStudio

ricardogg

Ricardo González Gil

Recently Published

Make your plots more intuitive: label by Month, not Day of year

This short exercise demonstrates how to improve the readability of seasonal plots by replacing numeric day-of-year (DOY) values with month-based labels on the x-axis. Using surface temperature data from the Scottish Coastal Observatory at Stonehaven, I show how even a small change in labeling can make time-series patterns easier to interpret, especially for broader audiences. The exercise compares the default DOY labeling with an enhanced version that adds month separators and month initials.
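As a rough illustration of the idea (not the author's actual code), the sketch below uses base R graphics to place month separators and initials on a day-of-year axis. It assumes a non-leap year, and the temperature curve is a made-up placeholder, not the Stonehaven observations; the variable names are hypothetical.

```r
# Day of year on which each month starts (assumption: non-leap year).
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
month_start   <- cumsum(c(1, days_in_month[-12]))
month_mid     <- month_start + days_in_month / 2   # where each label is centred
month_initial <- substr(month.abb, 1, 1)           # "J" "F" "M" ...

# Placeholder seasonal curve, for illustration only (not real data).
doy  <- 1:365
temp <- 10 + 4 * sin(2 * pi * (doy - 135) / 365)

# Suppress the default numeric DOY axis, then draw the month-based one.
plot(doy, temp, type = "l", xaxt = "n",
     xlab = "Month", ylab = "Temperature (placeholder)")
abline(v = month_start, lty = 3, col = "grey70")   # month separators
axis(1, at = month_mid, labels = month_initial, tick = FALSE)
```

The same break positions could be passed to another plotting system; only the vectors `month_start`, `month_mid`, and `month_initial` matter here.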

9 months ago

Assigning positions in blocks of repeated elements in a vector: a performance comparison in R

This exercise compares four methods for assigning positions within blocks of consecutive repeated elements in a vector: each run of a target value is numbered from 1, while all other values are left unchanged. For example, given the vector 0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0, the desired output is 0 0 0 1 2 3 4 0 0 1 2 0 0 0 1 2 3 4 0 0 0 0 0. This need arises in applications such as time series analysis (identifying trends and patterns in sequential data), genomic sequence processing (assigning positions in repeated nucleotides or amino acid sequences), and text data manipulation (detecting and processing repeated words, phrases, or characters). To cover these use cases, I wrote a generalized function for each of the four methods and tested their efficiency on vectors of varying lengths, up to 1 × 10⁶ elements (Figs. 1 and 2). The key takeaways are:
1. Different approaches yield the same result, with varying trade-offs in efficiency, readability, and flexibility.
2. The rle-based method is the best choice when speed is critical.
3. Benchmarking is essential for selecting methods in large-scale data processing.
4. Alternative implementations not covered here may further improve performance.
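One way the rle-based approach could look is sketched below. This is a minimal illustration under stated assumptions, not the author's generalized function: the name `label_run_positions` and its `target` argument are hypothetical. It relies on base R's `rle()` to find run lengths and `sequence()` to count from 1 within each run.

```r
# Hypothetical helper: number the elements of each consecutive run of
# `target` from 1, leaving all other values unchanged.
label_run_positions <- function(x, target = 1) {
  runs <- rle(x)                   # lengths of consecutive equal values
  pos  <- sequence(runs$lengths)   # 1, 2, ... restarting at every run
  idx  <- which(x == target)       # which() drops NA comparisons
  out  <- x
  out[idx] <- pos[idx]             # overwrite only the target elements
  out
}

x <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
label_run_positions(x)
# 0 0 0 1 2 3 4 0 0 1 2 0 0 0 1 2 3 4 0 0 0 0 0
```

The result reproduces the example vector given above. Timing this against alternatives on long vectors would need a separate benchmark.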