arrays - (C) Recursive strcpy() that takes only 1 parameter

Saturday, February 23, 2019

arrays - (C) Recursive strcpy() that takes only 1 parameter

Let me be clear from the get go, this is not a dupe, I'll explain how.
So, I tasked myself to write a function that imitates strcpy but with 2 conditions:

it needs to be recursive

it must take a single parameter (which is the original string)

The function should return a pointer to the newly copied string. So this is what I've tried so far:

#include 
#include 

#include 

char * my_strcpy(char *original);

int main(void) {
  char *string = my_strcpy("alpine");
  printf("string = <%s>\n", string);
  return 0;
}


char * my_strcpy(char *original){
  char *string = (char *)malloc(10);
  if(*original == '\0') {
    return string;
  }
  *string++ = *original;
  my_strcpy(original + 1);
}

The problem is somewhat obvious, string gets malloc-ed every time my_strcpy() is called. One of the solutions I could think of would be to allocate memory for string only the first time the function is called. Since I'm allowed to have only 1 parameter, only thing I could think of was to check the call stack, but I don't know whether that's allowed and it does feel like cheating.
Is there a logical solution to this problem?

Answer

Recursion is a technique for analysing problems. That is, you start with the problem and think about what the recursive structure of a solution might be. You don't start with a recursive structure and then attempt to shoe-horn your problem willy-nilly into it.

In other words, it's good to practice recursive analysis, but the task you have set yourself -- to force the solution to have the form of a one-parameter function -- is not the way to do that. If you start contemplating global or static variables, or extracting runtime context by breaking into the call stack, you have a pretty good hint that you have not yet found the appropriate recursive analysis.

That's not to say that there is not an elegant recursive solution to your problem. There is one, but before we get to it, we might want to abstract away a detail of the problem in order to provide some motivation.

Clearly, if we have a contiguous data structure already in memory, making a contiguous copy is not challenging. If we don't know how big it is, we can do two traverses: one to find its size, after which we can allocate the needed memory, and another one to do the copy. Both of those tasks are simple loops, which is one form of recursion.

The essential nature of a recursive solution is to think about how to step from a problem to a (slightly) simpler or smaller problem. Or, more commonly, a small number of smaller or simpler problems.

That's the nature of one of the most classic recursive problems: sorting a sequence of numbers. The basic structure: divide the sequence into two roughly equal parts; sort each part (the recursive step) and put the results back together so that the combination is sorted. That basic outline has at least two interesting (and very different) manifestations:

Divide the sequence arbitrarily into two almost equal parts either by putting alternate elements in alternate parts or by putting the first half in one part and the rest in the other part. (The first one will work nicely if we don't know in advance how big the sequence is.) To put the sorted parts together, we have to interleave ("merge") the. (This is mergesort).

Divide the sequence into two ranges by estimating the middle value and putting all smaller values into one part and all larger values into the other part. To put the sorted parts together, we just concatenate them. (This is quicksort.)

In both cases, we also need to use fact that a single-element sequence is (trivially) sorted so no more processing needs to be done. If we divide a sequence into two parts often enough, ensuring that neither part is empty, we must eventually reach a part containing one element. (If we manage to do the division accurately, that will happen quite soon.)

It's interesting to implement these two strategies using singly-linked lists, so that the lengths really are not easily known. Both can be implemented this way, and the implementations reveal something important about the nature of sorting.

But let's get back to the much simpler problem at hand, copying a sequence into a newly-allocated contiguous array. To make the problem more interesting, we won't assume that the sequence is already stored contiguously, nor that we can traverse it twice.

To start, we need to find the length of the sequence, which we can do by observing that an empty sequence has length zero and any other sequence has one more element than the subsequence starting after the first element (the "tail" of the sequence.)

Length(seq):
    If seq is empty, return 0.

    Else, return 1 + Length(Tail(seq))

Now, suppose we have allocated storage for the copy. Now, we can copy by observing that an empty sequence is fully copied, and any other sequence can be copied by placing the first element into the allocated storage and then cipying the tail of the sequence into the storage starting at the second position: (and this procedure logically takes two arguments)

Copy(destination, seq):
    If seq is not empty:
        Put Head(seq) into the location destination
        Call Copy (destination+1, Tail(seq))

But we can't just put those two procedures together, because that would traverse the sequence twice, which we said we couldn't do. So we need to somehow nest these algorithms.

To do that, we have to start by passing the accumulated length down through the recursion so that we can use it at to allocate the storage when we know how big the object. Then, on the way back, we need to copy the element we counted on the way down:

Copy(seq, length):
    If seq is not empty:
        Set item to its first element (that is, Head(seq))
        Set destination to Copy(Tail(seq), length + 1)
        Store item at location destination - 1

        Return destination - 1
    Otherwise: (seq is empty)
        Set destination to Allocate(length)
        # (see important note below)
        Return destination + length

To correctly start the recursion, we need to pass in 0 as the initial length. It's bad style to force the user to insert "magic numbers", so we would normally wrap the function with a single-argument driver:

Strdup(seq):

    Return Copy (seq, 0)

Important Note: if this were written in C using strings, we would need to NUL-terminate the copy. That means allocating length+1 bytes, rather than length, and then storing 0 at destination+length.

Blog

Saturday, February 23, 2019