...making Linux just a little more fun!
Ben Okopnik [ben at linuxgazette.net]
I don't know why this kind of thing keeps coming up. I never wanted to be a mathematician, I'm just a simple programmer!
(One of these days, I'm going to sail over to an uninhabited island and stay there for six months or so, studying math. This is just embarassing; any time a problem like this comes up, I feel so stupid.)
I've been losing a lot of weight lately, and wanted to plot it on a chart. However, I've only been keeping very sparse records of the change, so what I need to do is interpolate it. In other words, given a list like this:
6/26/2010 334 8/12/2010 311.8 8/19/2010 308.4 9/5/2010 300.0 9/9/2010 298.6 9/14/2010 297.2 9/16/2010 293.6
I need to come up with a "slope" function that will return my weight at any point between 6/26 and 9/16. The time end of it is no problem - I just convert the dates into Unix "epoch" values (seconds since 1970-01-01 00:00:00 UTC) - but the mechanism that I've got for figuring out the weight at a given time is hopelessly crude: I split the total time span into X intervals, then find the data points preceding and following it, and calculate the "slope" between them, then laboriously figure out the value for that point. What I'd really, really like to have is a function that takes the above list and returns the weight value for any given point in time between Tmin and Tmax; I'm sure that it's a standard mathematical function, but I don't know how to implement it.
Can any of you smart folks help? I'd appreciate it.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 10:26:08AM -0400, Anderson Silva wrote:
> On Fri, Sep 17, 2010 at 9:14 AM, Ben Okopnik <ben at linuxgazette.net> wrote: > > I need to come up with a "slope" function that will return my weight at > any point between 6/26 and 9/16. > > I am not sure if I can attach files to the list, I've attached a quick chart I > drew with your data points.
Yep, it came through.
> The slope is negative, what you have to do now, is pick 2 points on the graph, > and find out the length of X and Y for those to points. > > Your slope will be: change in Y / change in X
Yes. The problem is in doing that portably. In other words, for any N data points, can I select some M intervals and have the function generate the correct values for the appropriate X and Y?
It's easy enough to do this manually on a spreadsheet; I want to replicate the functionality of what that spreadsheet engine is doing.
> Since we are talking about dates, we may want to convert the dates to units of > days, like, 10 in 10 days like on the image.
As I'd mentioned, I calculated the date values in seconds since the Unix epoch. That's not much of an issue.
> so, you could pick a point like: > > 300 - 334 / 60 - 0 = -34/60 which is approximately a slope of -1/2
Whoops - that won't work. The slope is going to be different at various points between 300 and 334 - there are several data points between them. That's precisely the problem. Otherwise, I could just take the end points, calculate one slope, and be done.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Jason Wigg [jw5801 at gmail.com]
> Yes. The problem is in doing that portably. In other words, for any N > data points, can I select some M intervals and have the function > generate the correct values for the appropriate X and Y? > > It's easy enough to do this manually on a spreadsheet; I want to > replicate the functionality of what that spreadsheet engine is doing.
The spreadsheet engine is doing linear regression! SciPy has a module for it (scipy.stats.linregress) if you're feeling inclined towards Python. It takes two one dimensional arrays (or one two dimensional array) as inputs and returns a slope, intercept and some correlation factors to tell you how good the fit is.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
If you really feel like delving into the mathematics, wiki has a pretty sound explanation of it.
http://en.wikipedia.org/wiki/Linear_regression
afsilva at gmail.com [(afsilva at gmail.com)]
On Fri, Sep 17, 2010 at 10:37 AM, Ben Okopnik <ben at linuxgazette.net> wrote:
> On Fri, Sep 17, 2010 at 10:26:08AM -0400, Anderson Silva wrote: > > On Fri, Sep 17, 2010 at 9:14 AM, Ben Okopnik <ben at linuxgazette.net> > wrote: > > > > I need to come up with a "slope" function that will return my weight > at > > any point between 6/26 and 9/16. > > > > I am not sure if I can attach files to the list, I've attached a quick > chart I > > drew with your data points. > > Yep, it came through. > > > The slope is negative, what you have to do now, is pick 2 points on the > graph, > > and find out the length of X and Y for those to points. > > > > Your slope will be: change in Y / change in X > > Yes. The problem is in doing that portably. In other words, for any N > data points, can I select some M intervals and have the function > generate the correct values for the appropriate X and Y? > > It's easy enough to do this manually on a spreadsheet; I want to > replicate the functionality of what that spreadsheet engine is doing. > > > Since we are talking about dates, we may want to convert the dates to > units of > > days, like, 10 in 10 days like on the image. > > As I'd mentioned, I calculated the date values in seconds since the Unix > epoch. That's not much of an issue. > > > so, you could pick a point like: > > > > 300 - 334 / 60 - 0 = -34/60 which is approximately a slope of -1/2 > > Whoops - that won't work. The slope is going to be different at various > points between 300 and 334 - there are several data points between them. > That's precisely the problem. Otherwise, I could just take the end > points, calculate one slope, and be done. > > > -- > * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET * > > TAG mailing list > TAG at lists.linuxgazette.net > http://lists.linuxgazette.net/listinfo.cgi/tag-linuxgazette.net >
Understood, can you then just apply the slope formula between 2 points: (x1,y1) and (x2,y2)?
m being the slope, m = (y2-y1)/(x2-x1) ??? (with x2 != x1)
Again, just spitting out stuff
An HTML attachment was scrubbed... URL: <http://lists.linuxgazette.net/private.cg[...]nts/20100917/bafe459e/attachment.htm>
Dimitrios Siganos [dimitris at siganos.org]
On 17/09/10 14:14, Ben Okopnik wrote:
> I need to come up with a "slope" function that will return my weight at > any point between 6/26 and 9/16.
A polynomial can be used to do that but I don't know if there are automatic ways to find the polynomial approximation you are looking for.
A polynomial is a function in the form:
f(x) = a + b*x + c*x**2 + d*x**3 + e*x**4 + ...
Dimitrios Siganos
Kapil Hari Paranjape [kapil at imsc.res.in]
Dear Ben,
On Fri, 17 Sep 2010, Ben Okopnik wrote:
> I've been losing a lot of weight lately,
Something I've been trying to do for a while. So unless its for all the wrong reasons, congratulations!
> What I'd really, really like to have is a function that takes the > above list and returns the weight value for any given point in time > between Tmin and Tmax; I'm sure that it's a standard mathematical > function, but I don't know how to implement it.
I'm still not sure exactly what you want. Suppose we have two points (date,weight)=(d,w); (later date, later weight)=(d';w') then to get the value of weight by linear interpolation between these two at the date e which is between d and d' that is given by
(e-d)/(d'-d) w + (d'-e)/(d'-d) w'
Perhaps that is the formula that you want. The way to remember this is to note that it divides the interval between w and w' in the same proportion as e divides the interval between d and d'.
Regards,
Kapil. --
Ben Okopnik [ben at linuxgazette.net]
[ Anderson, do me a favor and don't quote everything that came before; just the parts you're responding to, please. Thanks! ]
On Fri, Sep 17, 2010 at 11:14:27AM -0400, Anderson Silva wrote:
> On Fri, Sep 17, 2010 at 10:37 AM, Ben Okopnik <ben at linuxgazette.net> wrote: > > Whoops - that won't work. The slope is going to be different at various > points between 300 and 334 - there are several data points between them. > That's precisely the problem. Otherwise, I could just take the end > points, calculate one slope, and be done. > > Understood, can you then just apply the slope formula between 2 points: (x1,y1) > and (x2,y2)? > > m being the slope, m = (y2-y1)/(x2-x1) ??? (with x2 != x1)
Yes, that's the underlying formula; in fact, this is exactly what I'm doing now. The problem is in locating where the point falls and which two points to use. It's a very manual process. However, I've been working with it while this is going on, and I think I'm getting close to a solution.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Fri, 17 Sep 2010, Kapil Hari Paranjape wrote:
> I'm still not sure exactly what you want. Suppose we have two points > (date,weight)=(d,w); (later date, later weight)=(d';w') then to get > the value of weight by linear interpolation between these two at the > date e which is between d and d' that is given by > > (e-d)/(d'-d) w + (d'-e)/(d'-d) w'
That should have been
(e-d)/(d'-d) w' + (d'-e)/(d'-d) w
"The further is from England, the closer is to France". ("Will you, wont you" by Lewis Carrol)
Kapil. --
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 04:33:28PM +0100, Dimitrios Siganos wrote:
> On 17/09/10 14:14, Ben Okopnik wrote: > > I need to come up with a "slope" function that will return my weight at > > any point between 6/26 and 9/16. > > A polynomial can be used to do that but I don't know if there are > automatic ways to find the polynomial approximation you are looking for.
So I'd be trading one problem, where I at least have a clue of where to look, for another, where I have no idea of what the terms are. Thanks a lot.
> A polynomial is a function in the form: > > f(x) = a + b*x + c*x**2 + d*x**3 + e*x**4 + ...
Yes, I remember solving those in school. Unfortunately, the instructor was amazingly boring - I mean, like supernatural powers of creating boredom - and every single person in the class always fell asleep. I know, because I was always last - I struggled manfully - but literally everyone had their head on their desk by the time I went down. I'd never seen anything like it before.
Unfortunately, this also means that I can barely remember anything about polynomials.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Jason Wigg [jw5801 at gmail.com]
On 18 September 2010 01:55, Ben Okopnik <ben at linuxgazette.net> wrote:
> On Fri, Sep 17, 2010 at 04:33:28PM +0100, Dimitrios Siganos wrote: >> A polynomial can be used to do that but I don't know if there are >> automatic ways to find the polynomial approximation you are looking for. > > So I'd be trading one problem, where I at least have a clue of where to > look, for another, where I have no idea of what the terms are. ?Thanks a > lot. >
Can I throw more python modules around? Because numpy.polyfit will do this nicely for you. http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
Ben Okopnik [ben at linuxgazette.net]
Hi, Kapil -
On Fri, Sep 17, 2010 at 09:19:54PM +0530, Kapil Hari Paranjape wrote:
> > On Fri, 17 Sep 2010, Kapil Hari Paranjape wrote: > > I'm still not sure exactly what you want.
[...]
> That should have been > > (e-d)/(d'-d) w' + (d'-e)/(d'-d) w
Right, that's the "static" calculation for a given point, which requires knowing the d, w, d', and w'. I've got the mechanics of this part down pretty well. Again, though, what I need is a function that takes, say, a list of time and weight values as data, a single time value as an argument, and returns a weight value for that time. Some f(x) where f() figures out the appropriate slope for a given time point, and then calculates the weight at that time.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 09:14:01PM +0530, Kapil Hari Paranjape wrote:
> Dear Ben, > > On Fri, 17 Sep 2010, Ben Okopnik wrote: > > I've been losing a lot of weight lately, > > Something I've been trying to do for a while. So unless its for all > the wrong reasons, congratulations!
No, it's definitely for the right reasons - and it's a hugely positive change in my life. My goal is to get down to 199 lbs, and I'm almost a third of the way there. That's in less than three months, too. I'm really committed to this.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Sat, Sep 18, 2010 at 01:59:35AM +1000, Jason Wigg wrote:
> > Can I throw more python modules around? Because numpy.polyfit will do > this nicely for you. > http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
Absolutely. I'd prefer to see the actual solution rather than just feeding the data to a magic black box, but I'm definitely a big fan of quick-and-easy solutions, too. Bring on your mad Python skills.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Fri, 17 Sep 2010, Ben Okopnik wrote:
> On Fri, Sep 17, 2010 at 04:33:28PM +0100, Dimitrios Siganos wrote: > > On 17/09/10 14:14, Ben Okopnik wrote: > > A polynomial can be used to do that but I don't know if there are > > automatic ways to find the polynomial approximation you are looking for. > > So I'd be trading one problem, where I at least have a clue of where to > look, for another, where I have no idea of what the terms are. Thanks a > lot. > > Unfortunately, this also means that I can barely remember anything > about polynomials.
The "polyanimal" that you are looking for is called the "legendary polyanimal".
Seriously, it is actually the Legendre interpolation by polynomials.
So here is what you want.
You have a table like
epoch e0 e1 e2 .... en value v0 v1 v2 .... vn
You want a formula f (also called a polynomial formula) so that
f(ek) = vk for all k=0,1,2,...n
and you want f to be the simplest possible (also called least degree).
The formula for f is (I will use pseudo-code which ought to be self-explanatory):
f(e) = sum(i=0,n,ei*Qi(e))
where finding Qi(e) is the "key"!
The idea is as follows, suppose we had the simpler table
epoch e0 e1 e2 ... ei ... en value 0 0 0 ... 1 ... 0
(i.e. all values are 0 except the i-th value). Then the formula we want would give Qi(e). This one is easy to write
prod(j=0,n,if(j!=i,e-ej,1)) Qi(e) = --------------------------- prod(j=0,n,if(j!=i,ei-ej,1))
Hope this was less boring than your other math instructor.[*]
Kapil.
[*] Math is always easier when you know why you are doing it, even when the reasons are aesthetic. --
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Fri, 17 Sep 2010, Kapil Hari Paranjape wrote:
> The formula for f is (I will use pseudo-code which ought to be > self-explanatory): > > f(e) = sum(i=0,n,ei*Qi(e))
Sorry for a minor typo again. That should have been
f(e) = sum(i=0,n,vi*Qi(e))
Otherwise, we would not have used vi in our formula at all and that would be wrong. (Partly since vi is better than that other editor!)
Note that polynomial interpolation may not be what you want.
The formula I gave earlier (and the one you already know) is called "polygonal" interpolation and it does not appear to have a simpler expression than something like
f(e) = {i=0; while(e<e[i];i++); L[i](e) }
Where (e-e[i])*v[i+1] + (e[i+1]-e)*v[i] L[i](e) = --------------------------------- (e[i+1]-e[i])
After Turing/Post/Church, we can all say "Programs are formulae too!".
Regards,
Kapil. --
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 09:46:30PM +0530, Kapil Hari Paranjape wrote:
> > The "polyanimal" that you are looking for is called the "legendary > polyanimal".
[grin] I just knew you'd come up with something interesting.
> Seriously, it is actually the Legendre interpolation by polynomials.
Oh, good. False cognates between math and English - how much better do things get?
> So here is what you want. > > You have a table like > > epoch e0 e1 e2 .... en > value v0 v1 v2 .... vn > > You want a formula f (also called a polynomial formula) so that > > f(ek) = vk for all k=0,1,2,...n
Exactly!
> and you want f to be the simplest possible (also called least > degree). > > The formula for f is (I will use pseudo-code which ought to be > self-explanatory): > > f(e) = sum(i=0,n,ei*Qi(e))
I take it 'i' is the... what do you call it... the "fenceposts" between the intervals (can't think of the right word, darn it)?
> where finding Qi(e) is the "key"!
Yep. That's the problem in a nutshell.
> The idea is as follows, suppose we had the simpler table > > epoch e0 e1 e2 ... ei ... en > value 0 0 0 ... 1 ... 0 > > (i.e. all values are 0 except the i-th value). Then the formula we > want would give Qi(e). This one is easy to write > > prod(j=0,n,if(j!=i,e-ej,1)) > Qi(e) = --------------------------- > prod(j=0,n,if(j!=i,ei-ej,1))
[laugh] Kapil, I'm fascinated by what you consider "easy". OK, so what would it be if the values weren't zeros? Does the formula change? Also, I'm afraid I don't understand the 'prod(j=0,n,if(j!=i,e-ej,1))' notation: is that a product of... something?
> Hope this was less boring than your other math instructor.[*]
Oh, this is definitely fun. And it's not that polynomials themselves weren't interesting - I was actually enjoying learning about them; it literally was impossible to stay awake in this guy's class. I had money riding on it, in fact: passing it meant that my employer would pay for my education. Didn't help.
> [*] Math is always easier when you know why you are doing it, > even when the reasons are aesthetic.
Not just math. That's why I always try to come up with real world examples that are meaningful and relevant (and hopefully interesting) for my students: if you're involved and invested in solving the problem, you're always far more likely to succeed. Even people who aren't considered too sharp can do amazing things when they're personally interested. I've seen this again and again.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 09:59:50PM +0530, Kapil Hari Paranjape wrote:
> > Sorry for a minor typo again. That should have been > > f(e) = sum(i=0,n,vi*Qi(e)) > > Otherwise, we would not have used vi in our formula at all and that > would be wrong. (Partly since vi is better than that other editor!)
[laugh] I know that 'vi' can do many things, but linear interpolation is a new one on me.
Like any talented dog, it can do flips. Like any talented cow, it can do precision bitmap alignment. -- Apple Technote 31, describing the Moof (Dogcow)
> Note that polynomial interpolation may not be what you want.
I actually have no idea of what $thing_name I want. But I can describe the gears, the wiring, and the shape of the plastic widgets it's supposed to produce.
> The formula I gave earlier (and the one you already know) is called > "polygonal" interpolation and it does not appear to have a simpler > expression than something like > > f(e) = {i=0; while(e<e[i];i++); L[i](e) } > > Where > (e-e[i])*v[i+1] + (e[i+1]-e)*v[i] > L[i](e) = --------------------------------- > (e[i+1]-e[i]) > > After Turing/Post/Church, we can all say "Programs are formulae > too!".
Indeed they are. That's why I was sorta laughing at myself with my first "don't wanna be a mathematician, just wanna be a programmer!" snark.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Fri, 17 Sep 2010, Ben Okopnik wrote:
> On Fri, Sep 17, 2010 at 09:46:30PM +0530, Kapil Hari Paranjape wrote: > > The idea is as follows, suppose we had the simpler table > > > > epoch e0 e1 e2 ... ei ... en > > value 0 0 0 ... 1 ... 0 > > > > (i.e. all values are 0 except the i-th value). Then the formula we > > want would give Qi(e). This one is easy to write > > > > prod(j=0,n,if(j!=i,e-ej,1)) > > Qi(e) = --------------------------- > > prod(j=0,n,if(j!=i,ei-ej,1)) > > [laugh] Kapil, I'm fascinated by what you consider "easy". OK, so what > would it be if the values weren't zeros? Does the formula change? > Also, I'm afraid I don't understand the 'prod(j=0,n,if(j!=i,e-ej,1))' > notation: is that a product of... something?
Let's expand the above notation a little (prod was short for product!):
P[i](e) = { prod=1; for(j=0,n;j++) { if(j!=i) prod *= (e-e[j]) ; } return(prod); } Q[i](e) = P[i](e)/P[i](e[i])
It is "easy" in the sense that it is obvious that P[i](e[j])=0 if i and j are different, since one of the factors is (e[j]-e[j]) in that case. Similarly, it is obvious that Q[i](e[i])=1 since we are dividing P[i](e[i]) by itself to get this value.
Similarly,
f(e) = sum(i=0,n,vi*Qi(e))
Can be understood as
f(e) = { sum=0; for(i=0,n;i++) { sum += v[i]*Q[i](e); } return(sum); }
Sorry for being a bit "functional programming"-ish. That was the last kind of programming I did in recent times!
Regards,
Kapil. --
Ben Okopnik [ben at linuxgazette.net]
I may have actually solved it. Perhaps you math-knowledgeable folks can check me, assuming you can read the code I've produced (please feel free to ask if anything isn't clear.)
#!/usr/bin/perl -w # Created by Ben Okopnik on Thu Sep 16 21:52:36 EDT 2010 use POSIX qw/strftime mktime/; use strict; $|++; use constant Intervals => 20; my ($prev_time, $prev_val, $tmin, $tmax, $last_val, @curve); open my $data, "weightdata.txt" or die "weightdata.txt: $!\n"; while (<$data>){ chomp; # Ignore any line that doesn't match; capture the date and the weight as $1 and $2 next unless m{^\s*([\d/]+)\s+([\d.]+)}; my($m, $d, $y) = split /\//, $1; # Convert the date to Unix epoch my $time_point = mktime(0, 0, 0, $d, $m - 1, $y - 1900); # Skip calculations for first line - we need a previous value to calculate slope! if (defined $prev_time){ my $slope = ($2 - $prev_val) / ($time_point - $prev_time); # Save a list of the datapoints and their slopes push @curve, [ $prev_time, $time_point, $prev_val, $slope ]; } ($prev_time, $prev_val) = ($time_point, $2); ($tmax, $last_val) = ($time_point, $2); } close $data; $tmin = $curve[0][0]; # The relationship between intervals and fenceposts is f = i + 1 my $interval = ($tmax - $tmin) / (Intervals - 1); my ($prev_t, $val, $slope); # Go to one less than the last post, and do that one as a special case for my $t (0 .. Intervals - 2){ # Convert count to time value my $calc_t = $t * $interval + $tmin; # Drop the first list item when our time value isn't between the two points shift @curve unless $calc_t >= $curve[0][0] && $calc_t <= $curve[0][1]; # Copy the values ($prev_t, $val, $slope) = @{$curve[0]}[0, 2, 3]; # Recalculate the date from the epoch; calculate the weight printf "%2d: Weight on %s was %.1f\n", $t + 1, strftime("%x",localtime($calc_t)), ($calc_t - $prev_t) * $slope + $val; } # Special case: use the max time and the last value printf "%2d: Weight on %s was %.1f\n", Intervals, strftime("%x",localtime($tmax)), $last_val;
All the date intervals look right; so do the weight changes per interval. I've tried this with a different number of intervals, too, and it seems to work just fine.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 12:58:32PM -0400, Benjamin Okopnik wrote:
> I may have actually solved it. Perhaps you math-knowledgeable folks > can check me, assuming you can read the code I've produced (please feel > free to ask if anything isn't clear.)
Oh, and gluing in a little magic (via the GD::Graph::bars module) gives me this output:
http://okopnik.com/images/weight.png
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at okopnik.com]
Hi, David -
[cc'd back to the list - I suspect that's where you meant to post it, anyway. ]
On Fri, Sep 17, 2010 at 04:33:21PM -0400, David Richardson wrote:
> On 09/17/2010 09:14 AM, Ben Okopnik wrote: > > > >I've been losing a lot of weight lately, and wanted to plot it on a > >chart. However, I've only been keeping very sparse records of the > >change, so what I need to do is interpolate it. In other words, given a > >list like this: > > > >`` > >6/26/2010 334 > >8/12/2010 311.8 > >8/19/2010 308.4 > >9/5/2010 300.0 > >9/9/2010 298.6 > >9/14/2010 297.2 > >9/16/2010 293.6 > >'' > > > >[snip] > Ben: > > The standard linear regression, which has been mentioned several > times in this discussion, is a way of finding a "best fit" line - > one that passes close to all of the points, but may pass through > none of them.
Ah. I'd heard the term many times, of course, but never knew exactly what it meant. Thank you.
> The assumption in using this technique is that the > function (bodily function, in this case) that produced the points is > somewhat linear. A line taken from my weight over the last few > years would be roughly flat, but that line probably would not pass > through too many points - too many ups and downs.
It is flat, more or less. Given the rate at which this is happening (and the way my own mental/emotional apparatus works), it can't have much wobble in it: the only thing that works for me is total commitment (an occasional bump is OK, but anything more than that, and I'd lose focus.) There's no way for the line to get much steeper outside of surgery, so the only flex is up - and I'm not letting it go there.
> If you have by some inhuman dedication managed to steadily lose > weight, than the standard linear regression might work just fine. > If you are like me, a camel hasn't enough humps...
Heh. Well, it's not inhuman dedication - hopefully, I'm still human - but simply the best that I can do. I've decided to get healthy again, even if it kills me.
I'm really looking forward to seeing that chart 6 months down the road.
-- OKOPNIK CONSULTING Custom Computing Solutions For Your Business Expert-led Training | Dynamic, vital websites | Custom programming 443-250-7895 http://okopnik.com
Ben Okopnik [ben at linuxgazette.net]
On Fri, Sep 17, 2010 at 10:19:42PM +0530, Kapil Hari Paranjape wrote:
> > P[i](e) = { prod=1; > for(j=0,n;j++) { > if(j!=i) prod *= (e-e[j]) ; > } > return(prod); > } > Q[i](e) = P[i](e)/P[i](e[i])
[...]
> Similarly, > > f(e) = sum(i=0,n,vi*Qi(e)) > > Can be understood as > > f(e) = { sum=0; > for(i=0,n;i++) { > sum += v[i]*Q[i](e); > } > return(sum); > }
I guess I'm not doing too well in the brain department today; tried to code these up several times, and never managed it. Oh well.
I did manage to make it work in my own program - I think I actually understand the principle of linear regression now, which is the main thing I was looking for (the Wikipedia page just made my head hurt. Again, any suggestion or corrections are welcome. Hopefully, the comments explain it all; it may help to realize that $a[0] is the first element in the array and $a[-1] is the last one.
#!/usr/bin/perl -w # Created by Ben Okopnik on Fri Sep 18 16:57:28 EDT 2010 # Linear Regression for Dummies use strict; use constant Points => 12; # Must be >= 2 # Arbitrary data my @d; my @t = qw/100 110 120 130 140 150/;my @v = qw/300 290 280 270 260 250/; # Each entry in the list consists of t, t', v, and slope push @d, [ $t[$_], $t[$_+1], $v[$_], ($v[$_+1] - $v[$_]) / ($t[$_+1]-$t[$_]) ] for 0..$#t-1; sub calc { # Dump current dataset and go to the next one unless $t' > ARG >= $t shift @d unless $_[0] >= $d[0][0] && $_[0] < $d[0][1]; # Return ((ARG - t) * slope + v) and difference from Vmax my $v = ($_[0] - $d[0][0]) * $d[0][3] + $d[0][2]; [$v, $v - $v[0]]; } # Split into intervals and feed the times to calc() for (my $t = $t[0]; $t <= $t[-1] + 1; $t += ($t[-1] - $t[0]) / (Points - 1)){ printf "At %s, the value is %.1f (change: %.1f)\n", $t, @{calc($t)}; }
> Sorry for being a bit "functional programming"-ish. That was the last > kind of programming I did in recent times!
Oh, that's not a problem. I'm fine with imperative, functional, and objective programming; it's the logical stuff (e.g., recursive-descent parsers) that gives me the willies - I can never quite figure out if I've handled all the rules or not.
Thanks to everyone who helped!
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Francis Daly [francis at daoine.org]
On Fri, Sep 17, 2010 at 09:14:23AM -0400, Ben Okopnik wrote:
Ahoy there!
By now, the question has been answered, the coding has been done, and the pictures have been generated. So it should be ok to start wandering off on tangents to see what else could have been done, or could have been wanted. I hope
> I've been losing a lot of weight lately, and wanted to plot it on a > chart.
Pavlovian response of "gnuplot" kicks in here.
""" set xdata time set timefmt "%m/%d/%Y" plot "weightdata.txt" using 1:2 """
The Fine Manual will show all about labels and ticks and ranges (and "with linespoints") if something prettier is wanted.
> However, I've only been keeping very sparse records of the > change, so what I need to do is interpolate it.
Plotting the real records isn't appropriate, so you want to generate synthetic records "as if" you had taken a real record every ten days. You've done "join the dots with lines, then find the value for each day you had intended to measure", which is sensible and the values won't change over time.
Others have mentioned linear regression and best fit lines. That means, instead of "join the dots", it is "draw the line through the dots that comes closest to most of them", for suitable measures of "closest" and "most". And then you find the value for each day you had intended to measure. In this case, though, when new records are added, the best-fit line will almost certainly change, which means that the older synthetic values will change as new real values are. (This assumes a best-fit straight line, of course.)
As time goes by, and you maintain your target weight, the line will become closer to horizontal. Which is good.
And it's just the machine doing all of the extra sums, so that's ok too.
To do that...
> In other words, given a > list like this: > > `` > 6/26/2010 334 > 8/12/2010 311.8 > 8/19/2010 308.4 > 9/5/2010 300.0 > 9/9/2010 298.6 > 9/14/2010 297.2 > 9/16/2010 293.6 > '' > > I need to come up with a "slope" function that will return my weight at > any point between 6/26 and 9/16.
...gnuplot has a "fit" function.
You tell it the form of function you expect -- in this case a straight line (y=a+bx) -- and just let it do its thing.
""" set xdata time set timefmt "%m/%d/%Y" weight(x)=a+b*x fit weight(x) "weightdata.txt" using 1:2 via a,b plot "weightdata.txt" using 1:2, weight(x) """
...and it all goes horribly wrong.
The line doesn't really look like a "best fit". And the output of the "fit" line (on my test machine, at least) shows
Final set of parameters Asymptotic Standard Error ======================= ==========================
a = 1 +/- 966.8 (9.668e+004%) b = 9.08455e-007 +/- 2.878e-006 (316.8%)
Those error margins are a touch larger than they should be.
The reason is...
the "weight" value goes from about 300 to about 200; the "date" value is "seconds since an epoch". The "fit" function doesn't work well when things vary by orders of magnitude.
So, we can give it a helping hand. This might feel like the famous "fudge factor", but it does have some justification...
First: weights are taken on days. So use "day" rather than "second" as the scaling factor. Five orders of magnitude gone, hurray.
Second: "epoch" defaults to a long time ago, so use a closer one. The records go from June 2010 to currently September 2010, but maybe eventually June 2011? Start time at January 2010 and the "day" count goes from about 180 to now about 270, and eventually about 600. Much nicer in terms of range and variation.
So, let's try again from scratch:
""" set xdata time set timefmt "%m/%d/%Y" jan1=strptime("%m/%d/%Y", "1/1/2010")/86400 weight(x)=a+b*(x/86400-jan1) fit weight(x) "weightdata.txt" using 1:2 via a,b plot "weightdata.txt" using 1:2, weight(x) """
That new line looks much more like a "best fit" line. And the end of the "fit" output includes
Final set of parameters Asymptotic Standard Error ======================= ==========================
a = 418.034 +/- 2.852 (0.6823%) b = -0.476927 +/- 0.01209 (2.535%)
And those error percentages look much healthier.
"b" there is the slope, in "lbs/day" in this measure.
> What I'd really, really like to > have is a function that takes the above list and returns the weight > value for any given point in time between Tmin and Tmax;
One addition to the gnuplot script to allow consistent input:
""" w(day)=weight(strptime("%m/%d/%Y",day)) """
And then you can see what it says:
""" print w("8/28/2010") """
which shows me "304.0", different from the 303.6 already calculated, because the calculation methods are not equivalent.
Of course, there's no need to limit yourself to Tmax in this function. Roll on (or not) April 5th!
And finally, especially given the day that's in it, I expect an alternative solution will be presented using R.
Shiver me timbers, etc.
f
-- Francis Daly francis at daoine.org
Ben Okopnik [ben at linuxgazette.net]
Hey, Francis!
On Sun, Sep 19, 2010 at 12:41:31AM +0100, Francis Daly wrote:
> On Fri, Sep 17, 2010 at 09:14:23AM -0400, Ben Okopnik wrote: > > Ahoy there! > > By now, the question has been answered, the coding has been done, and > the pictures have been generated.
Oh-oh. That sounds like "...and the flowers have been put on the grave". I hope not.
> So it should be ok to start wandering > off on tangents to see what else could have been done, or could have > been wanted. I hope
Yes!!!
(I'd have used more exclamation points, but you know what Terry Pratchett says about them, right?)
> > I've been losing a lot of weight lately, and wanted to plot it on a > > chart. > > Pavlovian response of "gnuplot" kicks in here.
Please don't salivate on gnuplot; that would make it sticky and you might short out the wiring. I happen to like the little gadget, myself; wrote an article for LG on it some years back, in fact.
> """ > set xdata time > set timefmt "%m/%d/%Y" > plot "weightdata.txt" using 1:2 > """
The problem with that, of course, is that it takes me right back to plotting the original datapoints. I wanted to interpolate them to some reasonable intervals, instead. I suppose I could do this, though:
set xdata time set timefmt "%m/%d/%Y" set style fill solid 0.75 border plot 'weightdata.txt' using 1:2 smooth bezier with boxes lt 1, 'weightdata.txt' using 1334-$2) smooth bezier with boxes
That looks like a happy, reasonably fine thing.
> Others have mentioned linear regression and best fit lines. That means, > instead of "join the dots", it is "draw the line through the dots that > comes closest to most of them", for suitable measures of "closest" and > "most".
Ah. Maybe what I did wasn't linear regression after all, then. Well, under whatever name, it does what I want - which is very similar to the above.
> As time goes by, and you maintain your target weight, the line will > become closer to horizontal. Which is good.
That's the hope! But that day is a ways off, yet. I've got a long way to go.
> > I need to come up with a "slope" function that will return my weight at > > any point between 6/26 and 9/16. > > ...gnuplot has a "fit" function. > > You tell it the form of function you expect -- in this case a straight > line (y=a+bx) -- and just let it do its thing.
Well, that is more like linear regression. A best-fit straight line. Heck, if that's what I wanted, given how relatively smooth it is, I could have just plotted a line from Tmin/Wmax to Tmax/Wmin...
> Of course, there's no need to limit yourself to Tmax in this > function. Roll on (or not) April 5th!
Unless something really world-shaking happens, this is going to continue. Given simple calorie needs vs. body weight basics, I can't imagine that the rate will remain the same throughout, but I'm going to be pushing it to remain as steep as possible.
Just for the sake of curiosity, I fasted for the two days preceding the last measurement (that's why that curve is steeper at the end); barring heavy exercise, that's about my max rate of loss right now.
> And finally, especially given the day that's in it, I expect an > alternative solution will be presented using R.
[groan]
> Shiver me timbers, etc.
Holy moly - the penny just dropped. I just realized that "Talk Like a Pirate day" is tomorrow! You're obviously getting ready early, though.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *