
Commits : Listings

Commit Message (Date)
Added implementation of vSGD global learning rate. (almost 11 years ago)
These ANN parameters seem to work well, but there's a lot of variance due to the initial weights of the network. (almost 11 years ago)
Improved spearmint generation script, with some info on how to use it in the file. (almost 11 years ago)
Fixed bug where older method name was being called. Thanks Kiril. (almost 11 years ago)
Added reward noise as an argument for some of the domains. I'm testing this out today to see how some RL algorithms do with different types of noise. If all goes well, I will push in changes to allow other domains to have reward noise as well. (almost 11 years ago)
Cart pole Sarsa params. (almost 11 years ago)
Added params for acrobot/Sarsa. (almost 11 years ago)
A couple more tweaks to vSGD and I've got it working pretty reasonably, but it does depend on its parameters: C, the length of the slow start, and the initial time constant. (almost 11 years ago)
Forgot to remove a deleted file module from __init__.py for experiments (fixed that). Also worked on vSGD. The method is theoretically cool, but in practice it's just not so great for RL (for general stochastic optimization it works great). It's hard to tell exactly why it underperforms. But I added a little tweak and an extra parameter (which the paper sets as C = dim_of_features/10, though I found that larger values were needed for decent performance). (almost 11 years ago)
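The vSGD method these commits refer to is the adaptive learning-rate scheme of Schaul et al., "No More Pesky Learning Rates" (2013). As a rough, per-parameter sketch (my reconstruction of the published update, not this repo's code; the `c` constructor argument stands in for the C tweak the commits mention): keep exponential moving averages of the gradient, the squared gradient, and a curvature estimate, with an adaptive memory `tau`, and set the rate to `gbar**2 / (vbar * hbar)`.

```python
# Hedged sketch of a scalar vSGD-style adaptive learning rate, after
# Schaul et al. (2013). Names and initialization choices are illustrative.
class VSGDScalar:
    def __init__(self, c=10.0):
        self.gbar = 0.0   # running mean of gradients
        self.vbar = 1.0   # running mean of squared gradients
        self.hbar = 1.0   # running mean of curvature estimates
        self.tau = c      # adaptive memory; c plays the role of the "C"
                          # slow-start constant mentioned in the commits

    def learning_rate(self, g, h):
        f = 1.0 / self.tau
        self.gbar = (1 - f) * self.gbar + f * g
        self.vbar = (1 - f) * self.vbar + f * g * g
        self.hbar = (1 - f) * self.hbar + f * abs(h)
        # memory shrinks when the gradient signal is inconsistent,
        # so noisy directions get smaller effective step sizes
        self.tau = (1 - self.gbar ** 2 / self.vbar) * self.tau + 1.0
        return self.gbar ** 2 / (self.vbar * self.hbar)
```

With a perfectly consistent gradient the estimated rate approaches 1 (the deterministic optimum for a quadratic with unit curvature); with noisy gradients `gbar**2 / vbar` shrinks and so does the step.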
Cleaned everything up and refactored some more. The spearmint-related stuff is now separated out into the scripts/ directory. Just running generate_spearmint.sh and passing the output directory location and the experiment JSON file to use will do all the work. Also got rid of the now-redundant randomize_parameters method. Finally, made the agent_get_parameters method into a classmethod called agent_parameters (and made all the agents runnable standalone in a way that uses the agent_parameters call to generate the argparser). (almost 11 years ago)
AHHHHH, finally finished this. Well, really just got it to a good stopping point; lots more to do to clean things up. But preliminary tests show that the new spearmint.py experiment works correctly. For this new experiment to be useful, you must specify an output directory: --output=somedir/. Given that, it will create a subdir for the agent, produce a protobuf config file for spearmint, and create a wrapper script for spearmint to use. (almost 11 years ago)
Refactored parameter optimization/randomization again. Hopefully this version sticks. Each agent class provides its parameter specification through the agent_get_parameters method, which uses argparse as a convenient data structure for an algorithm's parameters. I wrote a custom container and some helper functions to be used with this setup. This way the randomized trial experiment simply gets the param spec and then generates random values as needed based on the spec. I'm making this switch because I'd like to get to the point of using spearmint (or something like it) to do more intelligent parameter optimization, and for that I needed a system like this, in which the parameter spec can be queried. (almost 11 years ago)
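The argparse-as-parameter-spec idea above can be sketched roughly as follows. This is illustrative, not the repo's actual API: the agent class, parameter names, the sampling range, and the peek at argparse's private `_actions` list are all my assumptions.

```python
import argparse
import random

# Hypothetical agent exposing its tunable parameters as an argparse parser,
# following the classmethod naming the commits settle on (agent_parameters).
class SarsaAgent:
    @classmethod
    def agent_parameters(cls):
        parser = argparse.ArgumentParser(prog="sarsa")
        parser.add_argument("--alpha", type=float, default=0.1)
        parser.add_argument("--gamma", type=float, default=0.99)
        parser.add_argument("--lmbda", type=float, default=0.9)
        return parser

def random_parameters(parser, rng):
    # Query the spec and sample a value for every float-typed option.
    # Note: _actions is an argparse internal; a real implementation would
    # keep its own container, as the commit above describes.
    sampled = {}
    for action in parser._actions:
        if action.dest != "help" and action.type is float:
            sampled[action.dest] = rng.uniform(0.0, 1.0)
    return sampled
```

Because the spec is queryable, the same parser can serve three roles: building a standalone CLI for the agent, driving random search, and generating configs for an external optimizer like spearmint.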
Cart pole with adaptive step-size config file. (almost 11 years ago)
Minor change to inv. max. eig stepsize method. (almost 11 years ago)
Putting a random generator as the default value of a named argument in a method definition does not do what you'd hope it would do. All my uses of the randParameter method that relied on the default behavior would always get the exact same random numbers. This should fix that bug. (almost 11 years ago)
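The pitfall that commit describes is a classic Python one: default argument values are evaluated once, at function definition time, not on each call. So a generator seeded in the signature is baked in at import, and every run that relies on the default replays the identical stream. A minimal reconstruction (the function names and signature are illustrative, not the repo's randParameter):

```python
import numpy as np

# BUGGY: RandomState(0) is constructed once, when the def is executed,
# so every process that imports this module and uses the default gets
# exactly the same sequence of "random" parameters.
def rand_parameter_buggy(low, high, rng=np.random.RandomState(0)):
    return rng.uniform(low, high)

# The usual fix: default to None and construct (or pass in) the
# generator at call time instead.
def rand_parameter_fixed(low, high, rng=None):
    if rng is None:
        rng = np.random.RandomState()
    return rng.uniform(low, high)
```

The same rule is behind the better-known mutable-default bug with `def f(x=[])`; a seeded generator is just a subtler instance of it.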
Working parameters for double cart pole. (almost 11 years ago)
Edited plotExperiment so that it does not plot incomplete runs. (almost 11 years ago)
Getting ready to try to recreate the NAC Tetris results from Kakade's paper (making nac-lstd as close to his natural policy gradient as possible). (almost 11 years ago)
More tweaks to nac-lstd. (almost 11 years ago)
Modified the way the compatible features method is called, to make it more easily generalized to different forms of compatible features. Tested, and it hasn't broken anything. (almost 11 years ago)
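For context on the "compatible features" used by NAC-LSTD: in the standard formulation they are the gradient of the log-policy, psi(s, a) = grad_theta log pi(a | s). A minimal sketch for a softmax policy over linear action preferences (the interface here is my own, not the repo's):

```python
import numpy as np

# Compatible features for a softmax policy pi(a|s) proportional to
# exp(theta . phi_sa[a]). The gradient of log pi(a|s) with respect to
# theta is phi_sa[a] minus the policy-weighted average feature vector.
def compatible_features(theta, phi_sa, action):
    # phi_sa: (n_actions, k) matrix, one feature row per action
    prefs = phi_sa @ theta
    prefs -= prefs.max()                      # subtract max for stability
    pi = np.exp(prefs) / np.exp(prefs).sum()  # softmax probabilities
    return phi_sa[action] - pi @ phi_sa       # grad_theta log pi(a|s)
```

Making this a swappable method, as the commit describes, lets the same NAC-LSTD core work with other policy classes (e.g. Gaussian policies for continuous actions), whose compatible features have a different closed form.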
Initial implementation of a self-balancing robot task: a two-wheeled inverted pendulum (twip). Don't use it for anything serious just yet; it still needs some checking to be sure things are working right. But it can be fun to play with in the meantime. It allows navigation as well as balancing, so some interesting new domains could open up by using this. (almost 11 years ago)
Cleaned up that divergence idea a little. Any agent class inheriting from skeleton_agent (which they all do at this point) need only implement has_diverged(self), which returns true/false on that question. Agents are free to check this however they want. (almost 11 years ago)
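The divergence hook described above might look roughly like this (an illustrative sketch, not the repo's actual classes; the base-class and agent names are assumptions):

```python
import math

# Base class provides a safe default; concrete agents override it with
# whatever divergence check makes sense for their representation.
class SkeletonAgent:
    def has_diverged(self):
        return False

class LinearSarsa(SkeletonAgent):
    def __init__(self):
        self.weights = [0.0, 0.0]

    def has_diverged(self):
        # For a linear learner, NaN or infinite weights are a clear sign
        # the step size was too large and learning has blown up.
        return any(math.isnan(w) or math.isinf(w) for w in self.weights)
```

The experiment loop can then poll `agent.has_diverged()` each episode and terminate that run gracefully, as the commits describe, instead of burning CPU on a run that can no longer recover.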
Finally have some reasonable parameters for Sarsa ANN (they seem to work most of the time, but occasionally the initial weights are bad and it just doesn't ever find the goal). Also added a message between the experiment and the agent, querying whether the agent has diverged. This allows us to handle that event more gracefully than just exiting, while not wasting CPU cycles. (almost 11 years ago)
Boom! That was easier than expected. plotExperiment can now directly take the output from a randomized trial. Next up is plotParameters. (almost 11 years ago)
Reworked how results are reported in randomized trials (and thus the way randomize_parameters works). I've switched it all over to returning JSON instead of comma-separated values. It would be really easy from this point to add an option to output CSV, but for now I'll leave it out. The reasoning is that this way, when running large experiments, you don't need to keep track of exactly which parameters were in which column for which algorithm, etc. It's all right there in the JSON. This results in a bit of extra text being stored, which adds up for a large parameter search (I estimated about 460 MB for 1 million parameters/results). But this can easily be sidestepped by gzipping the resulting JSON files (which compress very nicely). (almost 11 years ago)
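The self-describing-JSON-plus-gzip scheme above can be sketched in a few lines (function names and record fields are illustrative, not the repo's): one JSON object per line, written through `gzip.open` in text mode, so the repeated key names compress away almost entirely.

```python
import gzip
import json

# Append-friendly storage: one JSON record per line, gzip-compressed.
def save_results(path, records):
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def load_results(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Each record carries its own parameter names, so there is no column bookkeeping, and the line-per-record layout means results from parallel trials can simply be concatenated.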
LSPI modified a little. It seems like no one uses LSPI in an online way like I'm using it here, which could explain the results I'm getting and suggests that others have seen the same. It seems to work fine if you collect lots and lots of samples and then update once, but this alternating data collection and policy improvement doesn't seem to be working very well. I'm leaving it for now, but will be on the lookout for solutions. (almost 11 years ago)
Implemented LSPI/LSTDQ, and added some semi-working parameters for them for mountain car. LSTD-Q (the backbone of LSPI) isn't working when I have it update more frequently than it fully refreshes its data samples. This seems wrong, so I suspect I've got a bug somewhere. I'll be rereading the paper later today looking for what I might be missing, and hopefully will push out a fix, or at least an understanding. (almost 11 years ago)
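For reference, the LSTD-Q step being debugged is, in the standard Lagoudakis and Parr formulation: accumulate A += phi(s,a)(phi(s,a) - gamma * phi(s', pi(s')))^T and b += r * phi(s,a) over the sample set, then solve A w = b for the Q-function weights. A minimal sketch of that textbook version (my reconstruction, not this repo's code; the ridge term is my addition for numerical safety):

```python
import numpy as np

# Batch LSTD-Q over a list of (s, a, r, s_next, done) samples.
# phi(s, a) returns a length-k feature vector; policy(s) picks the
# greedy action used to evaluate the successor state.
def lstdq(samples, phi, policy, gamma, k):
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next, done in samples:
        f = phi(s, a)
        f_next = np.zeros(k) if done else phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # small ridge term keeps A invertible when samples are few
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)
```

One plausible reason the frequent-update variant misbehaves, consistent with the commit above: with few samples per refresh, A is ill-conditioned and the solved weights swing wildly between policy-improvement steps, whereas one big batch averages that noise out.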
Implemented the 5-state chain domain, and using it was able to get delayed Q-learning working. I've also included the parameters for which delayed Q-learning performs as well as in the initial paper in which the chain domain was introduced. (almost 11 years ago)
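The 5-state chain is a standard exploration benchmark; in one common formulation (e.g. Strens, 2000), "forward" advances one state for reward 0, self-looping at the last state for reward 10, while "back" resets to the start for reward 2, and actions slip to the other action with probability 0.2. The repo's exact reward and slip values may differ; this sketch uses those common ones.

```python
import random

# Illustrative 5-state chain domain (rewards and slip probability follow
# the common benchmark values, not necessarily this repo's).
class ChainDomain:
    def __init__(self, n_states=5, slip=0.2, rng=None):
        self.n = n_states
        self.slip = slip
        self.rng = rng or random.Random()
        self.state = 0

    def step(self, action):
        if self.rng.random() < self.slip:
            action = 1 - action          # the chosen action "slips"
        if action == 1:                  # back: reset for a small reward
            self.state = 0
            return 2.0, self.state
        if self.state == self.n - 1:     # forward at the end: big reward
            return 10.0, self.state
        self.state += 1                  # forward: advance, no reward
        return 0.0, self.state
```

The domain is a good test for delayed Q-learning precisely because greedy short-horizon behavior keeps collecting the reward of 2, while the optimal policy must walk the whole chain before seeing any payoff.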
Added an example of running a randomized trial. (almost 11 years ago)