There really should be a better way to map values. There are numerous functions in R for this, and I believe there should be similar functions in Julia. Is it because Julia has good performance that people tend to write loops for this kind of task, and call this hacking?
For most cases, Julia’s broadcast syntax [1] is the easiest way to map a function over some values. When you want to do something more complex, Julia provides an incredibly flexible and powerful iterator in CartesianIndices [2].
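For example, here is a minimal sketch of both, with made-up values:

```julia
xs = [1.0, 4.0, 9.0]
sqrt.(xs)                        # broadcast syntax: elementwise sqrt -> [1.0, 2.0, 3.0]

A = zeros(Int, 2, 3)
for I in CartesianIndices(A)     # iterates over every (row, col) index of A
    A[I] = sum(Tuple(I))         # e.g. A[2, 3] = 2 + 3 = 5
end
```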
Did you have a more specific question about mapping or iteration in Julia?
Can you give a code example in R of what you would like to do in Julia?
There are quite a lot of different ways of mapping in Julia, including functions like `map`, `mapreduce`, `replace`, and various syntaxes for broadcasting and array comprehensions.
For maximum SIMD performance there are also things like `vmap` and `vmapreduce` from LoopVectorization.jl.
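A quick sketch of the plain-Julia options, with illustrative values (`vmap`/`vmapreduce` follow a similar calling convention, assuming LoopVectorization.jl is installed):

```julia
xs = 1:5

map(x -> x^2, xs)              # map a function over a collection: [1, 4, 9, 16, 25]
mapreduce(x -> x^2, +, xs)     # map, then reduce with +: sum of squares = 55
[x^2 for x in xs]              # array comprehension
xs .^ 2                        # broadcasting syntax
replace([1, 2, 1, 3], 1 => 0)  # replace matching values: [0, 2, 0, 3]
```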
I see what you're saying, regarding the histogram code. A cleaner approach would have been to store the translations in a named tuple or Dict and iterate through it.
One of the other commenters also mentioned the `replace` function, which could be used in place of a loop.
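A minimal sketch of both ideas; the translation table here is hypothetical, not the one from the actual histogram code:

```julia
# hypothetical translation table (the real code would use its own labels)
translations = Dict("old_label" => "new label", "tmp" => "temperature")

labels = ["old_label", "tmp", "old_label"]

# `replace` with the Dict entries splatted as pairs replaces the loop entirely
replace(labels, translations...)   # ["new label", "temperature", "new label"]

# or iterate through the Dict explicitly
for (from, to) in translations
    println(from, " -> ", to)      # apply each translation here
end
```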
- uses Microsoft's MRAN, which does the heavy lifting of hosting the archives
- uses a date instead of package versions
- installs packages automatically on first use (which pacman::p_load has been doing for ages) and is easier to use at the script level
It's no coincidence that most package manager solutions use versions instead of dates to control the environment:
- A paper published in 2017 may use a date like 2017-10-01, but there is a high probability that some of the dependency packages are from an earlier date, unless the author updates packages every day/week, which is not a good habit anyway because updating too frequently breaks things more frequently.
- So how can you reproduce the environment using a date? The underlying assumption that all packages were at their latest versions as of that date simply doesn't hold.
That's why packrat/renv etc. use a lock file to record all package versions, and why you need a project to manage libraries: you have to maintain different library environments and cannot install everything to the same location.
Yet the author treats installing all packages to a single location as a feature, since you don't need to install the same package again, and tries to avoid projects and prefer scripts as much as possible when doing reproducible research?
It's funny. In China there used to be a time when high school students liked to use rarely used Chinese characters, which resemble the intentional character only in shape, not in meaning.
To be clear, "intentional" should be "intended", right? I had to read this a few times (I'm also a non-native speaker; I wonder if that makes it easier or harder for me to understand a text with meaning-altering typos/mistakes), but I think it must be this.
"Intentional" and "intended" are both adjectives and synonyms, so they can be used mostly interchangeably. However, one or the other may be conceptually clearer in various cases. As a native English speaker, "intentional character" is understandable to me. I likely would have written "intended character", but I don't think "intentional" here is clearly incorrect.
I program mainly in R and I always use RMarkdown. I write extensively in RMarkdown about question definitions, notes, references, exploration, and different directions. In the end, if I ever need a script version, I have a utility function that picks out the code chunks and outputs them as a script.
This serves as very good documentation and is much better than code comments.
With RMarkdown you have two options, each with its own merits:
1. Save the code and output separately, so you can either keep the output (though without comparing output versions) or always regenerate the output from scratch when needed (this actually helps to ensure reproducibility).
2. Use the R Notebook format, which saves the results together with the document, in some companion folders.
One reason dplyr was promoted is that you can connect to different server backends like SparkR, SQL, etc., which is the enterprise direction RStudio is aiming at. On the other hand, data.table is your friend when you are working on your own machine. With more and more memory and CPU power available, its benefits are actually increasing.
I have been using data.table from almost day one; it deserves more recognition. The syntax is a little hard in the beginning but can be grasped with some effort. I often feel that my exploration code would look much more tedious if written in dplyr style.
In my opinion, the single most important point in job seeking is to get as much feedback as possible, as early as possible.
Consider these bad examples:
Sending many resumes and not getting any response. This is a huge blow to morale, and there is no hint about what should be improved or what went wrong.
- To fix it: get someone in the field to review your resume and give suggestions. This is simple, but many people don't do it. And it's totally possible to find somebody you've never met before to review your resume.
Deciding to jump on a bandwagon (data science, machine learning) and planning years for it: an online degree, many courses, a master's, or even a Ph.D.!
- Problem: it takes too long to get any feedback, and the commitment is too big, often not executable in reality. It's possible to spend a lot of effort getting into a program, only to find out later that you don't like it or don't fit it.
- And the bandwagon may be outdated by the time you finish the program.
- To fix: dip a toe in as early as possible, with a side project or some short courses. This of course requires some existing experience and skills.
In general,
- Meetups are a great way to meet people and learn about the field.
AMD's resources are limited, and they picked the right priorities:
1. CPU first, GPU next, as a breakthrough on the CPU side is easier than on the GPU side: just go with more cores via chiplets, since Intel has basically stopped innovating, while the GPU side will be much tougher.
2. Data center first, consumer/gamer second. Vega is not meant to compete with the best Nvidia cards; it was designed to handle both data center/ML needs and gaming needs, and maybe the gaming version is just a placeholder. The data center version will bring more profit and buy time for AMD to develop the software ecosystem: CUDA is Nvidia's moat, and AMD needs time to overcome that.
So 7nm is used on the data center version instead of a gaming card, which makes perfect sense for AMD.
I tried pasting the code, but it was hard to read. You can search for Histograms on this page: https://datasciencejuliahackers.com/03_probability_intro.jl....