There really should be a better way to map values. There are numerous functions in R for this, and I believe there should be similar functions in Julia. Is it because Julia has good performance that people tend to write loops for this kind of task, and call this hacking?
For most cases, Julia’s broadcast syntax [1] is the easiest way to map a function over some values. When you want to do something more complex, Julia provides an incredibly flexible and powerful iterator in CartesianIndices [2].
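For example, here is a minimal sketch of both, with made-up values:

```julia
xs = [1.0, 4.0, 9.0]
sqrt.(xs)                        # broadcast syntax: elementwise sqrt -> [1.0, 2.0, 3.0]

A = zeros(Int, 2, 3)
for I in CartesianIndices(A)     # iterates over every (row, col) index of A
    A[I] = sum(Tuple(I))         # e.g. A[2, 3] = 2 + 3 = 5
end
```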
Did you have a more specific question about mapping or iteration in Julia?
Can you give a code example in R of what you would like to do in Julia?
There are quite a lot of different ways of mapping in Julia, including functions like `map`, `mapreduce`, `replace`, and various syntaxes for broadcasting and array comprehensions.
For maximum SIMD performance there are also things like `vmap` and `vmapreduce` from LoopVectorization.jl.
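A quick sketch of the plain-Julia options, with illustrative values (`vmap`/`vmapreduce` follow a similar calling convention, assuming LoopVectorization.jl is installed):

```julia
xs = 1:5

map(x -> x^2, xs)              # map a function over a collection: [1, 4, 9, 16, 25]
mapreduce(x -> x^2, +, xs)     # map, then reduce with +: sum of squares = 55
[x^2 for x in xs]              # array comprehension
xs .^ 2                        # broadcasting syntax
replace([1, 2, 1, 3], 1 => 0)  # replace matching values: [0, 2, 0, 3]
```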
I see what you're saying, regarding the histogram code. A cleaner approach would have been to store the translations in a named tuple or Dict and iterate through it.
One of the other commenters also mentioned the `replace` function, which could be used in place of a loop.
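A minimal sketch of both ideas; the translation table here is hypothetical, not the one from the actual histogram code:

```julia
# hypothetical translation table (the real code would use its own labels)
translations = Dict("old_label" => "new label", "tmp" => "temperature")

labels = ["old_label", "tmp", "old_label"]

# `replace` with the Dict entries splatted as pairs replaces the loop entirely
replace(labels, translations...)   # ["new label", "temperature", "new label"]

# or iterate through the Dict explicitly
for (from, to) in translations
    println(from, " -> ", to)      # apply each translation here
end
```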
- uses Microsoft's MRAN, which does the heavy lifting of hosting the archives
- uses a date instead of package versions
- installs packages automatically on first use (which pacman::p_load has been doing for ages) and is easier to use at the script level
It's no coincidence that most package manager solutions use versions instead of dates to control the environment:
- A paper published in 2017 may use a date like 2017-10-01, but there is a high probability that some of the dependency packages are from an earlier date, unless the author updates packages every day/week, which is not a good habit anyway because updating too frequently breaks things more frequently.
- So how can you reproduce the environment using a date? The underlying assumption that all packages were at their latest versions as of that date simply doesn't hold.
That's why packrat/renv etc. use a lock file to record all package versions, and why you need a project to manage libraries: you have to maintain different library environments and cannot install everything to the same location.
Yet the author treats installing all packages to a single location as a feature, since you don't need to install the same package again, and tries to avoid projects and prefer scripts as much as possible when doing reproducible research?
It's funny. In China there used to be a time when high school students liked to use rarely used Chinese characters, which resemble the intentional character only in shape, not in meaning.
To be clear, "intentional" should be "intended", right? I had to read this a few times (I'm also a non-native speaker; I wonder if that makes it easier or harder for me to understand a text with meaning-altering typos/mistakes), but I think it must be this.
"Intentional" and "intended" are both adjectives and synonyms, so they can be used mostly interchangeably. However, one or the other may be conceptually clearer in various cases. As a native English speaker, "intentional character" is understandable to me. I likely would have written "intended character", but I don't think "intentional" here is clearly incorrect.
I program mainly in R and I always use RMarkdown. I write extensively in RMarkdown about question definitions, notes, references, exploration, and different directions. In the end, if I ever need a script version, I have a utility function that picks out the code chunks and outputs them as a script.
This serves as very good documentation and is much better than code comments.
With RMarkdown you have two options, each with its own merits:
1. Save the code and output separately, so you can either keep the output (though without comparing output versions) or always regenerate the output from scratch when needed (this actually helps to ensure reproducibility).
2. Use the R Notebook format, which saves the results together with the document, in some companion folders.
One reason dplyr was promoted is that you can connect to different server backends like SparkR, SQL, etc., which is the enterprise direction RStudio is aiming at. On the other hand, data.table is your friend when you are working on your own machine. With more and more memory and CPU power available, its benefits are actually increasing.
I have been using data.table from almost day one; it deserves more recognition. The syntax is a little hard in the beginning but can be grasped with some effort. I often feel that my exploration code would look much more tedious if written in dplyr style.
In my opinion, the single most important point in job seeking is to get as much feedback as possible, as early as possible.
Consider these bad examples:
Sending many resumes and not getting any response. This is a huge blow to morale, and there is no hint about what should be improved or what went wrong.
- To fix it: get someone in the field to review your resume and give suggestions. This is simple, but many people don't do it. And it's totally possible to find somebody you've never met before to review your resume.
Deciding to jump on a bandwagon (data science, machine learning) and planning years for it: an online degree, many courses, a master's, or even a Ph.D.!
- Problem: it takes too long to get any feedback, and the commitment is too big, often not executable in reality. It's possible to spend a lot of effort getting into a program, only to find out later that you don't like it or don't fit it.
- And the bandwagon may be outdated by the time you finish the program.
- To fix: dip a toe in as early as possible, with a side project or some short courses. This of course requires some existing experience and skills.
In general,
- Meetups are a great way to meet people and learn about the field.
AMD's resources are limited, and they picked the right priorities:
1. CPU first, GPU next, as a breakthrough on the CPU side is easier than on the GPU side: just go with more cores via chiplets, since Intel has basically stopped innovating, while the GPU side will be much tougher.
2. Data center first, consumer/gamer second. Vega is not meant to compete with the best Nvidia cards; it was designed to handle both data center/ML needs and gaming needs, and maybe the gaming version is just a placeholder. The data center version will bring more profit and buy time for AMD to develop the software ecosystem: CUDA is Nvidia's moat, and AMD needs time to overcome that.
So 7nm is used on the data center version instead of a gaming card, which makes perfect sense for AMD.
I tried pasting the code, but it was hard to read. You can search for Histograms on this page: https://datasciencejuliahackers.com/03_probability_intro.jl....