문제
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively.
-> A(1) C(2) G(3) T(4)
You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4
P[1] = 5 Q[1] = 5
P[2] = 0 Q[2] = 6
The answers to these M = 3 queries are as follows:
The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
- The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
- The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.
Write a function:
class Solution { public int[] solution(String S, int[] P, int[] Q); }
that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
Result array should be returned as an array of integers.
For example, given the string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4
P[1] = 5 Q[1] = 5
P[2] = 0 Q[2] = 6
the function should return the values [2, 4, 1], as explained above.
Write an efficient algorithm for the following assumptions:
- N is an integer within the range [1..100,000];
M is an integer within the range [1..50,000];
- each element of arrays P and Q is an integer within the range [0..N - 1];
- P[K] ≤ Q[K], where 0 ≤ K < M;
- string S consists only of upper-case English letters A, C, G, T.
해답
생각보다 좀 헤맸다. 다른 블로그들 참고하면서 이해했는데 개인적인 생각으로는 타입이 ACGT 4개 뿐이라서 2차원 배열 만들어서 저장하는게 부담없이 진행되지 않았나 싶다.
ACGT를 숫자로 바꾸는게 번거로워서 Map에 값을 넣고 변경했다.
key point : 각 층마다 존재하는 알파벳 수를 기입하는 2중 포문 map 사용
문제에 나온 예제로 살펴보면
map[0]=C={0,1,0,0}
map[1]=CA={1,1,0,0}
map[2]=CAG={1,1,1,0}
map[3]=CAGC={1,2,1,0}
map[4]=CAGCC={1,3,1,0}
map[5]=CAGCCT={1,3,1,1}
map[6]=CAGCCTA={2,3,1,1}
이렇게 2차원 배열 map에 저장이 된다.
array 위치 기준으로 2~4 내부에서 찾는다고 하면 GCC = map[4]-map[1]={0,2,1,0} 이 된다.
시작 위치는 2인데 map[1]을 빼는 이유는 map[2]를 빼게되면 2에 위치한 G의 값까지 빼게 된다!
map[4]-map[2]={0,2,0,0} 으로 G가 빠지기 때문에 적절하지 않다.
import java.util.HashMap;
import java.util.Map;
public class PrefixSum {
public int[] solution(String S, int[] P, int[] Q) {
Map<Character, Integer> change = new HashMap<>();
change.put('A', 0);
change.put('C', 1);
change.put('G', 2);
change.put('T', 3);
int N = P.length;
int[][] arr = new int[S.length()][4];
arr[0][change.get(S.charAt(0))]++;
for (int i = 1; i < S.length(); i++) {
char c = S.charAt(i);
arr[i] = arr[i - 1].clone();
arr[i][change.get(c)]++;
}
int[] res = new int[N];
for (int i = 0; i < N; i++) {
for (int j = 0; j < 4; j++) {
int start = (P[i] == 0) ? 0 : arr[P[i]-1][j];
int end = arr[Q[i]][j];
if (end - start > 0) {
res[i] = j + 1;
break;
}
}
}
return res;
}
}
References
https://jackjeong.tistory.com/44
'알고리즘' 카테고리의 다른 글
2021 KAKAO BLIND 코딩테스트 메뉴 리뉴얼 풀이 Java (0) | 2022.07.07 |
---|---|
[분할 정복] 쿼드 트리 뒤집기(convert quad tree) (0) | 2022.03.17 |
Codility : MaxSliceSum 풀이 Java, DP (0) | 2022.03.05 |
Codility : Greedy MaxNonoverlappingSegments 풀이 Java, Greedy 알고리즘 (0) | 2022.03.04 |
Codility : Leader Dominator 풀이 Java, HashMap entrySet (0) | 2022.03.03 |
댓글